RHCSA and RHCE Cert Guide and Lab Manual
Confidential computer software. Valid license from HP required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial
Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under
vendor's standard commercial license.
The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express
warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall
not be liable for technical or editorial errors or omissions contained herein.
Preface......................................................................................................17
3.1.2.1 What is WBEM?.................................................................................................37
3.1.2.2 Support for Serviceguard WBEM Provider..............................................................37
3.1.2.3 WBEM Query....................................................................................................37
3.1.2.4 WBEM Indications..............................................................................................38
3.2 How the Cluster Manager Works .......................................................................................38
3.2.1 Configuration of the Cluster ........................................................................................38
3.2.2 Heartbeat Messages .................................................................................................39
3.2.3 Manual Startup of Entire Cluster..................................................................................39
3.2.4 Automatic Cluster Startup ..........................................................................................39
3.2.5 Dynamic Cluster Re-formation .....................................................................................40
3.2.6 Cluster Quorum to Prevent Split-Brain Syndrome............................................................40
3.2.7 Cluster Lock..............................................................................................................40
3.2.8 Use of a Lock LUN as the Cluster Lock..........................................................................40
3.2.9 Use of the Quorum Server as a Cluster Lock..................................................................41
3.2.10 No Cluster Lock ......................................................................................................42
3.2.11 What Happens when You Change the Quorum Configuration Online.............................43
3.3 How the Package Manager Works......................................................................................43
3.3.1 Package Types...........................................................................................................43
3.3.1.1 Non-failover Packages.........................................................................................43
3.3.1.2 Failover Packages...............................................................................................44
3.3.1.2.1 Configuring Failover Packages ......................................................................44
3.3.1.2.2 Deciding When and Where to Run and Halt Failover Packages ........................44
3.3.1.2.3 Failover Packages Switching Behavior...........................................................45
3.3.1.2.4 Failover Policy.............................................................................................47
3.3.1.2.5 Automatic Rotating Standby..........................................................................47
3.3.1.2.6 Failback Policy............................................................................................50
3.3.1.2.7 On Combining Failover and Failback Policies.................................................53
3.3.2 Using the Generic Resources Monitoring Service...........................................................53
3.3.3 Using Older Package Configuration Files......................................................................54
3.4 How Packages Run...........................................................................................................54
3.4.1 What Makes a Package Run?......................................................................................55
3.4.2 Before the Control Script Starts....................................................................................56
3.4.3 During Run Script Execution........................................................................................57
3.4.4 Normal and Abnormal Exits from the Run Script............................................................58
3.4.5 Service Startup with cmrunserv....................................................................................58
3.4.6 While Services are Running........................................................................................58
3.4.7 When a Service or Subnet Fails or Generic Resource or a Dependency is Not Met............59
3.4.8 When a Package is Halted with a Command................................................................59
3.4.9 During Halt Script Execution.......................................................................................59
3.4.10 Normal and Abnormal Exits from the Halt Script..........................................................60
3.4.10.1 Package Control Script Error and Exit Conditions...................................................61
3.5 How the Network Manager Works ....................................................................................62
3.5.1 Stationary and Relocatable IP Addresses and Monitored Subnets.....................................62
3.5.2 Types of IP Addresses................................................................................................63
3.5.3 Adding and Deleting Relocatable IP Addresses ............................................................63
3.5.3.1 Load Sharing ....................................................................................................63
3.5.4 Bonding of LAN Interfaces .........................................................................................63
3.5.5 Bonding for Load Balancing.......................................................................................66
3.5.6 Monitoring LAN Interfaces and Detecting Failure: Link Level............................................66
3.5.7 Monitoring LAN Interfaces and Detecting Failure: IP Level...............................................66
3.5.7.1 Reasons To Use IP Monitoring..............................................................................67
3.5.7.2 How the IP Monitor Works..................................................................................67
3.5.7.2.1 Failure and Recovery Detection Times............................................................68
3.5.7.3 Constraints and Limitations..................................................................................69
3.5.8 Reporting Link-Level and IP-Level Failures.......................................................................69
3.5.9 Package Switching and Relocatable IP Addresses..........................................................69
3.5.10 Address Resolution Messages after Switching on the Same Subnet .................................70
3.5.11 VLAN Configurations................................................................................................70
3.5.11.1 What is VLAN?.................................................................................................70
3.5.11.2 Support for Linux VLAN......................................................................................70
3.5.11.3 Configuration Restrictions....................................................................................70
3.5.11.4 Additional Heartbeat Requirements......................................................................71
3.6 Volume Managers for Data Storage....................................................................................71
3.6.1 Storage on Arrays......................................................................................................71
3.6.2 Monitoring Disks.......................................................................................................72
3.6.3 More Information on LVM...........................................................................................72
3.7 About Persistent Reservations..............................................................................................72
3.7.1 Rules and Limitations..................................................................................................73
3.7.2 How Persistent Reservations Work................................................................................74
3.8 Responses to Failures ........................................................................................................75
3.8.1 Reboot When a Node Fails .......................................................................................75
3.8.1.1 What Happens when a Node Times Out...............................................................75
3.8.1.1.1 Example .....................................................................................................76
3.8.2 Responses to Hardware Failures .................................................................................76
3.8.3 Responses to Package and Service Failures ..................................................................77
3.8.4 Responses to Package and Generic Resources Failures...................................................77
3.8.4.1 Service Restarts .................................................................................................78
3.8.4.2 Network Communication Failure .........................................................................78
4.7.3.2 What Is IPv6-Only Mode?...................................................................................88
4.7.3.2.1 Rules and Restrictions for IPv6-Only Mode......................................................89
4.7.3.2.2 Recommendations for IPv6-Only Mode..........................................................90
4.7.3.3 What Is Mixed Mode?........................................................................................90
4.7.3.3.1 Rules and Restrictions for Mixed Mode...........................................................90
4.7.4 Cluster Configuration Parameters ................................................................................90
4.7.5 Cluster Configuration: Next Step ..............................................................................104
4.8 Package Configuration Planning ......................................................................................104
4.8.1 Logical Volume and File System Planning ...................................................................105
4.8.2 Planning for NFS-mounted File Systems......................................................................106
4.8.3 Planning for Expansion............................................................................................107
4.8.4 Choosing Switching and Failover Behavior.................................................................107
4.8.5 Parameters for Configuring Generic Resources............................................................108
4.8.6 Configuring a Generic Resource...............................................................................109
4.8.6.1 Getting and Setting the Status/Value of a Simple/Extended Generic Resource.........111
4.8.6.1.1 Using Serviceguard Command to Get the Status/Value of a Simple/Extended
Generic Resource...................................................................................................111
4.8.6.1.2 Using Serviceguard Command to Set the Status/Value of a Simple/Extended
Generic Resource...................................................................................................111
4.8.6.2 Online Reconfiguration of Generic Resources......................................................112
4.8.6.3 Online Reconfiguration of serviceguard-xdc Modular Package Parameters...............112
4.8.7 About Package Dependencies..................................................................................113
4.8.7.1 Simple Dependencies.......................................................................................113
4.8.7.2 Rules for Simple Dependencies..........................................................................113
4.8.7.2.1 Dragging Rules for Simple Dependencies.....................................................115
4.8.7.3 Guidelines for Simple Dependencies..................................................................117
4.8.7.4 Extended Dependencies...................................................................................117
4.8.7.4.1 Rules for Exclusionary Dependencies...........................................................118
4.8.7.4.2 Rules for different_node and any_node Dependencies...................................119
4.8.8 What Happens When a Package Fails......................................................................119
4.8.9 For More Information...............................................................................................120
4.8.10 About Package Weights..........................................................................................120
4.8.10.1 Package Weights and Node Capacities............................................................120
4.8.10.2 Configuring Weights and Capacities................................................................120
4.8.10.3 Simple Method..............................................................................................121
4.8.10.3.1 Example 1.............................................................................................121
4.8.10.3.2 Points to Keep in Mind............................................................................122
4.8.10.4 Comprehensive Method..................................................................................122
4.8.10.4.1 Defining Capacities.................................................................................122
4.8.10.4.2 Defining Weights...................................................................................124
4.8.10.5 Rules and Guidelines......................................................................................126
4.8.10.6 For More Information......................................................................................126
4.8.10.7 How Package Weights Interact with Package Priorities and Dependencies..............126
4.8.10.7.1 Example 1..............................................................................................127
4.8.10.7.2 Example 2.............................................................................................127
4.8.11 About External Scripts.............................................................................................127
4.8.11.1 Using Serviceguard Commands in an External Script............................................129
4.8.11.2 Determining Why a Package Has Shut Down......................................................130
4.8.11.2.1 last_halt_failed Flag..................................................................................130
4.8.12 About Cross-Subnet Failover....................................................................................130
4.8.12.1 Implications for Application Deployment.............................................................131
4.8.12.2 Configuring a Package to Fail Over across Subnets: Example...............................131
4.8.12.2.1 Configuring node_name...........................................................................131
4.8.12.2.2 Configuring monitored_subnet_access.......................................................132
4.8.12.2.3 Configuring ip_subnet_node.....................................................................132
4.8.13 Configuring a Package: Next Steps..........................................................................132
4.9 Planning for Changes in Cluster Size.................................................................................132
5.2.8.4 Setting up Access-Control Policies......................................................................160
5.2.8.4.1 Role Conflicts...........................................................................................162
5.2.8.5 Package versus Cluster Roles.............................................................................163
5.2.9 Verifying the Cluster Configuration ............................................................................163
5.2.10 Cluster Lock Configuration Messages........................................................................163
5.2.11 Distributing the Binary Configuration File ..................................................................164
5.3 Managing the Running Cluster.........................................................................................164
5.3.1 Checking Cluster Operation with Serviceguard Commands...........................................164
5.3.2 Setting up Autostart Features ....................................................................................165
5.3.3 Changing the System Message ................................................................................166
5.3.4 Managing a Single-Node Cluster..............................................................................166
5.3.4.1 Single-Node Operation....................................................................................166
5.3.5 Disabling identd......................................................................................................166
5.3.6 Deleting the Cluster Configuration ............................................................................167
5.4 Rebuilding the Deadman Driver........................................................................................167
6.1.4.31 service_halt_timeout........................................................................................184
6.1.4.32 generic_resource_name...................................................................................184
6.1.4.33 generic_resource_evaluation_type.....................................................................185
6.1.4.34 generic_resource_up_criteria............................................................................185
6.1.4.35 vgchange_cmd..............................................................................................186
6.1.4.36 vg................................................................................................................186
6.1.4.37 File system parameters.....................................................................................186
6.1.4.38 concurrent_fsck_operations..............................................................................187
6.1.4.39 fs_mount_retry_count.......................................................................................187
6.1.4.40 fs_umount_retry_count ....................................................................................187
6.1.4.41 fs_name........................................................................................................187
6.1.4.42 fs_server........................................................................................................188
6.1.4.43 fs_directory....................................................................................................188
6.1.4.44 fs_type..........................................................................................................188
6.1.4.45 fs_mount_opt.................................................................................................189
6.1.4.46 fs_umount_opt................................................................................................189
6.1.4.47 fs_fsck_opt.....................................................................................................189
6.1.4.48 pv................................................................................................................190
6.1.4.49 pev_.............................................................................................................190
6.1.4.50 external_pre_script..........................................................................................190
6.1.4.51 external_script................................................................................................190
6.1.4.52 user_host.......................................................................................................190
6.1.4.53 user_name.....................................................................................................191
6.1.4.54 user_role.......................................................................................................191
6.1.4.55 Additional Parameters Used Only by Legacy Packages........................................191
6.2 Generating the Package Configuration File.........................................................................191
6.2.1 Before You Start.......................................................................................................192
6.2.2 cmmakepkg Examples.............................................................................................192
6.2.3 Next Step...............................................................................................................193
6.3 Editing the Configuration File............................................................................................193
6.4 Adding or Removing a Module from an Existing Package.....................................................196
6.5 Verifying and Applying the Package Configuration..............................................................196
6.6 Alert Notification for Serviceguard Environment..................................................................197
6.7 Adding the Package to the Cluster....................................................................................198
6.8 Creating a Disk Monitor Configuration..............................................................................198
7.1.11.6 Status After Halting a Node...............................................................................206
7.1.11.7 Viewing Information about Unowned Packages.....................................................207
7.1.12 Checking the Cluster Configuration and Components...................................................207
7.1.12.1 Verifying Cluster and Package Components..........................................................208
7.1.12.2 Setting up Periodic Cluster Verification................................................................210
7.1.12.3 Limitations.......................................................................................................210
7.2 Managing the Cluster and Nodes ....................................................................................211
7.2.1 Starting the Cluster When all Nodes are Down............................................................211
7.2.2 Adding Previously Configured Nodes to a Running Cluster............................................212
7.2.3 Removing Nodes from Participation in a Running Cluster...............................................212
7.2.3.1 Using Serviceguard Commands to Remove a Node from Participation in a Running
Cluster ......................................................................................................................212
7.2.4 Halting the Entire Cluster .........................................................................................213
7.2.5 Automatically Restarting the Cluster ...........................................................................213
7.3 Halting a Node or the Cluster while Keeping Packages Running............................................213
7.3.1 What You Can Do...................................................................................................213
7.3.2 Rules and Restrictions...............................................................................................214
7.3.3 Additional Points To Note.........................................................................................215
7.3.4 Halting a Node and Detaching its Packages...............................................................216
7.3.5 Halting a Detached Package.....................................................................................216
7.3.6 Halting the Cluster and Detaching its Packages............................................................216
7.3.7 Example: Halting the Cluster for Maintenance on the Heartbeat Subnets.........................217
7.4 Managing Packages and Services ....................................................................................217
7.4.1 Starting a Package ..................................................................................................217
7.4.1.1 Starting a Package that Has Dependencies...........................................................218
7.4.2 Halting a Package ..................................................................................................218
7.4.2.1 Halting a Package that Has Dependencies...........................................................218
7.4.2.2 Handling Failures During Package Halt...............................................................218
7.4.3 Moving a Failover Package ......................................................................................219
7.4.4 Changing Package Switching Behavior ......................................................................220
7.5 Maintaining a Package: Maintenance Mode......................................................................220
7.5.1 Characteristics of a Package Running in Maintenance Mode or Partial-Startup Maintenance
Mode ............................................................................................................................221
7.5.1.1 Rules for a Package in Maintenance Mode or Partial-Startup Maintenance Mode ......222
7.5.1.1.1 Additional Rules for Partial-Startup Maintenance Mode....................................222
7.5.1.2 Dependency Rules for a Package in Maintenance Mode or Partial-Startup Maintenance
Mode .......................................................................................................................223
7.5.2 Performing Maintenance Using Maintenance Mode.....................................................223
7.5.2.1 Procedure........................................................................................................223
7.5.3 Performing Maintenance Using Partial-Startup Maintenance Mode.................................224
7.5.3.1 Procedure........................................................................................................224
7.5.3.2 Excluding Modules in Partial-Startup Maintenance Mode.......................................224
7.6 Reconfiguring a Cluster....................................................................................................225
7.6.1 Previewing the Effect of Cluster Changes.....................................................................226
7.6.1.1 What You Can Preview......................................................................................226
7.6.1.2 Using Preview mode for Commands in Serviceguard Manager...............................226
7.6.1.3 Using cmeval...................................................................................................227
7.6.2 Reconfiguring a Halted Cluster .................................................................................228
7.6.3 Reconfiguring a Running Cluster................................................................................228
7.6.3.1 Adding Nodes to the Configuration While the Cluster is Running ...........................228
7.6.3.2 Removing Nodes from the Cluster while the Cluster Is Running ..............................229
7.6.4 Changing the Cluster Networking Configuration while the Cluster Is Running...................230
7.6.4.1 What You Can Do............................................................................................230
7.6.4.2 What You Must Keep in Mind...........................................................................230
7.6.4.3 Example: Adding a Heartbeat LAN....................................................................231
7.6.4.4 Example: Deleting a Subnet Used by a Package..................................................232
7.6.5 Updating the Cluster Lock LUN Configuration Online....................................................233
7.6.6 Changing MAX_CONFIGURED_PACKAGES...............................................................233
7.7 Configuring a Legacy Package..........................................................................................233
7.7.1 Creating the Legacy Package Configuration ................................................................233
7.7.1.1 Using Serviceguard Manager to Configure a Package ...........................................234
7.7.1.2 Using Serviceguard Commands to Configure a Package ........................................234
7.7.1.2.1 Configuring a Package in Stages..................................................................234
7.7.1.2.2 Editing the Package Configuration File..........................................................234
7.7.2 Creating the Package Control Script...........................................................................236
7.7.2.1 Customizing the Package Control Script ..............................................................236
7.7.2.2 Adding Customer Defined Functions to the Package Control Script .........................237
7.7.2.2.1 Adding Serviceguard Commands in Customer Defined Functions ....................237
7.7.2.3 Support for Additional Products..........................................................................237
7.7.3 Verifying the Package Configuration...........................................................................238
7.7.4 Distributing the Configuration....................................................................................238
7.7.4.1 Distributing the Configuration And Control Script with Serviceguard Manager..........238
7.7.4.2 Copying Package Control Scripts with Linux commands.........................................238
7.7.4.3 Distributing the Binary Cluster Configuration File with Linux Commands ..................239
7.7.5 Configuring Cross-Subnet Failover..............................................................................239
7.7.5.1 Configuring node_name....................................................................................239
7.7.5.2 Configuring monitored_subnet_access.................................................................239
7.7.5.3 Creating Subnet-Specific Package Control Scripts..................................................240
7.7.5.3.1 Control-script entries for nodeA and nodeB...................................................240
7.7.5.3.2 Control-script entries for nodeC and nodeD..................................................240
7.8 Reconfiguring a Package..................................................................................................240
7.8.1 Migrating a Legacy Package to a Modular Package.....................................................240
7.8.2 Reconfiguring a Package on a Running Cluster ...........................................................240
7.8.3 Renaming or Replacing an External Script Used by a Running Package..........................241
7.8.4 Reconfiguring a Package on a Halted Cluster .............................................................241
7.8.5 Adding a Package to a Running Cluster......................................................................242
7.8.6 Deleting a Package from a Running Cluster ................................................................242
7.8.7 Resetting the Service Restart Counter..........................................................................242
7.8.8 Allowable Package States During Reconfiguration .......................................................242
7.8.8.1 Changes that Will Trigger Warnings...................................................................247
7.8.9 Online Reconfiguration of Modular package...............................................................247
7.8.9.1 Handling Failures During Online Package Reconfiguration.....................................248
7.9 Responding to Cluster Events ...........................................................................................253
7.10 Single-Node Operation .................................................................................................253
7.11 Removing Serviceguard from a System..............................................................................254
8.7.1 Reviewing Package IP Addresses ...............................................................................260
8.7.2 Reviewing the System Log File ..................................................................................261
8.7.2.1 Sample System Log Entries ................................................................................261
8.7.3 Reviewing Configuration Files ...................................................................................262
8.7.4 Reviewing the Package Control Script ........................................................................262
8.7.5 Using the cmquerycl and cmcheckconf Commands......................................................262
8.7.6 Reviewing the LAN Configuration .............................................................................263
8.8 Solving Problems ...........................................................................................................263
8.8.1 Name Resolution Problems.......................................................................................263
8.8.1.1 Networking and Security Configuration Errors......................................................263
8.8.2 Halting a Detached Package....................................................................................263
8.8.3 Cluster Re-formations Caused by Temporary Conditions...............................................264
8.8.4 Cluster Re-formations Caused by MEMBER_TIMEOUT Being Set too Low........................264
8.8.5 System Administration Errors ....................................................................................265
8.8.5.1 Package Control Script Hangs or Failures ...........................................................265
8.8.6 Package Movement Errors (Legacy Packages)..............................................................266
8.8.7 Node and Network Failures ....................................................................................267
8.8.8 Troubleshooting the Quorum Server...........................................................................267
8.8.8.1 Authorization File Problems...............................................................................267
8.8.8.2 Timeout Problems............................................................................................267
8.8.8.3 Messages.......................................................................................................268
8.8.9 Lock LUN Messages................................................................................................268
8.9 Troubleshooting serviceguard-xdc package........................................................................268
8.10 Troubleshooting Serviceguard Manager...........................................................................269
A.3.9 Avoid File Locking ..................................................................................................278
A.4 Restoring Client Connections ...........................................................................................278
A.5 Handling Application Failures .........................................................................................279
A.5.1 Create Applications to be Failure Tolerant ..................................................................279
A.5.2 Be Able to Monitor Applications ..............................................................................280
A.6 Minimizing Planned Downtime ........................................................................................280
A.6.1 Reducing Time Needed for Application Upgrades and Patches .....................................280
A.6.1.1 Provide for Rolling Upgrades .............................................................................280
A.6.1.2 Do Not Change the Data Layout Between Releases ..............................................281
A.6.2 Providing Online Application Reconfiguration ............................................................281
A.6.3 Documenting Maintenance Operations .....................................................................281
E.2.2 Scenario 2- Multi-Cluster Management.......................................................................298
Index.......................................................................................................311
Printing History
Table 1
Printing Date Part Number Edition
The last printing date and part number indicate the current edition, which applies to the A.11.20.20
version of HP Serviceguard for Linux.
The printing date changes when a new edition is printed. (Minor corrections and updates which
are incorporated at reprint do not cause the date to change.) The part number is revised when
extensive technical changes are incorporated.
New editions of this manual will incorporate all material updated since the previous edition.
Preface
This guide describes how to configure and manage Serviceguard for Linux on HP ProLiant servers
under the Linux operating system. It is intended for experienced Linux system administrators. (For
Linux system administration tasks that are not specific to Serviceguard, use the system administration
documentation and manpages for your distribution of Linux.)
The contents are as follows:
Chapter 1 (page 19) describes a Serviceguard cluster and provides a roadmap for using this
guide.
Chapter 2 (page 25) provides a general view of the hardware configurations used by
Serviceguard.
Chapter 3 (page 33) describes the software components of Serviceguard and shows how
they function within the Linux operating system.
Chapter 4 (page 79) steps through the planning process.
Chapter 5 (page 135) describes the creation of the cluster configuration.
Chapter 6 (page 169) describes the creation of high availability packages.
Chapter 7 (page 199) presents the basic cluster administration tasks.
Chapter 8 (page 255) explains cluster testing and troubleshooting strategies.
Appendix A (page 271) gives guidelines for creating cluster-aware applications that provide
optimal performance in a Serviceguard environment.
Appendix B (page 283) provides suggestions for integrating your existing applications with
Serviceguard for Linux.
Appendix C (page 287) contains a set of empty worksheets for preparing a Serviceguard
configuration.
Appendix D (page 291) provides information about IPv6.
Appendix E (page 297) is an introduction to Serviceguard Manager.
Appendix F (page 301) provides a reference to the supported ranges for Serviceguard
parameters.
Appendix G (page 303) provides the monitoring script template for Generic Resources.
Appendix H (page 309) describes a group of tools to simplify the integration of popular
applications with Serviceguard.
Related Publications
For additional information, see the following documents at http://www.hp.com/go/linux-serviceguard-docs:
HP Serviceguard A.11.20.20 for Linux Release Notes
HP Serviceguard Quorum Server Version A.04.00 Release Notes
HP Serviceguard Extended Distance Cluster for Linux A.11.20.20 Deployment Guide
HP Serviceguard for Linux Version A.11.20 Deployment Guide
Clusters for High Availability: a Primer of HP Solutions. Second Edition. HP Press, 2001
(ISBN 0-13-089355-2)
Information about supported configurations is in the HP Serviceguard for Linux Configuration Guide.
For updated information on supported hardware and Linux distributions refer to the HP Serviceguard
for Linux Certification Matrix. Both documents are available at:
http://www.hp.com/info/sglx
Problem Reporting
If you have any problems with the software or documentation, please contact your local
Hewlett-Packard Sales Office or Customer Service Center.
1 Serviceguard for Linux at a Glance
This chapter introduces Serviceguard for Linux and shows where to find different kinds of information
in this book. It includes the following topics:
What is Serviceguard for Linux? (page 19)
Using Serviceguard for Configuring in an Extended Distance Cluster Environment (page 21)
Using Serviceguard Manager (page 22)
Configuration Roadmap (page 22)
If you are ready to start setting up Serviceguard clusters, skip ahead to Chapter 4 (page 79).
Specific steps for setup are in Chapter 5 (page 135).
In the figure, node 1 (one of two SPUs) is running package A, and node 2 is running package B.
Each package has a separate group of disks associated with it, containing data needed by the
package's applications, and a copy of the data. Note that both nodes are physically connected
to disk arrays. However, only one node at a time may access the data for a given group of disks.
In the figure, node 1 is shown with exclusive access to the top two disks (solid line), and node 2
is shown as connected without access to the top disks (dotted line). Similarly, node 2 is shown with
exclusive access to the bottom two disks (solid line), and node 1 is shown as connected without
access to the bottom disks (dotted line).
Disk arrays provide redundancy in case of disk failures. In addition, a total of four data buses are
shown for the disks that are connected to node 1 and node 2. This configuration provides the
maximum redundancy and also gives optimal I/O performance, since each package is using
different buses.
Note that the network hardware is cabled to provide redundant LAN interfaces on each node.
Serviceguard uses TCP/IP network services for reliable communication among nodes in the cluster,
including the transmission of heartbeat messages, signals from each functioning node which are
central to the operation of the cluster. TCP/IP services also are used for other types of inter-node
communication. (See Understanding Serviceguard Software Components (page 33) for more
information about heartbeat.)
1.1.1 Failover
Under normal conditions, a fully operating Serviceguard cluster simply monitors the health of the
cluster's components while the packages are running on individual nodes. Any host system running
in the Serviceguard cluster is called an active node. When you create the package, you specify
a primary node and one or more adoptive nodes. When a node or its network communications
fails, Serviceguard can transfer control of the package to the next available adoptive node. This
situation is shown in Figure 2 (page 21).
After this transfer, the package typically remains on the adoptive node as long as the adoptive node
continues running. If you wish, however, you can configure the package to return to its primary
node as soon as the primary node comes back online. Alternatively, you may manually transfer
control of the package back to the primary node at the appropriate time.
Figure 2 (page 21) does not show the power connections to the cluster, but these are important
as well. In order to remove all single points of failure from the cluster, you should provide as many
separate power circuits as needed to prevent a single point of failure of your nodes, disks and
disk mirrors. Each power circuit should be protected by an uninterruptible power source. For more
details, see the Power Supply Planning (page 84) section.
Serviceguard is designed to work in conjunction with other high availability products, such as disk
arrays, which use various RAID levels for data protection; and HP-supported uninterruptible power
supplies (UPS), which eliminate failures related to power outage. HP recommends these products;
in conjunction with Serviceguard they provide the highest degree of availability.
HP recommends that you gather all the data that is needed for configuration before you start. See
Chapter 4 (page 79) for tips on gathering data.
CAUTION: If you configure any address other than a stationary IP address on a Serviceguard
network interface, it could collide with a relocatable package IP address assigned by
Serviceguard. See Stationary and Relocatable IP Addresses and Monitored Subnets
(page 62).
Similarly, Serviceguard does not support using networking tools to move or reconfigure
any IP addresses configured into the cluster.
Doing so leads to unpredictable results because the Serviceguard view of the configuration
no longer matches the actual state of the system.
NOTE: If you will be using a cross-subnet configuration, see also the Restrictions (page 28) that
apply specifically to such configurations.
In Linux configurations, the use of symmetrical LAN configurations is strongly recommended, with
the use of redundant hubs or switches to connect Ethernet segments. The software bonding
configuration should be identical on each node, with the active interfaces connected to the same
hub or switch.
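For illustration only, a minimal Red Hat-style bonding setup might look like the following. The interface names, addresses, and bonding options are placeholders, and the file locations and syntax vary by distribution and release; consult your distribution's networking documentation for the authoritative format.

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.1.10
NETMASK=255.255.255.0
BONDING_OPTS="mode=active-backup miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for each slave interface)
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes

Active-backup mode keeps one interface carrying traffic while the other stands by, which matches the redundant-LAN model described above; the same bond name and mode should be used on every node.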
2.2.3.2 Restrictions
The following restrictions apply:
All nodes in the cluster must belong to the same network domain (that is, the domain portion
of the fully-qualified domain name must be the same.)
The nodes must be fully connected at the IP level.
A minimum of two heartbeat paths must be configured for each cluster node.
There must be less than 200 milliseconds of latency in the heartbeat network.
Each heartbeat subnet on each node must be physically routed separately to the heartbeat
subnet on another node; that is, each heartbeat path must be physically separate:
The heartbeats must be statically routed; static route entries must be configured on each
node to route the heartbeats through different paths (see the example following this list).
Failure of a single router must not affect both heartbeats at the same time.
IPv6 heartbeat subnets are not supported in a cross-subnet configuration.
IPv6-only and mixed modes are not supported in a cross-subnet configuration. For more
information about these modes, see About Hostname Address Families: IPv4-Only, IPv6-Only,
and Mixed Mode (page 88).
Deploying applications in this environment requires careful consideration; see Implications
for Application Deployment (page 131).
cmrunnode will fail if the hostname LAN is down on the node in question. (Hostname
LAN refers to the public LAN on which the IP address that the node's hostname resolves to
is configured.)
If a monitored_subnet is configured for PARTIAL monitored_subnet_access in a
package's configuration file, it must be configured on at least one of the nodes on the
node_name list for that package. Conversely, if all of the subnets that are being monitored
for this package are configured for PARTIAL access, each node on the node_name list must
have at least one of these subnets configured.
As in other configurations, a package will not start on a node unless the subnets configured
on that node, and specified in the package configuration file as monitored subnets, are
up.
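The following sketch illustrates the static-routing requirement listed above; every address, interface name, and gateway is a placeholder, and the routes must also be made persistent through your distribution's network configuration files so that they survive a reboot.

# On each node, route each heartbeat subnet through its own gateway
ip route add 192.168.2.0/24 via 192.168.1.1 dev eth1
ip route add 192.168.12.0/24 via 192.168.11.1 dev eth2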
NOTE: See also the Rules and Restrictions (page 26) that apply to all cluster networking
configurations.
NOTE: As of release A.11.16.07, Serviceguard for Linux provides functionality similar to HP-UX
exclusive activation. This feature is based on LVM2 hosttags, and is available only for Linux
distributions that officially support LVM2.
All of the disks in the volume group owned by a package must be connected to the original node
and to all possible adoptive nodes for that package.
Shared disk storage in Serviceguard Linux clusters is provided by disk arrays, which have redundant
power and the capability for connections to multiple nodes. Disk arrays use RAID modes to provide
redundancy.
Configuring multiple paths from different networks to the iSCSI LUN is not supported.
The iSCSI storage configured over LAN is similar to other LANs that are part of the cluster.
NOTE: The file cmcluster.conf contains the mappings that resolve symbolic references to
$SGCONF, $SGROOT, $SGLBIN, etc., used in the pathnames in the subsections that follow. See
Understanding the Location of Serviceguard Files (page 135) for details.
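As an illustration only, the mappings in cmcluster.conf are of the following form; the actual paths depend on your Linux distribution and Serviceguard release, so check the file on your own systems rather than relying on these values.

SGROOT=/usr/local/cmcluster
SGCONF=/usr/local/cmcluster/conf
SGLBIN=/usr/local/cmcluster/bin
SGRUN=/usr/local/cmcluster/run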
NOTE: An iSCSI storage device does not support configuring a lock LUN.
NOTE: WBEM queries for the previous classes on SUSE Linux Enterprise Server might fail because
of access-denied issues if Serviceguard is not able to validate the credentials of the WBEM request.
Small Footprint CIM Broker (SFCB), the CIM server in SUSE Linux Enterprise Server 11 SP1
and SP2, has a configuration parameter, doBasicAuth, which enables basic authentication for
HTTP and HTTPS connections. This parameter must be set to true in the /etc/sfcb/sfcb.cfg
file. Otherwise, the user credentials of a WBEM request are not passed to the Serviceguard WBEM
Provider.
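For example, the entry in /etc/sfcb/sfcb.cfg would be set as shown below; the restart command is illustrative and varies by release, so use whatever mechanism your system provides for restarting SFCB.

# /etc/sfcb/sfcb.cfg
doBasicAuth: true

# Restart SFCB so that the change takes effect
rcsfcb restart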
IMPORTANT: When multiple heartbeats are configured, heartbeats are sent in parallel;
Serviceguard must receive at least one heartbeat to establish the health of a node. HP recommends
that you configure all subnets that interconnect cluster nodes as heartbeat networks; this increases
protection against multiple faults at no additional cost.
Heartbeat IP addresses must be on the same subnet on each node, but it is possible to configure
a cluster that spans subnets; see Cross-Subnet Configurations (page 27). See HEARTBEAT_IP,
under Cluster Configuration Parameters (page 90), for more information about heartbeat
requirements. For timeout requirements and recommendations, see the MEMBER_TIMEOUT parameter
description in the same section. For troubleshooting information, see Cluster Re-formations Caused
by MEMBER_TIMEOUT Being Set too Low (page 264). See also Cluster Daemon: cmcld (page 34).
NOTE:
The lock LUN is dedicated for use as the cluster lock, and, in addition, HP recommends that
this LUN comprise the entire disk; that is, the partition should take up the entire disk.
An iSCSI storage device does not support configuring a lock LUN.
The complete path name of the lock LUN is identified in the cluster configuration file.
The operation of the lock LUN is shown in Figure 7.
Serviceguard periodically checks the health of the lock LUN and writes messages to the syslog
file if the disk fails the health check. This file should be monitored for early detection of lock disk
problems.
A quorum server can provide quorum services for multiple clusters. Figure 9 illustrates quorum
server use across four clusters.
IMPORTANT: For more information about the quorum server, see the latest version of the HP
Serviceguard Quorum Server release notes at http://www.hp.com/go/hpux-serviceguard-docs
(Select HP Serviceguard Quorum Server Software).
3.2.11 What Happens when You Change the Quorum Configuration Online
You can change the quorum configuration while the cluster is up and running. This includes changes
to the quorum method (for example, from a lock disk to a quorum server), the quorum device (for
example, from one quorum server to another), and the parameters that govern them (for example,
the quorum server polling interval). For more information about the quorum server and lock
parameters, see Cluster Configuration Parameters (page 90).
When you make quorum configuration changes, Serviceguard goes through a two-step process:
1. All nodes switch to a strict majority quorum (turning off any existing quorum devices).
2. All nodes switch to the newly configured quorum method, device and parameters.
IMPORTANT: During Step 1, while the nodes are using a strict majority quorum, node failures
can cause the cluster to go down unexpectedly if the cluster has been using a quorum device before
the configuration change. For example, suppose you change the quorum server polling interval
while a two-node cluster is running. If a node fails during Step 1, the cluster will lose quorum and
go down, because a strict majority of prior cluster members (two out of two in this case) is required.
The duration of Step 1 is typically around a second, so the chance of a node failure occurring
during that time is very small.
In order to keep the time interval as short as possible, make sure you are changing only the quorum
configuration, and nothing else, when you apply the change.
If this slight risk of a node failure leading to cluster failure is unacceptable, halt the cluster before
you make the quorum configuration change.
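As a sketch (the cluster and file names are placeholders), an online quorum change typically exports the current configuration, edits only the quorum parameters (for example, QS_HOST or QS_POLLING_INTERVAL), and then verifies and re-applies the file:

cmgetconf -c cluster1 cluster1.conf
# Edit only the quorum parameters in cluster1.conf, then:
cmcheckconf -C cluster1.conf
cmapplyconf -C cluster1.conf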
3.3.1.2.2 Deciding When and Where to Run and Halt Failover Packages
The package configuration file assigns a name to the package and includes a list of the nodes on
which the package can run.
NOTE: It is possible to configure a cluster that spans subnets joined by a router, with some nodes
using one subnet and some another. This is known as a cross-subnet configuration. In this context,
you can configure packages to fail over from a node on one subnet to a node on another, and
you will need to configure a relocatable IP address for each subnet the package is configured to
start on; see About Cross-Subnet Failover (page 130), and in particular the subsection Implications
for Application Deployment (page 131).
When a package fails over, TCP connections are lost. TCP applications must reconnect to regain
connectivity; this is not handled automatically. Note that if the package is dependent on multiple
subnets, normally all of them must be available on the target node before the package will be
started. (In a cross-subnet configuration, all the monitored subnets that are specified for this package,
and configured on the target node, must be up.)
If the package has a dependency on a resource or another package, the dependency must be met
on the target node before the package can start.
The switching of relocatable IP addresses is shown in the figures that follow. Users connect to each
node with the IP address of the package they wish to use. Each node has a stationary IP address
associated with it, and each package has an IP address associated with it.
In Figure 12, node1 has failed and pkg1 has been transferred to node2. pkg1's IP address was
transferred to node2 along with the package. pkg1 continues to be available and is now running
on node2. Also note that node2 now has access both to pkg1's disk and pkg2's disk.
NOTE: For design and configuration information about clusters that span subnets, see the
documents listed under Cross-Subnet Configurations (page 27).
When the cluster starts, each package starts as shown in Figure 13.
If a failure occurs, the failing package would fail over to the node containing the fewest running
packages:
NOTE: Under the min_package_node policy, when node2 is repaired and brought back into
the cluster, it will then be running the fewest packages, and thus will become the new standby
node.
If these packages had been set up using the configured_node failover policy, they would start
initially as in Figure 13, but the failure of node2 would cause the package to start on node3, as
shown in Figure 15.
If you use configured_node as the failover policy, the package will start up on the highest-priority
eligible node in its node list. When a failover occurs, the package will move to the next eligible
node in the list, in the configured order of priority.
node1 panics, and after the cluster reforms, pkgA starts running on node4:
After rebooting, node1 rejoins the cluster. At that point, pkgA will be automatically stopped on
node4 and restarted on node1.
NOTE: You can get or set the status/value of a simple/extended generic resource using the
cmgetresource(1m) and cmsetresource(1m) commands respectively. See Getting and
Setting the Status/Value of a Simple/Extended Generic Resource (page 111) and the manpages
for more information.
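For illustration, the commands are used along the following lines; the resource names are placeholders, and the exact options should be confirmed in the cmgetresource(1m) and cmsetresource(1m) manpages.

# Query the current status or value of a generic resource
cmgetresource -r sfm_disk

# Set the status of a simple generic resource
cmsetresource -r sfm_disk -s up

# Set the value of an extended generic resource
cmsetresource -r cpu_load -v 75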
A single package can have a combination of simple and extended resources, but a given generic
resource cannot be configured as a simple resource in one package and as an extended resource
in another package. It must be configured either as a simple generic resource or as an extended
generic resource in all packages.
NOTE: If you configure the package while the cluster is running, the package does not start up
immediately after the cmapplyconf command completes. To start the package without halting
and restarting the cluster, issue the cmrunpkg or cmmodpkg command.
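For example, assuming the package is named pkg1 and node1 is an eligible node, you might start
the package and then enable switching for it as follows (assuming the usual -n and -e options; see
the cmrunpkg (1m) and cmmodpkg (1m) manpages):
cmrunpkg -n node1 pkg1
cmmodpkg -e pkg1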
How does a failover package start up, and what is its behavior while it is running? Some of the
many phases of package life are shown in Figure 19.
NOTE: This diagram applies specifically to legacy packages. Differences for modular scripts are
called out below.
At any step along the way, an error will result in the script exiting abnormally (with an exit code
of 1). For example, if a package service is unable to be started, the control script will exit with an
error.
NOTE: This diagram is specific to legacy packages. Modular packages also run external scripts
and pre-scripts as explained above.
If the run script execution is not complete before the time specified in the run_script_timeout
parameter (page 177), the package manager will kill the script. During run script execution, messages
are written to a log file. For legacy packages, this is in the same directory as the run script and
has the same name as the run script plus the extension .log. For modular packages, the pathname
is determined by the script_log_file parameter in the package configuration file (page 178).
NOTE: After the package run script has finished its work, it exits, which means that the script is
no longer executing once the package is running normally. After the script exits, the PIDs of the
services started by the script are monitored by the package manager directly. If the service dies,
the package manager will then run the package halt script or, if service_fail_fast_enabled
(page 184) is set to yes, it will halt the node on which the package is running. If a number of restarts
is specified for a service in the package control script, the service may be restarted if the restart
count allows it, without re-running the package run script.
NOTE: If you set <n> restarts and also set service_fail_fast_enabled to yes, the failfast
will take place after <n> restart attempts have failed. It does not make sense to set
service_restart to -R for a service and also set service_fail_fast_enabled to yes.
NOTE: If a package is dependent on a subnet, and the subnet on the primary node fails, the
package will start to shut down. If the subnet recovers immediately (before the package is restarted
on an adoptive node), the package manager restarts the package on the same node; no package
switch occurs.
NOTE: If you use the cmhaltpkg command with the -n <nodename> option, the package is
halted only if it is running on that node.
The cmmodpkg command cannot be used to halt a package, but it can disable switching either
on particular nodes or on all nodes. A package can continue running when its switching has been
disabled, but it will not be able to start on other nodes if it stops running on its current node.
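For example (command options as described in the cmhaltpkg (1m) and cmmodpkg (1m) manpages;
package and node names are illustrative):
cmhaltpkg -n node2 pkg1      # halt pkg1 only if it is running on node2
cmmodpkg -d pkg1             # disable switching for pkg1 on all nodes
cmmodpkg -d -n node3 pkg1    # prevent pkg1 from starting on node3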
At any step along the way, an error will result in the script exiting abnormally (with an exit code
of 1). If the halt script execution is not complete before the time specified in the
halt_script_timeout parameter (page 177), the package manager will kill the script. During halt script
execution, messages are written to a log file. For legacy packages, this is in the same directory
as the run script and has the same name as the run script plus the extension .log. For modular
packages, the pathname is determined by the script_log_file parameter in the package
configuration file (page 178). Normal starts are recorded in the log, together with error messages
or warnings related to halting the package.
NOTE: This diagram applies specifically to legacy packages. Differences for modular scripts are
called out above.
Error or Exit Code | Node Failfast Enabled | Service Failfast Enabled | Linux Status on Primary after Error | Halt script runs after Error or Exit | Package Allowed to Run on Primary Node after Error | Package Allowed to Run on Alternate Node
Service Failure | Either Setting | Yes | system reset | No | N/A (system reset) | Yes
Run Script Exit 1 | Either Setting | Either Setting | Running | No | Not changed | No
Run Script Exit 2 | Yes | Either Setting | system reset | No | N/A (system reset) | Yes
Run Script Timeout | Yes | Either Setting | system reset | No | N/A (system reset) | Yes
Halt Script Timeout | Yes | Either Setting | system reset | N/A | N/A (system reset) | Yes, unless the timeout happened after the cmhaltpkg command was executed
Service Failure | Either Setting | Yes | system reset | No | N/A (system reset) | Yes
Loss of Network | Yes | Either Setting | system reset | No | N/A (system reset) | Yes
Package depended on failed | Either Setting | Either Setting | Running | Yes | Yes, when dependency is again met | Yes, if dependency met
NOTE: Serviceguard monitors the health of the network interfaces (NICs) and can monitor the
IP level (layer 3) network.
IMPORTANT: Any subnet that is used by a package for relocatable addresses should be
configured into the cluster via NETWORK_INTERFACE and either STATIONARY_IP or
HEARTBEAT_IP in the cluster configuration file. For more information about those parameters,
see Cluster Configuration Parameters (page 90). For more information about configuring
relocatable addresses, see the descriptions of the package ip_ parameters (page 182).
NOTE: It is possible to configure a cluster that spans subnets joined by a router, with some nodes
using one subnet and some another. This is called a cross-subnet configuration. In this context, you
can configure packages to fail over from a node on one subnet to a node on another, and you
will need to configure a relocatable address for each subnet the package is configured to start on;
see About Cross-Subnet Failover (page 130), and in particular the subsection Implications for
Application Deployment (page 131).
CAUTION: HP strongly recommends that you add relocatable addresses to packages only by
editing ip_address (page 183) in the package configuration file (or IP [] entries in the control
script of a legacy package) and running cmapplyconf (1m).
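For example, to add a relocatable address to a modular package, you might add lines such as the
following to the package configuration file (the subnet and address are illustrative; see the ip_
parameter descriptions referenced above):
ip_subnet 192.168.1.0
ip_address 192.168.1.10
Then re-apply the package:
cmapplyconf -P $SGCONF/pkg1/pkg1.conf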
The LANs in the non-bonded configuration have four LAN cards, each associated with a separate
non-aggregated IP address and MAC address, and each with its own LAN name (eth1, eth2,
eth3, or eth4). When these ports are aggregated, all four ports are associated with a single IP
address and MAC address. In this example, the aggregated ports are collectively known as bond0,
and this is the name by which the bond is known during cluster configuration.
Figure 3-18 shows a bonded configuration using redundant hubs with a crossover cable.
Node1 Node2
bond0: bond0:
active active
Hub
Crossover cable
Hub
In the bonding model, individual Ethernet interfaces are slaves, and the bond is the master. In the
basic high availability configuration (mode 1), one slave in a bond assumes an active role, while
the others remain inactive until a failure is detected. (In Figure 3-18, both eth0 slave interfaces are
active.) It is important that during configuration, the active slave interfaces on all nodes are
connected to the same hub. If this were not the case, then normal operation of the LAN would
require the use of the crossover between the hubs and the crossover would become a single point
of failure.
After the failure of a card, messages are still carried on the bonded LAN and are received on the
other node, but now eth1 has become active in bond0 on node1. This situation is shown in
Figure 24.
Various combinations of Ethernet card types (single or dual-ported) and bond groups are possible,
but it is vitally important to remember that at least two physical cards (or physically separate
Inbound failures: Errors that prevent packets from being received but do not affect the link-level
health of an interface.
HP recommends that you configure target polling if the subnet is not private to the cluster.
The IP Monitor section of the cmquerycl output looks similar to this:
Route Connectivity (no probing was performed):
IPv4:
1 16.89.143.192
IPv4:
IPv6:
The IP Monitor section of the cluster configuration file will look similar to the following for a subnet
on which IP monitoring is configured with target polling.
IMPORTANT: By default, cmquerycl does not verify that the gateways it detects will work
correctly for monitoring. But if you use the -w full option, cmquerycl will validate them as
polling targets.
SUBNET 192.168.1.0
IP_MONITOR ON
POLLING_TARGET 192.168.1.254
By default, the IP_MONITOR parameter is set to OFF. If a gateway is detected for the subnet in
question, the POLLING_TARGET entry is populated (but commented out) and the IP_MONITOR
parameter is set to OFF, as in the following example:
SUBNET 192.168.1.0
IP_MONITOR OFF
#POLLING_TARGET 192.168.1.254
To configure a subnet for IP monitoring with peer polling, edit the IP Monitor section of the cluster
configuration file to look similar to this:
SUBNET 192.168.2.0
IP_MONITOR ON
The IP Monitor section of the cluster configuration file will look similar to the following in the case
of a subnet on which IP monitoring is disabled:
SUBNET 192.168.3.0
IP_MONITOR OFF
NOTE: LUN definition is normally done using utility programs provided by the disk array
manufacturer. Since arrays vary considerably, you should refer to the documentation that
accompanies your storage unit.
For information about configuring multipathing, see Multipath for Storage (page 82).
NOTE: Persistent Reservations coexist with, and are independent of, activation protection of
volume groups. You should continue to configure activation protection as instructed under Enabling
Volume Group Activation Protection. Subject to the Rules and Limitations spelled out below, Persistent
Reservations will be applied to the cluster's LUNs, whether or not the LUNs are configured into
volume groups.
Advantages of PR are:
Consistent behavior.
Whereas different volume managers may implement exclusive activation differently (or not at
all), PR is implemented at the device level and does not depend on volume-manager support
for exclusive activation.
Packages can control access to LUN devices independently of a volume manager.
Serviceguard's support for the ASM manager allows packages whose applications use these
protocols to access storage devices directly, without using a volume manager.
The Persistent Reservation (PR) module pr_cntl internally uses the dmsetup info command
to probe /dev/dm-n devices. In SUSE Linux Enterprise Server 11 SP1, the dmsetup
info command does not support probing /dev/dm-n devices. Therefore, features
that use /dev/dm-n, such as udev aliases, cannot be supported with PR. If you are using
udev aliases in a serviceguard-xdc environment, the PR module cannot be supported with
that configuration.
NOTE: This restriction is applicable only on SUSE Linux Enterprise Server 11 SP1.
If you are not using the udev alias names, multipath physical volumes names must be in
the /dev/mapper/XXXX or /dev/mpath/XXXX format.
The udev alias names must not be configured in the /dev/mapper/ or/dev/mpath/
directory.
Multipath device alias names must not contain pN or _partN strings, where N is a
number; for example, /dev/mapper/evadskp1 or /dev/mapper/evadsk_part1.
If you accidentally run the pr_cleanup command on LUNs belonging to a package that is
already running, PR protection is disabled. To re-enable PR protection, you must restart the
package.
All instances of a modular multi-node package must be able to use PR; otherwise it will
be turned off for all instances.
The package must have access to real devices, not only virtualized ones.
CAUTION: Serviceguard makes and revokes registrations and reservations during normal package
startup and shutdown, or package failover. Serviceguard also provides a script to clear reservations
in the event of a catastrophic cluster failure. You need to make sure that this script is run in that
case; the LUN devices could become unusable otherwise. See Revoking Persistent Reservations
after a Catastrophic Failure (page 257) for more information.
NOTE: If a simple resource is down on a particular node, it is down on that node for all the
packages using it. In the case of an extended resource, the resource may be up on a node
for one package and down for another, because its state depends on the
generic_resource_up_criteria.
Additionally, in a running package configured with a generic resource:
Any failure of a generic resource of evaluation type "before_package_start" configured in a
package will not disable the node switching for the package.
Any failure of a generic resource of evaluation type "during_package_start" configured in a
package will disable the node switching for the package.
NOTE: Planning and installation overlap considerably, so you may not be able to complete the
worksheets before you proceed to the actual configuration. In that case, fill in the missing elements
to document the system as you proceed with the configuration.
Subsequent chapters describe configuration and maintenance tasks in detail.
NOTE: This configuration is not recommended because the host is a single point of failure:
its failure brings down all the nodes in the cluster.
Cluster with VMware or KVM guests from multiple hosts as cluster nodes
Cluster with VMware or KVM guests and physical machines as cluster nodes
NOTE:
Guests running on different hypervisors (VMware and KVM) must not be configured as
cluster nodes in the same cluster.
A cluster with VMware guests from a single host as cluster nodes must be avoided in the
serviceguard-xdc environment. For more information about serviceguard-xdc support with
VMware virtual machines, see HP Serviceguard Extended Distance Cluster for Linux A.11.20.20
Deployment Guide.
KVM guests cannot be used as cluster nodes in the serviceguard-xdc environment.
For more information about how to integrate VMware and KVM guests as Serviceguard cluster
nodes, see the following white paper at http://www.hp.com/go/linux-serviceguard-docs:
Using HP Serviceguard for Linux with VMware Virtual Machines
Using HP Serviceguard for Linux with Red Hat KVM Guests
4.3.3.1 FibreChannel
FibreChannel cards can be used to connect up to 16 nodes to a disk array containing storage.
After installation of the cards and the appropriate driver, the LUNs configured on the storage unit
are presented to the operating system as device files, which can be used to build LVM volume
groups.
NOTE: Multipath capabilities are supported by FibreChannel HBA device drivers and the Linux
Device Mapper. Check with the storage device documentation for details.
See also Multipath for Storage .
4.3.3.2 iSCSI
You can use an IP-based storage link to connect up to 16 nodes to a disk array containing
storage. The LUNs configured on the storage unit are presented to the operating system as device
files, which can be used to build LVM volume groups.
NOTE: Configuring multiple paths from different networks to the iSCSI LUN is not supported.
You can use the worksheet to record the names of the device files that correspond to each LUN
for the Fibre Channel-attached and iSCSI-attached storage units.
NOTE: With the rapid evolution of Linux, the multipath mechanisms may change, or new ones
may be added. Serviceguard for Linux supports DeviceMapper multipath (DM-MPIO) with some
restrictions; see the Serviceguard for Linux Certification Matrix at the address provided in the
Preface to this manual for up-to-date information.
NOTE: md also supports software RAID; but this configuration is not currently supported with
Serviceguard for Linux.
Disk Device File: Enter the disk device file name for each SCSI disk or LUN.
This information is needed when you create the mirrored disk configuration using LVM. In addition,
it is useful to gather as much information as possible about your disk configuration.
You can obtain information about available disks by using the following commands; your system
may provide other utilities as well.
ls /dev/sd* (Smart Array cluster storage)
ls /dev/hd* (non-SCSI/FibreChannel disks)
ls /dev/sd* (SCSI and FibreChannel disks)
du
df
mount
vgdisplay -v
lvdisplay -v
See the manpages for these commands for information about specific usage. The commands should
be issued from all nodes after installing the hardware and rebooting the system. The information
will be useful when doing LVM and cluster configuration.
NOTE:
You cannot use more than one type of lock in the same cluster.
An iSCSI storage device does not support configuring a lock LUN.
IMPORTANT: If you plan to use a Quorum Server, make sure you read the HP Serviceguard
Quorum Server Version A.04.00 Release Notes before you proceed. You can find them at http://
www.hp.com/go/hpux-serviceguard-docs (Select HP Serviceguard Quorum Server Software). You
should also consult the Quorum Server white papers at the same location.
NOTE: HP recommends that you use volume group names other than the default volume group
names (vg01, vg02, etc.). Choosing volume group names that represent the high availability
applications they are associated with (for example, /dev/vgdatabase) will simplify cluster
administration.
NOTE: After you run the cmpreparecl script, you can start the cluster configuration.
Advantages
Simple ways to configure the system before you create a cluster.
Configuration for all the nodes can be done from one of the nodes in the cluster.
Limitations
All the nodes that are part of the cluster must be known beforehand.
IMPORTANT: The nodes that are given as inputs must not already have a cluster configured on them.
Before you start, you should have done the planning and preparation as described in previous
sections. You must also do the following:
Install Serviceguard on each node that is to be configured into the cluster; see Installing and
Updating Serviceguard (page 135).
You must have superuser capability on each node.
Make sure all the nodes have access to at least one fully configured network.
Make sure all the subnets used by the prospective nodes are accessible to all the nodes.
NOTE: The modified files are backed up in the same directory as the original files with an
".original" extension, and the output is logged to the /tmp/cmpreparecl.log file. This log
file is a cumulative log of the configuration done on the node. Each time you run
cmpreparecl, logs are appended with an appropriate time stamp.
For more information, and other options, see manpages for cmpreparecl (1m).
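For example, a cluster-preparation run for two prospective nodes might look like this (the node-list
syntax shown is an assumption; confirm the exact options in the cmpreparecl (1m) manpage):
cmpreparecl -n node1 -n node2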
NOTE: For heartbeat configuration requirements, see the discussion of the HEARTBEAT_IP
parameter later in this chapter. For more information about managing the speed of cluster
re-formation, see the discussion of the MEMBER_TIMEOUT parameter, and further discussion under
What Happens when a Node Times Out (page 75), and, for troubleshooting, Cluster
Re-formations Caused by MEMBER_TIMEOUT Being Set too Low (page 264).
4.7.3 About Hostname Address Families: IPv4-Only, IPv6-Only, and Mixed Mode
Serviceguard supports three possibilities for resolving the nodes' hostnames (and Quorum Server
hostnames, if any) to network address families:
IPv4-only
IPv6-only
Mixed
IPv4-only means that Serviceguard will try to resolve the hostnames to IPv4 addresses only.
NOTE: This applies only to hostname resolution. You can have IPv6 heartbeat and data LANs
no matter what the HOSTNAME_ADDRESS_FAMILY parameter is set to. (IPv4 heartbeat and data
LANs are allowed in IPv4 and mixed mode.)
NOTE: How the clients of IPv6-only cluster applications handle hostname resolution is a matter
for the discretion of the system or network administrator; there are no HP requirements or
recommendations specific to this case.
In IPv6-only mode, all Serviceguard daemons will normally use IPv6 addresses for communication
among the nodes, although local (intra-node) communication may occur on the IPv4 loopback
address.
For more information about IPv6, see Appendix D (page 291).
NOTE: This also applies if HOSTNAME_ADDRESS_FAMILY is set to ANY; Red Hat 5 supports
only IPv4-only clusters.
All addresses used by the cluster must be in each node's /etc/hosts file. In addition, the
file must contain the following entry:
::1 localhost ipv6-localhost ipv6-loopback
For more information and recommendations about hostname resolution, see Configuring
Name Resolution (page 137).
All addresses must be IPv6, apart from the node's IPv4 loopback address, which cannot be
removed from /etc/hosts.
The node's public LAN address (by which it is known to the outside world) must be the last
address listed in /etc/hosts.
Otherwise there is a possibility of the address being used even when it is not configured into
the cluster.
You must use $SGCONF/cmclnodelist, not ~/.rhosts or /etc/hosts.equiv, to
provide root access to an unconfigured node.
If you use a Quorum Server, you must make sure that the Quorum Server hostname (and the
alternate Quorum Server address specified by QS_ADDR, if any) resolve to IPv6 addresses,
and you must use Quorum Server version A.04.00 or later. See the latest Quorum Server
release notes for more information; you can find them at http://www.hp.com/go/
linux-serviceguard-docs.
NOTE: The Quorum Server itself can be an IPv6-only system; in that case it can serve
IPv6-only and mixed-mode clusters, but not IPv4-only clusters.
If you use a Quorum Server, and the Quorum Server is on a different subnet from the cluster, you
must use an IPv6-capable router.
Hostname aliases are not supported for IPv6 addresses, because of operating system limitations.
NOTE: This also applies if HOSTNAME_ADDRESS_FAMILY is set to IPv6; Red Hat 5 supports
only IPv4-only clusters.
The hostname resolution file on each node (for example, /etc/hosts) must contain entries
for all the IPv4 and IPv6 addresses used throughout the cluster, including all STATIONARY_IP
and HEARTBEAT_IP addresses as well as any private addresses. There must be at least one
IPv4 address in this file (in the case of /etc/hosts, the IPv4 loopback address cannot be
removed). In addition, the file must contain the following entry:
::1 localhost ipv6-localhost ipv6-loopback
For more information and recommendations about hostname resolution, see Configuring
Name Resolution (page 137).
You must use $SGCONF/cmclnodelist, not ~/.rhosts or /etc/hosts.equiv, to
provide root access to an unconfigured node.
See Allowing Root Access to an Unconfigured Node (page 136) for more information.
Hostname aliases are not supported for IPv6 addresses, because of operating system limitations.
NOTE: See Reconfiguring a Cluster (page 225) for a summary of changes you can make while
the cluster is running.
The following parameters must be configured:
CLUSTER_NAME The name of the cluster as it will appear in the output of
cmviewcl and other commands, and as it appears in the
cluster configuration file.
The cluster name must not contain any of the following
characters: space, slash (/), backslash (\), and asterisk (*).
All other characters are legal. The cluster name can contain
up to 39 characters.
NETWORK_INTERFACE The name of each LAN that will be used for heartbeats or
for user data on the node identified by the preceding
NODE_NAME. An example is eth0. See also
HEARTBEAT_IP, STATIONARY_IP, and About Hostname
Address Families: IPv4-Only, IPv6-Only, and Mixed Mode
(page 88).
STATIONARY_IP This node's IP address on each subnet that does not carry
the cluster heartbeat, but is monitored for packages.
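For example, a node entry in the cluster configuration file might combine these parameters as
follows (interface names and addresses are illustrative):
NODE_NAME node1
NETWORK_INTERFACE eth0
HEARTBEAT_IP 192.168.1.1
NETWORK_INTERFACE eth1
STATIONARY_IP 15.145.162.131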
1 ~ NPI x 8 - NPI x 9
2 ~ NPI x 4 - NPI x 5
3 ~ NPI x 3 - NPI x 4
4 to 8 ~ NPI x 2 - NPI x 3
NOTE: CONFIGURED_IO_TIMEOUT_EXTENSION
is supported only with iFCP switches that allow you to
get their R_A_TOV value.
NOTE: As of Serviceguard A.11.18, there is a new and simpler way to configure packages.
This method allows you to build packages from smaller modules, and eliminates the separate
package control script and the need to distribute it manually; see Chapter 6: Configuring Packages
and Their Services (page 169), for complete instructions.
This manual refers to packages created by the newer method as modular packages, and to packages
created by the older method as legacy packages.
The discussion that follows assumes you will be using the modular method. For information and
instructions on creating and maintaining legacy packages, see Configuring a Legacy Package
(page 233).
The document HP Serviceguard Developers Toolbox User Guide, December 2012 provides a
guide for integrating an application with Serviceguard using a suite of customizable scripts known
as "Serviceguard Developers Toolbox" intended for use with modular packages only. The
Serviceguard Developers Toolbox is available free of charge and can be downloaded from
Software Depot at http://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?
productNumber=SGDTOOLBOX.
NOTE: To prevent an operator from accidentally activating volume groups on other nodes in the
cluster, versions A.11.16.07 and later of Serviceguard for Linux include a type of VG activation
protection. This is based on the hosttags feature of LVM2.
This feature is not mandatory, but HP strongly recommends you implement it as you upgrade
existing clusters and create new ones. See Enabling Volume Group Activation Protection (page 148)
for instructions. However, if you are using the PR feature, this step is not required.
NOTE: Generic resources influence the package based on their status. The actual monitoring
of the resource should be done in a script and this must be configured as a service. The script
sets the status of the resource based on the availability of the resource. See Monitoring Script
for Generic Resources (page 303).
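A minimal sketch of such a monitoring script follows; the resource name, the device being checked,
the polling interval, and the cmsetresource options are all assumptions, and a production script
should be based on the template referenced above.
#!/bin/sh
# Sketch of a monitoring script run as a package service.
# It sets the status of a simple generic resource (here sfm_disk)
# based on whether an assumed block device is present.
RESOURCE=sfm_disk
DEVICE=/dev/mapper/evadsk
INTERVAL=60

while true
do
    if [ -b "$DEVICE" ]
    then
        cmsetresource -r $RESOURCE -s up
    else
        cmsetresource -r $RESOURCE -s down
    fi
    sleep $INTERVAL
done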
Create a list by package of volume groups, logical volumes, and file systems. Indicate which nodes
need to have access to common file systems at different times.
HP recommends that you use customized logical volume names that are different from the default
logical volume names (lvol1, lvol2, etc.). Choosing logical volume names that represent the
high availability applications that they are associated with (for example, lvoldatabase) will
simplify cluster administration.
To further document your package-related volume groups, logical volumes, and file systems on
each node, you can add commented lines to the /etc/fstab file. The following is an example
for a database application:
# /dev/vg01/lvoldb1 /applic1 ext3 defaults 0 1 # These six entries are
# /dev/vg01/lvoldb2 /applic2 ext3 defaults 0 1 # for information purposes
# /dev/vg01/lvoldb3 raw_tables ignore ignore 0 0 # only. They record the
# /dev/vg01/lvoldb4 /general ext3 defaults 0 2 # logical volumes that
CAUTION: Do not use /etc/fstab to mount file systems that are used by Serviceguard packages.
For information about creating, exporting, and importing volume groups, see Creating the Logical
Volume Infrastructure (page 145).
NOTE: Ensure that the NFS module is loaded at boot time for configurations that use NFS
file systems as part of the package configuration.
The following rules and restrictions apply.
NFS mounts are supported for modular failover packages.
So that Serviceguard can ensure that all I/O from a node on which a package has failed is
flushed before the package restarts on an adoptive node, all the network switches and routers
between the NFS server and client must support a worst-case timeout, after which packets and
frames are dropped. This timeout is known as the Maximum Bridge Transit Delay (MBTD).
IMPORTANT: Find out the MBTD value for each affected router and switch from the vendors'
documentation; determine all of the possible paths; find the worst case sum of the MBTD values
on these paths; and use the resulting value to set the Serviceguard
CONFIGURED_IO_TIMEOUT_EXTENSION parameter. For instructions, see the discussion of
this parameter under Cluster Configuration Parameters (page 90).
Switches and routers that do not support an MBTD value must not be used in a Serviceguard NFS
configuration; using them might lead to delayed packets that in turn could lead to data corruption.
Networking among the Serviceguard nodes must be configured in such a way that a single
failure in the network does not cause a package failure.
Only NFS client-side locks (local locks) are supported.
Server-side locks are not supported.
Because exclusive activation is not available for NFS-imported file systems, you must take the
following precautions to ensure that data is not accidentally overwritten.
The server must be configured so that only the cluster nodes have access to the file system.
The NFS file system used by a package must not be imported by any other system,
including other nodes in the cluster.
The nodes should not mount the file system on boot; it should be mounted only as part of
the startup for the package that uses it.
The NFS file system should be used by only one package.
While the package is running, the file system should be used exclusively by the package.
If the package fails, do not attempt to restart it manually until you have verified that the
file system has been unmounted properly.
NOTE: If network connectivity to the NFS Server is lost, the applications using the imported
file system may hang and it may not be possible to kill them. If the package attempts to halt
at this point, it may not halt successfully.
Package fails over to the node with the fewest active packages:
    failover_policy set to min_package_node.
Package fails over to the node that is next on the list of nodes (default):
    failover_policy set to configured_node. (Default)
All packages switch following a system reboot on the node when a specific service fails. Halt scripts are not run:
    service_fail_fast_enabled set to yes for a specific service.
    auto_run set to yes for all packages.
All packages switch following a system reboot on the node when any service fails:
    service_fail_fast_enabled set to yes for all services.
    auto_run set to yes for all packages.
All packages switch following a system reset (an immediate halt without a graceful shutdown) on the node when a specific service fails. Halt scripts are not run:
    service_fail_fast_enabled set to yes for a specific service.
    auto_run set to yes for all packages.
All packages switch following a system reset on the node when any service fails. An attempt is first made to reboot the system prior to the system reset:
    service_fail_fast_enabled set to yes for all services.
    auto_run set to yes for all packages.
Failover packages can also be configured so that IP addresses switch from a failed NIC to a
standby NIC on the same node and the same physical subnet.
NOTE: To generate a configuration file adding the generic resource module to an existing
package (enter the command all on one line):
cmmakepkg -i $SGCONF/pkg1/pkg1.conf -m sg/generic_resource
2. Edit the package configuration file and specify the generic resource parameters (as shown in
the snippet):
service_name cpu_monitor
service_cmd $SGCONF/generic_resource_monitors/cpu_monitor.sh
service_halt_timeout 10
generic_resource_name sfm_cpu
generic_resource_evaluation_type during_package_start
3. After editing the package configuration file, verify the content of the package configuration
file:
cmcheckconf -v -P $SGCONF/pkg1/pkg1.conf
cmcheckconf: Verification completed with no errors found.
Use the cmapplyconf command to apply the configuration
4. When verification completes without errors, apply the package configuration file. This adds
the package configuration information (along with generic resources) to the binary cluster
configuration file in the $SGCONF directory and distributes it to all the cluster nodes.
cmapplyconf -P $SGCONF/pkg1/pkg1.conf
Modify the package configuration ([y]/n)? y
Completed the cluster update
5. Verify that the generic resources parameters are configured.
cmviewcl -v -p pkg1
UNOWNED_PACKAGES
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS NODE_NAME NAME
Generic Resource unknown node1 sfm_disk
Generic Resource unknown node2 sfm_disk
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled node1
Alternate up enabled node2
Other_Attributes:
NOTE: The default status of a generic resource is UNKNOWN and the default current_value
is "0" unless the status/value of a simple/extended generic resource is set using the
cmsetresource command.
6. Start the package. As part of the package start, the monitoring script will start the monitoring
of the generic resource and set the status accordingly.
cmrunpkg pkg1
NOTE: pkg1 can depend on more than one other package, and pkg2 can depend on another
package or packages; we are assuming only two packages in order to make the rules as clear as
possible.
NOTE: If pkg1 lists all the nodes, rather than using the asterisk (*), pkg2 must also
list them.
Preferably the nodes should be listed in the same order if the dependency is between
packages whose failover_policy is configured_node; cmcheckconf and
cmapplyconf will warn you if they are not.
A package cannot depend on itself, directly or indirectly.
That is, not only must pkg1 not specify itself in the dependency_condition (page 180), but
pkg1 must not specify a dependency on pkg2 if pkg2 depends on pkg1, or if pkg2 depends
on pkg3 which depends on pkg1, etc.
If pkg1 is a failover package and pkg2 is a multi-node or system multi-node package, and
pkg2 fails, pkg1 will halt and fail over to the next node on its node_name list on which pkg2
is running (and any other dependencies, such as resource dependencies or a dependency on
a third package, are met).
In the case of failover packages with a configured_node failover_policy, a set of
rules governs under what circumstances pkg1 can force pkg2 to start on a given node. This
is called dragging and is determined by each package's priority (page 179). See Dragging
Rules for Simple Dependencies (page 115).
If pkg2 fails, Serviceguard will halt pkg1 and any other packages that depend directly or
indirectly on pkg2.
By default, Serviceguard halts packages in dependency order, the dependent package(s) first,
then the package depended on. In our example, pkg1 would be halted first, then pkg2. If
there were a third package, pkg3, that depended on pkg1, pkg3 would be halted first, then
pkg1, then pkg2.
If the halt script for any dependent package hangs, by default the package depended on will
wait forever (pkg2 will wait forever for pkg1, and if there is a pkg3 that depends on pkg1,
pkg1 will wait forever for pkg3). You can modify this behavior by means of the
successor_halt_timeout parameter (page 178). (The successor of a package depends
on that package; in our example, pkg1 is a successor of pkg2; conversely pkg2 can be
referred to as a predecessor of pkg1.)
NOTE: This applies only when the packages are automatically started (package switching
enabled); cmrunpkg will never force a package to halt.
Keep in mind that you do not have to set priority, even when one or more packages depend
on another. The default value, no_priority, may often result in the behavior you want. For
example, if pkg1 depends on pkg2, and priority is set to no_priority for both packages,
and other parameters such as node_name and auto_run are set as recommended in this section,
then pkg1 will normally follow pkg2 to wherever both can run, and this is the common-sense (and
may be the most desirable) outcome.
The following examples express the rules as they apply to two failover packages whose
failover_policy (page 178) is configured_node. Assume pkg1 depends on pkg2, that
node1, node2 and node3 are all specified (in some order) under node_name (page 176) in the
configuration file for each package, and that failback_policy (page 179) is set to automatic
for each package.
NOTE: Keep the following in mind when reading the examples that follow, and when actually
configuring priorities:
1. auto_run (page 176) should be set to yes for all the packages involved; the examples assume
that it is.
2. Priorities express a ranking order, so a lower number means a higher priority (10 is a higher
priority than 30).
HP recommends assigning values in increments of 20 so as to leave gaps in the sequence;
otherwise you may have to shuffle all the existing priorities when assigning priority to a new
package.
no_priority, the default, is treated as a lower priority than any numerical value.
3. All packages with no_priority are by definition of equal priority, and there is no other
way to assign equal priorities; a numerical priority must be unique within the cluster. See
priority (page 179) for more information.
On failback:
If both packages have moved from node1 to node2 and node1 becomes available,
pkg2 will fail back to node1 only if pkg2's priority is higher than pkg1's:
If the priorities are equal, neither package will fail back (unless pkg1 is not running;
in that case pkg2 can fail back).
If pkg2's priority is higher than pkg1's, pkg2 will fail back to node1; pkg1 will
fail back to node1 provided all of pkg1's other dependencies are met there;
if pkg2 has failed back to node1 and node1 does not meet all of pkg1's
dependencies, pkg1 will halt.
If pkg1 depends on pkg2, and pkg1's priority is higher than pkg2's, pkg1's node order dominates.
Assuming pkg1's node order is node1, node2, node3, then:
On startup:
pkg1 will select node1 to start on.
pkg2 will start on node1, provided it can run there (no matter where node1 appears
on pkg2's node_name list).
If pkg2 is already running on another node, it will be dragged to node1, provided
it can run there.
If pkg2 cannot start on node1, then both packages will attempt to start on node2 (and
so on).
IMPORTANT: If you have not already done so, read the discussion of Simple Dependencies
(page 113) before you go on.
The interaction of the legal values of dependency_location and dependency_condition
creates the following possibilities:
Same-node dependency: a package can require that another package be UP on the same
node.
This is the case covered in the section on Simple Dependencies (page 113).
Different-node dependency: a package can require that another package be UP on a different
node.
Any-node dependency: a package can require that another package be UP on any node in
the cluster.
Same-node exclusion: a package can require that another package be DOWN on the same
node. (But this does not prevent that package from being UP on another node.)
All-nodes exclusion: a package can require that another package be DOWN on all nodes in
the cluster.
Dragging rules apply. See Dragging Rules for Simple Dependencies (page 115).
NOTE: Dependent packages are halted even in the case of different_node or any_node
dependency. For example, if pkg1 running on node1 has a different_node or any_node
dependency on pkg2 running on node2, and pkg2 fails over to node3, pkg1 will be halted
and restarted as described below.
By default the packages are halted in the reverse of the order in which they were started; and
if the halt script for any of the dependent packages hangs, the failed package will wait
indefinitely to complete its own halt process. This provides the best chance for all the dependent
packages to halt cleanly, but it may not be the behavior you want. You can change it by
means of the successor_halt_timeout parameter (page 178). (A successor is a package
that depends on another package.)
If the failed package's successor_halt_timeout is set to zero, Serviceguard will halt the
dependent packages in parallel with the failed package; if it is set to a positive number,
Serviceguard will halt the packages in the reverse of the start order, but will allow the failed
package to halt after the successor_halt_timeout number of seconds whether or not
the dependent packages have completed their halt scripts.
2. Halts the failing package.
After the successor halt timer has expired or the dependent packages have all halted,
Serviceguard starts the halt script of the failing package, regardless of whether the dependents'
halts succeeded, failed, or timed out.
3. Halts packages the failing package depends on, starting with the package this package
immediately depends on. The packages are halted only if:
4.8.10.3.1 Example 1
For example, to configure a node to run a maximum of ten packages at any one time, make the
following entry under the node's NODE_NAME entry in the cluster configuration file:
NODE_NAME node1
...
CAPACITY_NAME package_limit
CAPACITY_VALUE 10
Now all packages will be considered equal in terms of their resource consumption, and this node
will never run more than ten packages at one time. (You can change this behavior if you need to
by modifying the weight for some or all packages, as the next example shows.) Next, define the
CAPACITY_NAME and CAPACITY_VALUE parameters for the remaining nodes, setting
CAPACITY_NAME to package_limit in each case. You may want to set CAPACITY_VALUE to
different values for different nodes. A ten-package capacity might represent the most powerful
node, for example, while the least powerful has a capacity of only two or three.
NOTE: Serviceguard does not require you to define a capacity for each node. If you define the
CAPACITY_NAME and CAPACITY_VALUE parameters for some nodes but not for others, the nodes
for which these parameters are not defined are assumed to have limitless capacity; in this case,
those nodes would be able to run any number of eligible packages at any given time.
If some packages consume more resources than others, you can use the weight_name and
weight_value parameters to override the default value (1) for some or all packages. For example,
suppose you have three packages, pkg1, pkg2, and pkg3. pkg2 is about twice as
resource-intensive as pkg3 which in turn is about one-and-a-half times as resource-intensive as
pkg1. You could represent this in the package configuration files as follows:
For pkg1:
weight_name package_limit
weight_value 2
For pkg2:
weight_name package_limit
weight_value 6
For pkg3:
weight_name package_limit
weight_value 3
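Checking these weights against node1's capacity of 10 for package_limit: any two packages fit
(2 + 6 = 8, 2 + 3 = 5, 6 + 3 = 9), but all three together do not (2 + 6 + 3 = 11).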
Now node1, which has a CAPACITY_VALUE of 10 for the reserved CAPACITY_NAME
package_limit, can run any two of the packages at one time, but not all three. If in addition
IMPORTANT: You cannot combine the two methods. If you use the reserved capacity
package_limit for any node, Serviceguard will not allow you to define any other type of
capacity and weight in this cluster; so you are restricted to the Simple Method in that case.
4.8.10.4.1.1 Example 2
To define these capacities, and set limits for individual nodes, make entries such as the following
in the cluster configuration file:
CLUSTER_NAME cluster_23
...
NODE_NAME node1
...
CAPACITY_NAME A
CAPACITY_VALUE 80
CAPACITY_NAME B
CAPACITY_VALUE 50
NODE_NAME node2
CAPACITY_NAME A
CAPACITY_VALUE 60
CAPACITY_NAME B
CAPACITY_VALUE 70
...
NOTE: There is one exception: system multi-node packages cannot have weight, so a cluster-wide
default weight does not apply to them.
4.8.10.4.2.1.1 Example 3
WEIGHT_NAME A
WEIGHT_DEFAULT 20
WEIGHT_NAME B
WEIGHT_DEFAULT 15
This means that any package for which weight A is not defined in its package configuration file
will have a weight A of 20, and any package for which weight B is not defined in its package
configuration file will have a weight B of 15.
Given the capacities we defined in the cluster configuration file (see Defining Capacities), node1
can run any three packages that use the default for both A and B. This would leave 20 units of
spare A capacity on this node, and 5 units of spare B capacity.
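(That is, node1's A capacity of 80 minus 3 x 20 = 60 leaves 20 spare, and its B capacity of 50
minus 3 x 15 = 45 leaves 5 spare.)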
Pursuing the example started under Defining Capacities (page 122), we can now use options 1
and 2 to set weights for pkg1 through pkg4.
4.8.10.4.2.2.1 Example 4
In pkg1's package configuration file:
weight_name A
weight_value 60
In pkg2's package configuration file:
weight_name A
weight_value 40
In pkg3's package configuration file:
weight_name B
weight_value 35
weight_name A
weight_value 0
In pkg4's package configuration file:
weight_name B
weight_value 40
IMPORTANT: weight_name in the package configuration file must exactly match the
corresponding CAPACITY_NAME in the cluster configuration file. This applies to case as well as
spelling: weight_name a would not match CAPACITY_NAME A.
You cannot define a weight unless the corresponding capacity is defined: cmapplyconf will fail
if you define a weight in the package configuration file and no node in the package's node_name
list (page 176) has specified a corresponding capacity in the cluster configuration file; or if you
define a default weight in the cluster configuration file and no node in the cluster specifies a
capacity of the same name.
Similarly, if any package that has A weight is already running on node2, pkg1 will not be
able to start there (unless pkg1 has sufficient priority to force another package or packages
NOTE: But if you use the reserved CAPACITY_NAME package_limit, you can define
only that single capacity and corresponding weight. See Simple Method (page 121).
Node capacity is defined in the cluster configuration file, via the CAPACITY_NAME and
CAPACITY_VALUE parameters.
Capacities can be added, changed, and deleted while the cluster is running. This can cause
some packages to be moved, or even halted and not restarted.
Package weight can be defined in cluster configuration file, via the WEIGHT_NAME and
WEIGHT_DEFAULT parameters, or in the package configuration file, via the weight_name
and weight_value parameters, or both.
Weights can be assigned (and WEIGHT_DEFAULTs apply) only to multi-node packages and
to failover packages whose failover_policy (page 178) is configured_node and whose
failback_policy (page 179) is manual.
If you define weight (weight_name and weight_value) for a package, make sure you
define the corresponding capacity (CAPACITY_NAME and CAPACITY_VALUE) in the cluster
configuration file for at least one node on the package's node_name list (page 176). Otherwise
cmapplyconf will fail when you try to apply the package.
Weights (both cluster-wide WEIGHT_DEFAULTs, and weights defined in the package
configuration files) can be changed while the cluster is up and the packages are running. This
can cause some packages to be moved, or even halted and not restarted.
4.8.10.7 How Package Weights Interact with Package Priorities and Dependencies
If necessary, Serviceguard will halt a running lower-priority package that has weight to make room
for a higher-priority package that has weight. But a running package that has no priority (that is,
4.8.10.7.1 Example 1
pkg1 is configured to run on nodes turkey and griffon. It has a weight of 1 and a priority
of 10. It is down and has switching disabled.
pkg2 is configured to run on nodes turkey and griffon. It has a weight of 1 and a priority
of 20. It is running on node turkey and has switching enabled.
turkey and griffon can run one package each (package_limit is set to 1).
If you enable switching for pkg1, Serviceguard will halt the lower-priority pkg2 on turkey. It
will then start pkg1 on turkey and restart pkg2 on griffon.
If neither pkg1 nor pkg2 had priority, pkg2 would continue running on turkey and pkg1 would
run on griffon.
4.8.10.7.2 Example 2
pkg1 is configured to run on nodes turkey and griffon. It has a weight of 1 and a priority
of 10. It is running on node turkey and has switching enabled.
pkg2 is configured to run on nodes turkey and griffon. It has a weight of 1 and a priority
of 20. It is running on node turkey and has switching enabled.
pkg3 is configured to run on nodes turkey and griffon. It has a weight of 1 and a priority
of 30. It is down and has switching disabled.
pkg3 has a same_node dependency on pkg2
turkey and griffon can run two packages each (package_limit is set to 2).
If you enable switching for pkg3, it will stay down because pkg2, the package it depends on, is
running on node turkey, which is already running two packages (its capacity limit). pkg3 has
a lower priority than pkg2, so it cannot drag it to griffon where they both can run.
NOTE: In the case of the validate entry point, exit values 1 and 2 are treated the same; you
can use either to indicate that validation failed.
The script can make use of a standard set of environment variables (including the package name,
SG_PACKAGE, and the name of the local node, SG_NODE) exported by the package manager or
the master control script that runs the package; and can also call a function to source in a logging
function and other utility functions. One of these functions, sg_source_pkg_env(), provides
access to all the parameters configured for this package, including package-specific environment
variables configured via the pev_ parameter (page 190).
NOTE: Some variables, including SG_PACKAGE, and SG_NODE, are available only at package
run and halt time, not when the package is validated. You can use SG_PACKAGE_NAME at validation
time as a substitute for SG_PACKAGE.
For more information, see the template in $SGCONF/examples/external_script.template.
A sample script follows. It assumes there is another script called monitor.sh, which will be
configured as a Serviceguard service to monitor some application. The monitor.sh script (not
included here) uses a parameter PEV_MONITORING_INTERVAL, defined in the package
configuration file, to periodically poll the application it wants to monitor; for example:
PEV_MONITORING_INTERVAL 60
At validation time, the sample script makes sure the PEV_MONITORING_INTERVAL and the
monitoring service are configured properly; at start and stop time it prints out the interval to the
log file.
#!/bin/sh
# Source the Serviceguard configuration file (defines SGCONF; the path shown is the
# usual default and may vary) and the package utility functions (sg_log and related helpers).
if [[ -z $SG_UTILS ]]
then
    . ${SGCONFFILE:=/etc/cmcluster.conf}
    SG_UTILS=$SGCONF/scripts/mscripts/utils.sh
fi
. ${SG_UTILS}
function validate_command
{
    typeset -i ret=0
    typeset -i i=0
    typeset -i found=0
    # check PEV_ attribute is configured and within limits
    if [[ -z $PEV_MONITORING_INTERVAL ]]
    then
        sg_log 0 "ERROR: PEV_MONITORING_INTERVAL attribute not configured!"
        ret=1
    elif (( PEV_MONITORING_INTERVAL < 1 ))
    then
        sg_log 0 "ERROR: PEV_MONITORING_INTERVAL value ($PEV_MONITORING_INTERVAL) not within legal limits!"
        ret=1
    fi
    # check that the monitoring service expected for this package is configured
    # (omitted here; see the template named above for the full check)
    return $ret
}

function start_command
{
    sg_log 5 "start_command"
    # log current PEV_MONITORING_INTERVAL value
    sg_log 0 "PEV_MONITORING_INTERVAL for $SG_PACKAGE_NAME is $PEV_MONITORING_INTERVAL"
    return 0
}
function stop_command
{
    sg_log 5 "stop_command"
    # log current PEV_MONITORING_INTERVAL value, PEV_ attribute can be changed
    # while the package is running
    sg_log 0 "PEV_MONITORING_INTERVAL for $SG_PACKAGE_NAME is $PEV_MONITORING_INTERVAL"
    return 0
}
typeset -i exit_val=0
case ${1} in
start)
start_command $*
exit_val=$?
;;
stop)
stop_command $*
exit_val=$?
;;
validate)
validate_command $*
exit_val=$?
;;
*)
sg_log 0 "Unknown entry point $1"
;;
esac
exit $exit_val
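Once the script is in place on each node, it is referenced from the package configuration file; a
typical entry might look like the following (the path is illustrative, and external_pre_script is
the corresponding parameter for pre-scripts):
external_script $SGCONF/pkg1/sample_external_script.sh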
NOTE: cmhalt operations interact with all the packages and should not be used from external
scripts.
NOTE: last_halt_failed appears only in the line output of cmviewcl, not the default
tabular format; you must use the -f line option to see it.
The value of last_halt_failed is no if the halt script ran successfully, or has not run since the
node joined the cluster, or has not run since the package was configured to run on the node;
otherwise it is yes.
As in other cluster configurations, a package will not start on a node unless the subnets
configured on that node, and specified in the package configuration file as monitored
subnets, are up.
NOTE: This section provides an example for a modular package; for legacy packages, see
Configuring Cross-Subnet Failover (page 239).
Suppose that you want to configure a package, pkg1, so that it can fail over among all the nodes
in a cluster comprising NodeA, NodeB, NodeC, and NodeD.
NodeA and NodeB use subnet 15.244.65.0, which is not used by NodeC and NodeD; and
NodeC and NodeD use subnet 15.244.56.0, which is not used by NodeA and NodeB. (See
Obtaining Cross-Subnet Information (page 156) for sample cmquerycl output).
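The relevant parts of pkg1's configuration file might then look like the following sketch (parameter
usage should be checked against the cross-subnet discussion referenced above):
node_name NodeA
node_name NodeB
node_name NodeC
node_name NodeD
monitored_subnet 15.244.65.0
monitored_subnet_access PARTIAL
monitored_subnet 15.244.56.0
monitored_subnet_access PARTIAL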
NOTE: For more information and advice, see the white paper Securing Serviceguard at http://
www.hp.com/go/hpux-serviceguard-docs (Select HP Serviceguard -> White Papers).
NOTE: When you upgrade a cluster from Version A.11.15 or earlier, entries in
$SGCONF/cmclnodelist are automatically updated to Access Control Policies in the cluster
configuration file. All non-root user-hostname pairs are assigned the role of Monitor.
NOTE: If you are using private IP addresses for communication within the cluster, and these
addresses are not known to DNS (or the name resolution service you use) these addresses must
be listed in /etc/hosts.
For requirements and restrictions that apply to IPv6-only clusters and mixed-mode clusters, see
Rules and Restrictions for IPv6-Only Mode (page 89) and Rules and Restrictions for Mixed
Mode (page 90), respectively, and the latest version of the Serviceguard release notes.
For example, consider a two node cluster (gryf and sly) with two private subnets and a public
subnet. These nodes will be granting access by a non-cluster node (bit) which does not share the
private subnets. The /etc/hosts file on both cluster nodes should contain entries such as:
15.145.162.131 gryf.uksr.hp.com gryf
10.8.0.131 gryf.uksr.hp.com gryf
10.8.1.131 gryf.uksr.hp.com gryf
15.145.162.132 sly.uksr.hp.com sly
10.8.0.132 sly.uksr.hp.com sly
10.8.1.132 sly.uksr.hp.com sly
NOTE: Serviceguard recognizes only the hostname (the first element) in a fully qualified domain
name (a name like those in the example above). This means, for example, that gryf.uksr.hp.com
and gryf.cup.hp.com cannot be nodes in the same cluster, as Serviceguard would see them
as the same host gryf.
If applications require the use of hostname aliases, the Serviceguard hostname must be one of the
aliases in all the entries for that host. For example, if the two-node cluster in the previous example
were configured to use the alias hostnames alias-node1 and alias-node2, then the entries
in /etc/hosts should look something like this:
15.145.162.131 gryf.uksr.hp.com gryf1 alias-node1
10.8.0.131 gryf.uksr.hp.com gryf2 alias-node1
10.8.1.131 gryf.uksr.hp.com gryf3 alias-node1
15.145.162.132 sly.uksr.hp.com sly1 alias-node2
10.8.0.132 sly.uksr.hp.com sly2 alias-node2
10.8.1.132 sly.uksr.hp.com sly3 alias-node2
NOTE: If such a hang or error occurs, Serviceguard and all protected applications will continue
working even though the command you issued does not. That is, only the Serviceguard configuration
commands (and corresponding Serviceguard Manager functions) are affected, not the cluster
daemon or package services.
The procedure that follows shows how to create a robust name-resolution configuration that will
allow cluster nodes to continue communicating with one another if a name service fails.
1. Edit the /etc/hosts file on all nodes in the cluster. Add name resolution for all heartbeat
IP addresses, and other IP addresses from all the cluster nodes; see Configuring Name
Resolution (page 137) for discussion and examples.
NOTE: For each cluster node, the public-network IP address must be the first address listed.
This enables other applications to talk to other nodes on public networks.
2. If you are using DNS, make sure your name servers are configured in /etc/resolv.conf,
for example:
domain cup.hp.com
search cup.hp.com hp.com
nameserver 15.243.128.51
nameserver 15.243.160.51
3. Edit or create the /etc/nsswitch.conf file on all nodes and add the following text, if it
does not already exist:
For DNS, enter (two lines):
hosts: files [NOTFOUND=continue UNAVAIL=continue] dns [NOTFOUND=return UNAVAIL=return]
If a line beginning with the string hosts: already exists, then make sure that the text
immediately to the right of this string is (on one line):
files [NOTFOUND=continue UNAVAIL=continue] dns [NOTFOUND=return UNAVAIL=return]
or
files [NOTFOUND=continue UNAVAIL=continue] nis [NOTFOUND=return UNAVAIL=return]
This step is critical, allowing the cluster nodes to resolve hostnames to IP addresses while DNS,
NIS, or the primary LAN is down.
4. Create a $SGCONF/cmclnodelist file on all nodes that you intend to configure into the
cluster, and allow access by all cluster nodes. See Allowing Root Access to an Unconfigured
Node (page 136).
NOTE: HP recommends that you do the bonding configuration from the system console, because
you will need to restart networking from the console when the configuration is done.
NOTE: During configuration, you need to make sure that the active slaves for the same bond on
each node are connected to the same hub or switch. You can check on this by examining the file
/proc/net/bonding/bond<x>/info on each node. This file will show the active slave for
bond x.
NOTE: It is better not to restart the network from outside the cluster subnet, as there is a chance
the network could go down before the command can complete.
The command prints "bringing up network" status messages.
If there was an error in any of the bonding configuration files, the network might not function
properly. If this occurs, check each configuration file for errors, then try to restart the network again.
NOTE: Do not change the UNIQUE and _nm_name parameters. You can leave MTU and
REMOTE_IPADDR in the file as long as they are not set.
Next, in /etc/sysconfig/network, edit your ifcfg-bond0 file so it looks like this:
BROADCAST='172.16.0.255'
BOOTPROTO='static'
IPADDR='172.16.0.1'
MTU=''
NETMASK='255.255.255.0'
NETWORK='172.16.0.0'
NOTE: Use ifconfig to find the relationship between eth IDs and the MAC addresses.
For more networking information on bonding, see
/usr/src/linux<kernel_version>/Documentation/networking/bonding.txt.
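As a quick sanity check (shown for illustration; output details vary by distribution and kernel version, and on many kernels the bond state file is /proc/net/bonding/bond0 rather than the path noted above), you can list the interfaces with their MAC addresses and then inspect the state of the bond once the network is up:
ifconfig -a
cat /proc/net/bonding/bond0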
NOTE: It is better not to restart the network from outside the cluster subnet, as there is a chance
the network could go down before the command can complete.
If there is an error in any of the bonding configuration files, the network may not function properly.
If this occurs, check each configuration file for errors, then try to start the network again.
NOTE: An iSCSI storage device does not support configuring a lock LUN.
The following example of the fdisk dialog shows a partition being created on the device file /dev/sdc and its partition type being set to 83 (Linux):
fdisk /dev/sdc
Command (m for help): n
Partition number (1-4): 1
Command (m for help): t
Partition number (1-4): 1
HEX code (type L to list codes): 83
Command (m for help): w
To transfer the disk partition format to other nodes in the cluster, use the command:
sfdisk -R <device>
where <device> corresponds to the same physical device as on the first node. For example, if
/dev/sdc is the device name on the other nodes, use the command:
sfdisk -R /dev/sdc
You can check the partition table by using the command:
fdisk -l /dev/sdc
NOTE: fdisk may not be available for SUSE on all platforms. In this case, using YAST2 to set
up the partitions is acceptable.
NOTE: On SUSE Linux Enterprise Server, the patches are not required, as this feature is supported
in the Serviceguard for Linux A.11.20.10 main release.
If a udev device is selected as the lock LUN: This is supported, but the same udev rules must be used across all nodes in the cluster, for the whole LUN or the partitioned LUN.
If /dev/dm-xx is selected as the lock LUN: This is not supported, on either a whole LUN or a partitioned LUN.
CAUTION: The minor numbers used by the LVM volume groups must be the same on all cluster
nodes. This means that if there are any non-shared volume groups in the cluster, create the same
number of them on all nodes, and create them before you define the shared storage. If possible,
avoid using private volume groups, especially LVM boot volumes. Minor numbers increment with
each logical volume, and mismatched numbers of logical volumes between nodes can cause a
failure of LVM (and boot, if you are using an LVM boot volume).
NOTE: Except as noted in the sections that follow, you perform the LVM configuration of shared
storage on only one node. The disk partitions will be visible on other nodes as soon as you reboot
those nodes. After you've distributed the LVM configuration to all the cluster nodes, you will be
able to use LVM commands to switch volume groups between nodes. (To avoid data corruption,
a given volume group must be active on only one node at a time).
For multipath information, see Multipath for Storage (page 82).
NOTE: fdisk may not be available for SUSE on all platforms. In this case, using YAST2 to set
up the partitions is acceptable.
4. First cylinder (1-nn, default 1): Press Enter to accept the default starting cylinder (1).
5. Last cylinder or +size or +sizeM or +sizeK (1-nn, default nn): Press Enter to accept the default, which is the last cylinder number.
The following example of the fdisk dialog shows that the disk on the device file /dev/sdc
is configured as one partition, and appears as follows:
fdisk /dev/sdc
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-4067, default 1): Enter
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-4067, default 4067): Enter
Using default value 4067
2. Respond to the prompts as shown in the following table to set a partition type:
The following example of the fdisk dialog shows that the disk on the device file /dev/sdc
is set to the Smart Array type partition, and appears as follows:
fdisk /dev/sdc
Command (m for help): t
Partition number (1-4): 1
HEX code (type L to list codes): 8e
NOTE: fdisk may not be available for SUSE on all platforms. In this case, using YAST2 to set
up the partitions is acceptable.
NOTE: At this point, the setup for volume-group activation protection is complete. Serviceguard
adds a tag matching the uname -n value of the owning node to each volume group defined
for a package when the package runs and deletes the tag when the package halts. The
command vgs -o +tags vgname will display any tags that are set for a volume group.
The sections that follow take you through the process of configuring volume groups and logical
volumes, and distributing the shared configuration. When you have finished that process, use
the procedure under Testing the Shared Configuration (page 151) to verify that the setup has
been done correctly.
5.1.12.4 Building Volume Groups: Example for Smart Array Cluster Storage (MSA 2000 Series)
NOTE: For information about setting up and configuring the MSA 2000 for use with Serviceguard,
see HP Serviceguard for Linux Version A.11.19 or later Deployment Guide at http://www.hp.com/
go/linux-serviceguard-docs.
Use Logical Volume Manager (LVM) on your system to create volume groups that can be activated
by Serviceguard packages. This section provides an example of creating Volume Groups on LUNs
created on MSA 2000 Series storage. For more information on LVM, see the Logical Volume
Manager How To, which you can find at http://tldp.org/HOWTO/HOWTO-INDEX/howtos.html.
Before you start, partition your LUNs and label them with a partition type of 8e (Linux LVM). Use
the t (change partition type) command of fdisk to change from the default of 83 (Linux).
Do the following on one node:
1. Update the LVM configuration and create the /etc/lvmtab file. You can omit this step if
you have previously created volume groups on this node.
vgscan
NOTE: The files /etc/lvmtab and /etc/lvmtab.d may not exist on some distributions.
In that case, ignore references to these files.
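For orientation, the overall flow of creating a shared volume group and its logical volumes typically looks like the following sketch (illustrative only; the device name /dev/sdc1 and the size are placeholders, and vgpkgA is the example volume group name used later in this chapter; follow the detailed steps in this section for your configuration):
pvcreate /dev/sdc1
vgcreate vgpkgA /dev/sdc1
lvcreate -L 1G -n lvol1 vgpkgA
mkfs -t ext3 /dev/vgpkgA/lvol1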
NOTE: Use vgchange --addtag only if you are implementing volume-group activation
protection. Remember that volume-group activation protection, if used, must be implemented
on each node.
NOTE: For information about supported filesystem types, see the fs_type discussion on
(page 188).
5. To test that the file system /extra was created correctly and with high availability, you can
create a file on it, and read it.
echo "Test of LVM" >> /extra/LVM-test.conf
cat /extra/LVM-test.conf
NOTE: Be careful if you use YAST or YAST2 to configure volume groups, as that may cause
all volume groups on that system to be activated. After running YAST or YAST2, check to
make sure that volume groups for Serviceguard packages not currently running have not been
activated, and use LVM commands to deactivate any that have. For example, use the command
vgchange -a n /dev/sgvg00 to deactivate the volume group sgvg00.
NOTE: The minor numbers used by the LVM volume groups must be the same on all cluster nodes.
They will be the same if all the nodes have the same number of unshared volume groups.
To distribute the shared configuration, follow these steps:
1. Unmount and deactivate the volume group, and remove the tag if necessary. For example, to
deactivate only vgpkgA:
umount /extra
vgchange -a n vgpkgA
vgchange --deltag $(uname -n) vgpkgA
2. To get the node ftsys10 to see the new disk partitioning that was done on ftsys9, reboot:
reboot
The partition table on the rebooted node is then rebuilt using the information placed on the
disks when they were partitioned on the other node.
3. Run vgscan to make the LVM configuration visible on the new node and to create the LVM
database in /etc/lvmtab and /etc/lvmtab.d. For example, on ftsys10:
vgscan
NOTE: If you are using the volume-group activation protection feature of Serviceguard for
Linux, you must use vgchange --addtag to add a tag when you manually activate a volume
group. Similarly, you must remove the tag when you deactivate a volume group that will be
used in a package (as shown at the end of each step).
Use vgchange --addtag and vgchange --deltag only if you are implementing
volume-group activation protection. Remember that volume-group activation protection, if used,
must be implemented on each node.
Serviceguard adds a tag matching the uname -n value of the owning node to each volume
group defined for a package when the package runs; the tag is deleted when the package
is halted. The command vgs -o +tags vgname will display any tags that are set for a
volume group.
vgchange -a y vgpkgB
mount /dev/vgpkgB/lvol1 /extra
echo "Written by $(hostname) on $(date)" > /extra/datestamp
cat /extra/datestamp
You should see something like the following, showing the date stamp written by the other
node:
Written by ftsys9.mydomain on Mon Jan 22 14:23:44 PST 2006
Now unmount the volume group again:
umount /extra
vgchange -a n vgpkgB
vgchange --deltag $(uname -n) vgpkgB
NOTE: The volume activation protection feature of Serviceguard for Linux requires that you
add the tag as shown at the beginning of the above steps when you manually activate a
volume group. Similarly, you must remove the tag when you deactivate a volume group that
will be used in a package (as shown at the end of each step). As of Serviceguard for Linux
A.11.16.07, a tag matching the uname -n value of the owning node is automatically added
to each volume group defined for a package when the package runs; the tag is deleted when
the package is halted. The command vgs -o +tags vgname will display any tags that are
set for a volume group.
5.1.12.8.1 Preventing Boot-Time vgscan and Ensuring Serviceguard Volume Groups Are
Deactivated
By default, Linux will perform LVM startup actions whenever the system is rebooted. These include
a vgscan (on some Linux distributions) and volume group activation. This can cause problems for
volumes used in a Serviceguard environment (for example, a volume group for a Serviceguard
package that is not currently running may be activated). To prevent such problems, proceed as
follows on the various Linux versions.
NOTE: You do not need to perform these actions if you have implemented volume-group activation
protection as described under Enabling Volume Group Activation Protection (page 148).
SUSE Linux Enterprise Server
Prevent a vgscan at boot time by removing the /etc/rc.d/boot.d/S07boot.lvm file from
all cluster nodes.
IMPORTANT: See NODE_NAME under Cluster Configuration Parameters (page 90) for important
information about restrictions on the node name.
Here is an example of the command (enter it all one line):
cmquerycl -v -C $SGCONF/clust1.conf -n ftsys9 -n ftsys10
This creates a template file, by default /usr/local/cmcluster/clust1.conf (for Red Hat
Enterprise Linux) or /opt/cmcluster/clust1.conf (for SUSE Linux Enterprise Server). In this
output file, keywords are separated from definitions by white space. Comments are permitted, and
must be preceded by a pound sign (#) in the far left column.
NOTE: HP strongly recommends that you modify the file so as to send heartbeat over all possible
networks.
The manpage for the cmquerycl command further explains the parameters that appear in this
file. Many are also described in Chapter 4: Planning and Documenting an HA Cluster (page 79).
Modify your /etc/cmcluster/clust1.conf file as needed.
IMPORTANT: See About Hostname Address Families: IPv4-Only, IPv6-Only, and Mixed Mode
(page 88) for a full discussion, including important restrictions for IPv6-only and mixed modes.
If you use the -a option, Serviceguard will ignore the value of the HOSTNAME_ADDRESS_FAMILY
parameter in the existing cluster configuration, if any, and attempt to resolve the cluster and Quorum
Server hostnames as specified by the -a option:
If you specify -a ipv4, each of the hostnames must resolve to at least one IPv4 address;
otherwise the command will fail.
Similarly, if you specify -a ipv6, each of the hostnames must resolve to at least one IPv6
address; otherwise the command will fail.
If you specify -a any, Serviceguard will attempt to resolve each hostname to an IPv4 address,
then, if that fails, to an IPv6 address.
If you do not use the -a option:
If a cluster is already configured, Serviceguard will use the value configured for
HOSTNAME_ADDRESS_FAMILY, which defaults to IPv4.
If no cluster is configured, and Serviceguard finds at least one IPv4 address that corresponds to
the local node's hostname (that is, the node on which you are running cmquerycl),
Serviceguard will attempt to resolve all hostnames to IPv4 addresses. If no IPv4 address is
found for a given hostname, Serviceguard will look for an IPv6 address. (This is the same
behavior as if you had specified -a any).
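For example, building on the earlier cmquerycl command (node names as in that example; shown for illustration), you could force IPv4-only resolution as follows:
cmquerycl -v -a ipv4 -C $SGCONF/clust1.conf -n ftsys9 -n ftsys10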
NOTE: This option must be used to discover actual or potential nodes and subnets in a cross-subnet
configuration. See Obtaining Cross-Subnet Information (page 156). It will also validate IP Monitor
polling targets; see Monitoring LAN Interfaces and Detecting Failure: IP Level (page 66), and
POLLING_TARGET under Cluster Configuration Parameters (page 90).
NOTE: An iSCSI storage device does not support configuring a lock LUN.
5 lan1 (nodeC)
  lan1 (nodeD)
6 lan2 (nodeC)
  lan2 (nodeD)
IP subnets:
IPv4:
IPv6:
Route connectivity (full probing performed):
1 15.13.164.0
  15.13.172.0
2 15.13.165.0
  15.13.182.0
3 15.244.65.0
4 15.244.56.0
In the Route connectivity section, the numbers on the left (1-4) identify which subnets are
routed to each other (for example, 15.13.164.0 and 15.13.172.0).
IMPORTANT: Note that in this example subnet 15.244.65.0, used by NodeA and NodeB, is
not routed to 15.244.56.0, used by NodeC and NodeD.
But subnets 15.13.164.0 and 15.13.165.0, used by NodeA and NodeB, are routed respectively
to subnets 15.13.172.0 and 15.13.182.0, used by NodeC and NodeD. At least one such
routing among all the nodes must exist for cmquerycl to succeed.
For information about configuring the heartbeat in a cross-subnet configuration, see the
HEARTBEAT_IP parameter discussion under Cluster Configuration Parameters (page 90).
NOTE: Remember to tune kernel parameters on each node to ensure that they are set high enough
for the largest number of packages that will ever run concurrently on that node.
IMPORTANT: A remote user (one who is not logged in to a node in the cluster, and is not
connecting via rsh or ssh) can have only Monitor access to the cluster.
(Full Admin and Package Admin can be configured for such a user, but this usage is
deprecated. As of Serviceguard A.11.18 configuring Full Admin or Package Admin for remote
users gives them Monitor capabilities. See Setting up Access-Control Policies (page 160) for
more information.)
NOTE: For more information and advice, see the white paper Securing Serviceguard at http://
www.hp.com/go/hpux-serviceguard-docs (Select HP Serviceguard -> White Papers).
Define access-control policies for a cluster in the cluster configuration file; see Cluster Configuration
Parameters (page 90). To define access control for a specific package, use user_host (page 190)
and related parameters in the package configuration file. You can define up to 200 access policies
for each cluster. A root user can create or modify access control policies while the cluster is running.
NOTE: Once nodes are configured into a cluster, the access-control policies you set in the cluster
and package configuration files govern cluster-wide security; changes to the bootstrap
cmclnodelist file are ignored (see Allowing Root Access to an Unconfigured Node (page 136)).
NOTE: The commands must be issued on USER_HOST but can take effect on other nodes;
for example, patrick can use bit's command line to start a package on gryf (assuming
bit and gryf are in the same cluster).
Choose one of these three values for USER_HOST:
A specific node name - Use the hostname portion (the first part) of a fully-qualified domain
name that can be resolved by the name service you are using; it should also be in each
node's /etc/hosts. Do not use an IP address or the fully-qualified domain name. If
there are multiple hostnames (aliases) for an IP address, one of those must match
USER_HOST. See Configuring Name Resolution (page 137) for more information.
USER_ROLE must be one of these three values:
MONITOR
FULL_ADMIN
PACKAGE_ADMIN
MONITOR and FULL_ADMIN can be set only in the cluster configuration file and they apply
to the entire cluster. PACKAGE_ADMIN can be set in the cluster configuration file or a package
configuration file. If it is set in the cluster configuration file, PACKAGE_ADMIN applies to all
configured packages; if it is set in a package configuration file, it applies to that package
only. These roles are not exclusive; for example, more than one user can have the
PACKAGE_ADMIN role for the same package.
NOTE: You do not have to halt the cluster or package to configure or modify access control
policies.
Here is an example of an access control policy:
USER_NAME john
USER_HOST bit
USER_ROLE PACKAGE_ADMIN
If this policy is defined in the cluster configuration file, it grants user john, when issuing
commands from node bit, the PACKAGE_ADMIN role for all packages configured in the cluster.
IMPORTANT: Wildcards do not degrade higher-level roles that have been granted to individual
members of the class specified by the wildcard. For example, you might set up the following policy
to allow root users on remote systems access to the cluster:
USER_NAME root
USER_HOST ANY_SERVICEGUARD_NODE
USER_ROLE MONITOR
This does not reduce the access level of users who are logged in as root on nodes in this cluster;
they will always have full Serviceguard root-access capabilities.
Consider what would happen if these entries were in the cluster configuration file:
# Policy 1:
USER_NAME john
USER_HOST bit
USER_ROLE PACKAGE_ADMIN
# Policy 2:
USER_NAME john
USER_HOST bit
USER_ROLE MONITOR
# Policy 3:
USER_NAME ANY_USER
USER_HOST ANY_SERVICEGUARD_NODE
USER_ROLE MONITOR
In the above example, the configuration would fail because user john is assigned two roles. (In
any case, Policy 2 is unnecessary, because PACKAGE_ADMIN includes the role of MONITOR).
Policy 3 does not conflict with any other policies, even though the wildcard ANY_USER includes
the individual user john.
NOTE: Check spelling especially carefully when typing wildcards, such as ANY_USER and
ANY_SERVICEGUARD_NODE. If they are misspelled, Serviceguard will assume they are specific
users or nodes.
AUTOSTART_CMCLD=1
CAUTION: You should not try to restart Serviceguard; data corruption might occur if another
node were to attempt to start up a new instance of an application that is still running on the single
node. Instead, choose an appropriate time to shut down and reboot the node. This will allow the
applications to shut down and Serviceguard to restart the cluster after the reboot.
CAUTION: This is not recommended. Consult the white paper Securing Serviceguard at http://
www.hp.com/go/hpux-serviceguard-docs (Select HP Serviceguard -> White Papers) for more
information.
NOTE: The cmdeleteconf command removes only the cluster binary file $SGCONF/
cmclconfig. It does not remove any other files from the $SGCONF directory.
Although the cluster must be halted, all nodes in the cluster should be powered up and accessible
before you use the cmdeleteconf command. If a node is powered down, power it up and allow
it to boot. If a node is inaccessible, you will see a list of inaccessible nodes and the following
message:
Checking current status
cmdeleteconf: Unable to reach node lptest1.
WARNING: Once the unreachable node is up, cmdeleteconf
should be executed on the node to remove the configuration.
The following summarizes whether the deadman driver must be rebuilt manually in each scenario:
Online OS upgrade between minor releases (for example, RHEL 6.1 to RHEL 6.2): Yes. You must manually rebuild the deadman driver, as the OS upgrade process would have updated the kernel.
Fresh installation of the OS: No. Whenever you install the OS for the first time, Serviceguard must be installed afresh. This rebuilds the deadman driver.
Kernel errata only update: Yes. You must manually rebuild the deadman driver, as the kernel update process would have updated the kernel.
Kernel errata only update with Serviceguard update (patch) installation: No. The Serviceguard update (patch) install process rebuilds the deadman driver.
Fresh installation of Serviceguard: No. The Serviceguard install process rebuilds the deadman driver.
NOTE: This is a new process for configuring packages, as of Serviceguard A.11.18. This manual
refers to packages created by this method as modular packages, and assumes that you will use it
to create new packages. It is simpler and more efficient than the older method, allowing you to
build packages from smaller modules, and eliminating the separate package control script and
the need to distribute it manually.
Packages created using Serviceguard A.11.16 or earlier are referred to as legacy packages. If
you need to reconfigure a legacy package (rather than create a new package), see Configuring
a Legacy Package (page 233).
It is also still possible to create new legacy packages by the method described in Configuring a
Legacy Package. If you are using a Serviceguard Toolkit, consult the documentation for that
product.
If you decide to convert a legacy package to a modular package, see Migrating a Legacy Package
to a Modular Package (page 240).
(Parameters that are in the package control script for legacy packages, but in the package
configuration file instead for modular packages, are indicated by (S) in the tables under Optional
Package Modules (page 172)).
IMPORTANT: Multi-node packages must either use a clustered file system such as Red Hat
GFS (Red Hat GFS is not supported in Serviceguard A.11.20.00), or not use shared storage.
To generate a package configuration file that creates a multi-node package, include -m
sg/multi_node on the cmmakepkg command line. See Generating the Package
Configuration File (page 191).
System multi-node packages. System multi-node packages are supported only for applications
supplied by HP.
For more information about types of packages and how they work, see How the Package Manager
Works (page 43). For information on planning a package, see Package Configuration Planning
(page 104).
When you have decided on the type of package you want to create, the next step is to decide
what additional package-configuration modules you need to include; see Package Modules and
Parameters (page 171).
NOTE: If you are going to create a complex package that contains many modules, you may
want to skip the process of selecting modules, and simply create a configuration file that contains
all the modules:
cmmakepkg -m sg/all $SGCONF/pkg_sg_complex
(The output will be written to $SGCONF/pkg_sg_complex.)
pev (parameter: pev_, page 190): Add to a base module to configure environment variables to be passed to an external script.
external_pre (parameter: external_pre_script, page 190): Add to a base module to specify additional programs to be run before volume groups are activated while the package is starting, and after they are deactivated while the package is halting.
external (parameter: external_script, page 190): Add to a base module to specify additional programs to be run during package start and halt time.
multi_node_all (all parameters that can be used by a multi-node package; includes the multi_node, dependency, monitor_subnet, service, volume_group, filesystem, pev, external_pre, external, and acp modules): Use if you are creating a multi-node package that requires most or all of the optional parameters that are available for this type of package.
NOTE: The default form for parameter names in the modular package configuration file is lower
case; for legacy packages the default is upper case. There are no compatibility issues; Serviceguard
is case-insensitive as far as the parameter names are concerned. This manual uses lower case,
unless the parameter in question is used only in legacy packages, or the context refers exclusively
to such a package.
6.1.4.1 package_name
Any name, up to a maximum of 39 characters, that:
starts and ends with an alphanumeric character
otherwise contains only alphanumeric characters or dot (.), dash (-), or underscore (_)
is unique among package names in this cluster
6.1.4.2 module_name
The module name. Do not change it. Used in the form of a relative path (for example, sg/
failover) as a parameter to cmmakepkg to specify the modules to be used in configuring the package.
(The files reside in the $SGCONF/modules directory; see Understanding the Location of
Serviceguard Files (page 135) for the location of $SGCONF on your version of Linux.)
New for modular packages.
6.1.4.3 module_version
The module version. Do not change it.
New for modular packages.
6.1.4.4 package_type
The type can be failover, multi_node, or system_multi_node. You can configure only
failover or multi-node packages; see Types of Package: Failover, Multi-Node, System Multi-Node
(page 170).
Packages of one type cannot include the base module for another; for example, if package_type
is failover, the package cannot include the multi_node or system_multi_node module.
6.1.4.5 package_description
The application that the package runs. This is a descriptive parameter that can be set to any value
you choose, up to a maximum of 80 characters. Default value is Serviceguard Package.
SITE_NAME A
NODE STATUS STATE
node1 up running
node2 up running
SITE_NAME B
NODE STATUS STATE
node3 up running
node4 up running
IMPORTANT: See Cluster Configuration Parameters (page 90) for important information about
node names.
See About Cross-Subnet Failover (page 130) for considerations when configuring cross-subnet
packages, which are further explained under Cross-Subnet Configurations (page 27).
6.1.4.7 auto_run
Can be set to yes or no. The default is yes.
For failover packages, yes allows Serviceguard to start the package (on the first available node
listed under node_name) on cluster start-up, and to automatically restart it on an adoptive node
if it fails. no prevents Serviceguard from automatically starting the package, and from restarting
it on another node.
This is also referred to as package switching, and can be enabled or disabled while the package
is running, by means of the cmmodpkg command.
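For example, assuming a failover package named pkg1 (the name is illustrative), you could disable and later re-enable package switching while the package is running with commands like these:
cmmodpkg -d pkg1
cmmodpkg -e pkg1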
auto_run should be set to yes if the package depends on another package, or is depended on;
see About Package Dependencies (page 113).
For system multi-node packages, auto_run must be set to yes. In the case of a multi-node package,
setting auto_run to yes allows an instance to start on a new node joining the cluster; no means
it will not.
6.1.4.8 node_fail_fast_enabled
Can be set to yes or no. The default is no.
NOTE: If the package halt function fails with exit 1, Serviceguard does not halt the node,
but sets no_restart for the package, which disables package switching, setting auto_run
(page 176) to no and thereby preventing the package from starting on any adoptive node.
Setting node_fail_fast_enabled to yes prevents Serviceguard from repeatedly trying (and
failing) to start the package on the same node.
Setting node_fail_fast_enabled to yes ensures that the package can fail over to another
node even if the package cannot halt successfully. Be careful when using
node_fail_fast_enabled, as it will cause all packages on the node to halt abruptly. For more
information, see Responses to Failures (page 75) and Responses to Package and Service
Failures (page 77).
For system multi-node packages, node_fail_fast_enabled must be set to yes.
6.1.4.9 run_script_timeout
The amount of time, in seconds, allowed for the package to start; or no_timeout. The default is
no_timeout. The maximum is 4294.
If the package does not complete its startup in the time specified by run_script_timeout,
Serviceguard will terminate it and prevent it from switching to another node. In this case, if
node_fail_fast_enabled is set to yes, the node will be halted (rebooted).
If no timeout is specified (no_timeout), Serviceguard will wait indefinitely for the package to
start.
If a timeout occurs:
Switching will be disabled.
The current node will be disabled from running the package.
NOTE: If no_timeout is specified, and the script hangs, or takes a very long time to complete,
during the validation step (cmcheckconf (1m)), cmcheckconf will wait 20 minutes to allow
the validation to complete before giving up.
6.1.4.10 halt_script_timeout
The amount of time, in seconds, allowed for the package to halt; or no_timeout. The default is
no_timeout. The maximum is 4294.
If the package's halt process does not complete in the time specified by halt_script_timeout,
Serviceguard will terminate the package and prevent it from switching to another node. In this
case, if node_fail_fast_enabled (page 176) is set to yes, the node will be halted (rebooted).
If a halt_script_timeout is specified, it should be greater than the sum of all the values set
for service_halt_timeout (page 184) for this package.
If a timeout occurs:
6.1.4.11 successor_halt_timeout
Specifies how long, in seconds, Serviceguard will wait for packages that depend on this package
to halt, before halting this package. Can be 0 through 4294, or no_timeout. The default is
no_timeout.
no_timeout means that Serviceguard will wait indefinitely for the dependent packages to
halt.
0 means Serviceguard will not wait for the dependent packages to halt before halting this
package.
New as of A.11.18 (for both modular and legacy packages). See also About Package
Dependencies (page 113).
6.1.4.12 script_log_file
The full pathname of the package's log file. The default is $SGRUN/log/<package_name>.log.
(See Understanding the Location of Serviceguard Files (page 135) for more information about
Serviceguard pathnames.) See also log_level.
6.1.4.13 operation_sequence
Defines the order in which the scripts defined by the package's component modules will start up.
See the package configuration file for details.
This parameter is not configurable; do not change the entries in the configuration file.
New for modular packages.
6.1.4.14 log_level
Determines the amount of information printed to stdout when the package is validated, and to
the script_log_file when the package is started and halted. Valid values are 0 through 5,
but you should normally use only the first two (0 or 1); the remainder (2 through 5) are intended
for use by HP Support.
0 - informative messages
1 - informative messages with slightly more detail
2 - messages showing logic flow
3 - messages showing detailed data structure information
4 - detailed debugging information
5 - function call flow
New for modular packages.
6.1.4.15 failover_policy
Specifies how Serviceguard decides where to start the package, or restart it if it fails. Can be set
to configured_node, min_package_node, site_preferred, or
site_preferred_manual. The default is configured_node.
NOTE:
For a site_preferred or site_preferred_manual failover_policy to be effective,
define the policy in packages that are running, or are configured to run, on a cluster that has
more than one site configured, or nodes in more than one site.
When site_preferred or site_preferred_manual is defined as a package's
failover_policy, the -a option of cmrunpkg cannot be used to run the package.
This parameter can be set for failover packages only. If this package will depend on another
package or vice versa, see also About Package Dependencies (page 113).
6.1.4.16 failback_policy
Specifies whether or not Serviceguard will automatically move a package that is not running on
its primary node (the first node on its node_name list) when the primary node is once again
available. Can be set to automatic or manual. The default is manual.
manual means the package will continue to run on the current node.
automatic means Serviceguard will move the package to the primary node as soon as that
node becomes available, unless doing so would also force a package with a higher priority
to move.
CAUTION: When the failback_policy is automatic and you set the NODE_NAME to '*',
if you add, delete, or rename a node in the cluster, the primary node for the package might change,
resulting in the automatic failover of that package.
6.1.4.17 priority
Assigns a priority to a failover package whose failover_policy is configured_node. Valid
values are 1 through 3000, or no_priority. The default is no_priority. See also the
dependency_ parameter descriptions (page 180).
priority can be used to satisfy dependencies when a package starts, or needs to fail over or
fail back: a package with a higher priority than the packages it depends on can force those
packages to start or restart on the node it chooses, so that its dependencies are met.
IMPORTANT: Because priority is a matter of ranking, a lower number indicates a higher priority
(20 is a higher priority than 40). A numerical priority is higher than no_priority.
New as of A.11.18 (for both modular and legacy packages). See About Package Dependencies
(page 113) for more information.
6.1.4.18 dependency_name
A unique identifier for a particular dependency (see dependency_condition) that must be met
in order for this package to run (or keep running). It must be unique among this package's
dependency_names. The length and formal restrictions for the name are the same as for
package_name (page 175).
6.1.4.19 dependency_condition
The condition that must be met for this dependency to be satisfied. As of Serviceguard A.11.18,
the only condition that can be set is that another package must be running.
The syntax is: <package_name> = UP, where <package_name> is the name of the package
depended on. The type and characteristics of the current package (the one we are configuring)
impose the following restrictions on the type of package it can depend on:
If the current package is a multi-node package, <package_name> must identify a multi-node
or system multi-node package.
If the current package is a failover package and its failover_policy (page 178) is
min_package_node, <package_name> must identify a multi-node or system multi-node
package.
If the current package is a failover package and configured_node is its
failover_policy, <package_name> must identify a multi-node or system multi-node
package, or a failover package whose failover_policy is configured_node.
See also About Package Dependencies (page 113).
6.1.4.20 dependency_location
Specifies where the dependency_condition must be met. The only legal value is same_node.
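For example, a failover package that must run on the same node as a running multi-node package named pkg_storage (a hypothetical name) could declare the dependency like this:
dependency_name       storage_dep
dependency_condition  pkg_storage = UP
dependency_location   same_node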
NOTE: If weight_name is package_limit, you can use only that one weight and capacity
throughout the cluster. package_limit is a reserved value, which, if used, must be entered
exactly in that form. It provides the simplest way of managing weights and capacities; see Simple
Method (page 121) for more information.
The rules for forming weight_name are the same as those for forming package_name (page 175).
weight_name must exactly match the corresponding CAPACITY_NAME.
weight_value is an unsigned floating-point value between 0 and 1000000 with at most three
digits after the decimal point.
You can use these parameters to override the cluster-wide default package weight that corresponds
to a given node capacity. You can define that cluster-wide default package weight by means of
the WEIGHT_NAME and WEIGHT_DEFAULT parameters in the cluster configuration file (explicit
default). If you do not define an explicit default (that is, if you define a CAPACITY_NAME in the
cluster configuration file with no corresponding WEIGHT_NAME and WEIGHT_DEFAULT), the
default weight is assumed to be zero (implicit default). Configuring weight_name and
weight_value here in the package configuration file overrides the cluster-wide default (implicit
or explicit), and assigns a particular weight to this package.
For more information, see About Package Weights (page 120). See also the discussion of the
relevant parameters under Cluster Configuration Parameters (page 90), in the cmmakepkg
(1m) and cmquerycl (1m) manpages, and in the cluster configuration and package
configuration template files.
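For example, if the cluster configuration file defines a capacity named processor (CAPACITY_NAME processor, a hypothetical name), you could give this package a weight of 10 against that capacity by entering:
weight_name   processor
weight_value  10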
6.1.4.22 monitored_subnet
The LAN subnet that is to be monitored for this package. Replaces legacy SUBNET which is still
supported in the package configuration file for legacy packages; see Configuring a Legacy
Package (page 233).
You can specify multiple subnets; use a separate line for each.
If you specify a subnet as a monitored_subnet, the package will not run on any node not
reachable via that subnet. This normally means that if the subnet is not up, the package will not
run. (For cross-subnet configurations, in which a subnet may be configured on some nodes and
not on others, see monitored_subnet_access below, ip_subnet_node (page 183), and
About Cross-Subnet Failover (page 130).)
Typically you would monitor the ip_subnet, specifying it here as well as in the ip_subnet
parameter (page 182), but you may want to monitor other subnets as well; you can specify any
subnet that is configured into the cluster (via the STATIONARY_IP parameter in the cluster
configuration file). See Stationary and Relocatable IP Addresses and Monitored Subnets (page 62)
for more information.
If any monitored_subnet fails, Serviceguard will switch the package to any other node specified
by node_name (page 176) which can communicate on all the monitored_subnets defined for
this package. See the comments in the configuration file for more information and examples.
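For example, to monitor the package's own subnet and one additional cluster subnet (the addresses are illustrative; 192.10.25.0 is the subnet used in the ip_subnet example below), you could enter:
monitored_subnet   192.10.25.0
monitored_subnet   15.244.65.0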
6.1.4.24 ip_subnet
Specifies an IP subnet used by the package. Replaces SUBNET, which is still supported in the
package control script for legacy packages.
CAUTION: HP recommends that this subnet be configured into the cluster. You do this in the
cluster configuration file by specifying a HEARTBEAT_IP or STATIONARY_IP under a
NETWORK_INTERFACE on the same subnet, for each node in this package's NODE_NAME list. For
example, an entry such as the following in the cluster configuration file configures subnet
192.10.25.0 (lan1) on node ftsys9:
NODE_NAME ftsys9
NETWORK_INTERFACE lan1
HEARTBEAT_IP 192.10.25.18
See Cluster Configuration Parameters (page 90) for more information.
If the subnet is not configured into the cluster, Serviceguard cannot manage or monitor it, and in
fact cannot guarantee that it is available on all nodes in the package's node_name list (page 176).
Such a subnet is referred to as an external subnet, and relocatable addresses on that subnet are
known as external addresses. If you use an external subnet, you risk the following consequences:
If the subnet fails, the package will not fail over to an alternate node.
Even if the subnet remains intact, if the package needs to fail over because of some other type
of failure, it could fail to start on an adoptive node because the subnet is not available on that
node.
For each subnet used, specify the subnet address on one line and, on the following lines, the
relocatable IP addresses that the package uses on that subnet. These will be configured when the
package starts and unconfigured when it halts.
For example, if this package uses subnet 192.10.25.0 and the relocatable IP addresses
192.10.25.12 and 192.10.25.13, enter:
ip_subnet 192.10.25.0
ip_address 192.10.25.12
ip_address 192.10.25.13
If you want the subnet to be monitored, specify it in the monitored_subnet parameter (page 181)
as well.
In a cross-subnet configuration, you also need to specify which nodes the subnet is configured on;
see ip_subnet_node below. See also monitored_subnet_access (page 182) and About
Cross-Subnet Failover (page 130).
This parameter can be set for failover packages only.
6.1.4.26 ip_address
A relocatable IP address on a specified ip_subnet. Replaces IP, which is still supported in the
package control script for legacy packages.
For more information about relocatable IP addresses, see Stationary and Relocatable IP Addresses
and Monitored Subnets (page 62).
This parameter can be set for failover packages only.
6.1.4.27 service_name
A service is a program or function which Serviceguard monitors as long as the package is up.
service_name identifies this function and is used by the cmrunserv and cmhaltserv
commands. You can configure a maximum of 30 services per package and 900 services per
cluster.
The length and formal restrictions for the name are the same as for package_name (page 175).
service_name must be unique among all packages in the cluster.
IMPORTANT: Restrictions on service names in previous Serviceguard releases were less stringent.
Packages that specify services whose names do not conform to the above rules will continue to
run, but if you reconfigure them, you will need to change the name; cmcheckconf and
cmapplyconf will enforce the new rules.
Each service is defined by five parameters: service_name, service_cmd, service_restart,
service_fail_fast_enabled, and service_halt_timeout. See the descriptions that
follow.
The following is an example of a fully defined service:
service_name patricks-package4-ping
service_cmd "/usr/sbin/ping hasupt22"
service_restart unlimited
service_fail_fast_enabled no
service_halt_timeout 300
See the package configuration template file for more examples.
For legacy packages, this parameter is in the package control script as well as the package
configuration file.
6.1.4.28 service_cmd
The command that runs the program or function for this service_name, for example,
/usr/bin/X11/xclock -display 15.244.58.208:0
An absolute pathname is required; neither the PATH variable nor any other environment variable
is passed to the command. The default shell is /bin/sh.
6.1.4.29 service_restart
The number of times Serviceguard will attempt to re-run the service_cmd. Valid values are
unlimited, none, or any positive integer value. The default is none.
If the value is unlimited, the service will be restarted an infinite number of times. If the value is
none, the service will not be restarted.
This parameter is in the package control script for legacy packages.
6.1.4.30 service_fail_fast_enabled
Specifies whether or not Serviceguard will halt the node (reboot) on which the package is running
if the service identified by service_name fails. Valid values are yes and no. Default is no,
meaning that failure of this service will not cause the node to halt.
6.1.4.31 service_halt_timeout
The length of time, in seconds, Serviceguard will wait for the service to halt before forcing
termination of the service's process. The maximum value is 4294.
The value should be large enough to allow any cleanup required by the service to complete.
If no value is specified, a zero timeout will be assumed, meaning that Serviceguard will not wait
any time before terminating the process.
6.1.4.32 generic_resource_name
Defines the logical name used to identify a generic resource in a package. This name corresponds
to the generic resource name used by the cmgetresource(1m) and cmsetresource(1m)
commands.
Multiple generic_resource_name entries can be specified in a package.
The length and formal restrictions for the name are the same as for package_name (page 175).
Each name must be unique within a package, but a single resource can be specified across multiple
packages.
You can configure a maximum of 100 generic resources per cluster.
Each generic resource is defined by three parameters: generic_resource_name,
generic_resource_evaluation_type, and generic_resource_up_criteria; see the
descriptions that follow.
6.1.4.33 generic_resource_evaluation_type
Defines when the status of a generic resource is evaluated.
Valid values are during_package_start and before_package_start. The default is
during_package_start.
Resources that become available during the course of package startup must be configured
with an evaluation_type of during_package_start.
Monitoring for these generic resources can be started and stopped as a part of the package, and
the monitoring script can be configured as a service. This can be achieved by configuring a
service_name and a service_cmd containing the full path name of the monitoring
executable/script. The monitoring of the generic resource starts only when the monitoring scripts
are started and not at the start of the package.
For information on monitoring scripts, see Monitoring Script for Generic Resources (page 303).
If there is a common generic resource that needs to be monitored as a part of multiple packages,
then the monitoring script for that resource can be launched as part of one package and all other
packages can use the same monitoring script. There is no need to launch multiple monitors for a
common resource. If the package that has started the monitoring script fails or is halted, then all
the other packages that are using this common resource also fail.
These resources will usually be of the evaluation_type before_package_start, and it is
recommended that you configure the monitoring script in a multi-node package.
These resources must be available (status must be 'up') in order to start the package and the
monitoring scripts for these resources must be configured outside of the application package.
6.1.4.34 generic_resource_up_criteria
Defines a criterion to determine whether the status of a generic resource identified by
generic_resource_name is up.
This parameter requires a logical operator and a value. The operators ==, !=, >, <, >=, and <= are
allowed. Values must be positive integer values ranging from 1 to 2147483647.
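For illustration only (the resource name and threshold value are hypothetical; see the package configuration template file for the exact layout), a generic resource stanza might look like this:
generic_resource_name               cpu_load
generic_resource_evaluation_type    during_package_start
generic_resource_up_criteria        <80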
6.1.4.35 vgchange_cmd
Replaces VGCHANGE, which is still supported for legacy packages; see Configuring a Legacy
Package (page 233). Specifies the method of activation for each Logical Volume Manager (LVM)
volume group identified by a vg entry.
The default is vgchange -a y.
6.1.4.36 vg
Specifies an LVM volume group (one per vg, each on a new line) on which a file system (other
than Red Hat GFS; see fs_type) needs to be mounted. A corresponding vgchange_cmd (see
above) specifies how the volume group is to be activated. The package script generates the
necessary filesystem commands on the basis of the fs_ parameters (see File system parameters).
6.1.4.38 concurrent_fsck_operations
The number of concurrent fsck operations allowed on file systems being mounted during package
startup. Not used for Red Hat GFS (see fs_type).
Legal value is any number greater than zero. The default is 1.
If the package needs to run fsck on a large number of file systems, you can improve performance
by carefully tuning this parameter during testing (increase it a little at a time and monitor performance
each time).
6.1.4.39 fs_mount_retry_count
The number of mount retries for each file system. Legal value is zero or any greater number. The
default is zero. The only valid value for Red Hat GFS (see fs_type) is zero. Red Hat GFS is not
supported in Serviceguard A.11.20.00.
If the mount point is busy at package startup and fs_mount_retry_count is set to zero, package
startup will fail.
If the mount point is busy and fs_mount_retry_count is greater than zero, the startup script
will attempt to kill the user process responsible for the busy mount point (fuser -ku) and then
try to mount the file system again. It will do this the number of times specified by
fs_mount_retry_count.
If the mount still fails after the number of attempts specified by fs_mount_retry_count, package
startup will fail.
This parameter is in the package control script for legacy packages.
6.1.4.40 fs_umount_retry_count
The number of umount retries for each file system. Replaces FS_UMOUNT_COUNT, which is still
supported in the package control script for legacy packages; see Configuring a Legacy Package
(page 233).
Legal value is 1 or (for filesystem types other than Red Hat GFS) any greater number. The default
is 1. Operates in the same way as fs_mount_retry_count.
6.1.4.41 fs_name
This parameter, in conjunction with fs_directory, fs_type, fs_mount_opt,
fs_umount_opt, and fs_fsck_opt, specifies a filesystem that is to be mounted by the package.
Replaces LV, which is still supported in the package control script for legacy packages.
fs_name must specify the block device file for a logical volume.
CAUTION: Before configuring an NFS-imported file system into a package, make sure you have
read and understood the rules and guidelines under Planning for NFS-mounted File Systems
(page 106), and configured the cluster parameter CONFIGURED_IO_TIMEOUT_EXTENSION,
described under Cluster Configuration Parameters (page 90).
File systems are mounted in the order you specify in the package configuration file, and unmounted
in the reverse order.
See File system parameters (page 186) and the comments in the FILESYSTEMS section of the
configuration file for more information and examples. See also Volume Manager Planning
(page 85), and the mount manpage.
NOTE: For filesystem types other than Red Hat GFS (see fs_type), a volume group must be
defined in this file (using vg; see (page 186)) for each logical volume specified by an fs_name
entry.
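For example, a package that activates the volume group vgpkgA (the example name used earlier in this chapter) and mounts its logical volume on /extra might contain entries like these (the mount options are illustrative; fs_directory, fs_type, and fs_mount_opt are described in the sections that follow):
vg             vgpkgA
fs_name        /dev/vgpkgA/lvol1
fs_directory   /extra
fs_type        ext3
fs_mount_opt   "-o rw"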
6.1.4.42 fs_server
The name or IP address (IPv4 or IPv6) of the NFS server for an NFS-imported file system. In this
case, you must also set fs_type to nfs, and fs_mount_opt to -o llock on HP-UX, or to
-o local_lock=all on Linux. fs_name specifies the directory to be imported from fs_server,
and fs_directory specifies the local mount point.
For example:
fs_name /var/opt/nfs/share1
fs_server wagon
fs_directory /nfs/mnt/share1
fs_type nfs
fs_mount_opt "-o local_lock=all"
#fs_umount_opt
#fs_fsck_opt
NOTE: fs_umount_opt is optional and fs_fsck_opt is not used for an NFS-imported file
system. (Both are left commented out in this example.)
6.1.4.43 fs_directory
The root of the file system specified by fs_name. Replaces FS, which is still supported in the
package control script for legacy packages; see Configuring a Legacy Package (page 233).
See the mount manpage and the comments in the configuration file for more information.
6.1.4.44 fs_type
The type of the file system specified by fs_name. This parameter is in the package control script
for legacy packages.
For an NFS-imported file system, this must be set to nfs. See the example under fs_server
(page 188).
Table 11 lists the supported file system types and platforms.
Supported types are ext3, XFS file system (on Red Hat Enterprise Linux 6 and
later, and SUSE Linux Enterprise Server 11), ext4 (on Red Hat Enterprise
Linux 5 and later), reiserfs, and gfs.
Red Hat GFS and reiserfs are not supported in the Serviceguard A.11.20.00 release.
WARNING! The ext4 file system has a delayed allocation mechanism, so the behavior of
writing files to disk is different from ext3's. Unlike ext3, the ext4 file system does not write data
to disk when committing the transaction, so it takes longer for the data to reach the disk. Your
program must use data-integrity calls such as fsync() to ensure that data is written to the disk.
NOTE: A package using gfs (Red Hat Global File System, or GFS) cannot use any other file
systems of a different type. vg and vgchange_cmd (page 186) are not valid for GFS file systems.
For more information about using GFS with Serviceguard, see Clustering Linux Servers with the
Concurrent Deployment of HP Serviceguard for Linux and Red Hat Global File Systems for RHEL5
at http://www.hp.com/go/linux-serviceguard-docs.
See also concurrent_fsck_operations (page 187), fs_mount_retry_count and
fs_umount_retry_count (page 187), and fs_fsck_opt (page 189).
See the comments in the package configuration file template for more information.
6.1.4.45 fs_mount_opt
The mount options for the file system specified by fs_name. See the comments in the configuration
file for more information. This parameter is in the package control script for legacy packages.
6.1.4.46 fs_umount_opt
The umount options for the file system specified by fs_name. See the comments in the configuration
file for more information. This parameter is in the package control script for legacy packages.
6.1.4.47 fs_fsck_opt
The fsck options for the file system specified by fs_name. Not used for Red Hat GFS (Red Hat
GFS is not supported in Serviceguard A.11.20.00) (see fs_type). This parameter is in the package
control script for legacy packages.
NOTE: A package using an XFS file system must use xfs_repair command options, because
the fsck command does not work on XFS.
For more information, see the fsck and xfs_repair manpages, and the comments in the
configuration file.
IMPORTANT: This parameter is for use only by HP partners, who should follow the instructions
in the package configuration file.
For information about Serviceguard's implementation of PR, see About Persistent Reservations
(page 72).
6.1.4.49 pev_
Specifies a package environment variable that can be passed to external_pre_script,
external_script, or both, by means of the cmgetpkgenv command. New for modular
packages.
The variable name must be in the form pev_<variable_name> and contain only alphanumeric
characters and underscores. The letters pev (upper-case or lower-case) followed by the underscore
(_) are required.
The variable name and value can each consist of a maximum of MAXPATHLEN characters (4096
on Linux systems).
You can define more than one variable. See About External Scripts (page 127), as well as the
comments in the configuration file, for more information.
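For example, to make a (hypothetical) polling interval available to the package's external scripts, you might define:
pev_monitoring_interval 30
The external script can then read the value by means of the cmgetpkgenv command, as noted above.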
6.1.4.50 external_pre_script
The full pathname of an external script to be executed before volume groups and disk groups are
activated during package startup, and after they have been deactivated during package shutdown;
that is, effectively the first step in package startup and last step in package shutdown. New for
modular packages.
If more than one external_pre_script is specified, the scripts will be executed on package
startup in the order they are entered into the package configuration file, and in the reverse order
during package shutdown.
See About External Scripts (page 127), as well as the comments in the configuration file, for more
information and examples.
6.1.4.51 external_script
The full pathname of an external script. This script is often the means of launching and halting the
application that constitutes the main function of the package. New for modular packages.
The script is executed on package startup after volume groups and file systems are activated and
IP addresses are assigned, but before services are started; and during package shutdown after
services are halted but before IP addresses are removed and volume groups and file systems
deactivated.
If more than one external_script is specified, the scripts will be executed on package startup
in the order they are entered into this file, and in the reverse order during package shutdown.
See About External Scripts (page 127), as well as the comments in the configuration file, for more
information and examples. See also service_cmd (page 183).
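For example (the script pathnames are hypothetical):
external_pre_script   $SGCONF/pkg1/prepare_resources.sh
external_script       $SGCONF/pkg1/start_stop_myapp.sh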
6.1.4.52 user_host
The system from which a user specified by user_name (page 191) can execute
package-administration commands.
Legal values are any_serviceguard_node, or cluster_member_node, or a specific cluster
node. If you specify a specific node it must be the official hostname (the hostname portion, and
only the hostname portion, of the fully-qualified domain name).
6.1.4.53 user_name
Specifies the name of a user who has permission to administer this package. See also user_host
(page 190) and user_role; these three parameters together define the access control policy for
this package (see Controlling Access to the Cluster (page 158)). These parameters must be defined
in this order: user_name, user_host, user_role.
Legal values for user_name are any_user or a maximum of eight login names from /etc/
passwd on user_host.
NOTE: Be careful to spell any_user exactly as given; otherwise Serviceguard will interpret it
as a user name.
Note that the only user_role that can be granted in the package configuration file is
package_admin for this particular package; you grant other roles in the cluster configuration
file. See Setting up Access-Control Policies (page 160) for further discussion and examples.
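For example, to allow user john to administer this package from node bit (names as in the cluster-level examples earlier in this chapter), the package configuration file would contain:
user_name  john
user_host  bit
user_role  package_admin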
6.1.4.54 user_role
Must be package_admin, allowing the user access to the cmrunpkg, cmhaltpkg, and cmmodpkg
commands (and the equivalent functions in Serviceguard Manager) and to the monitor role for
the cluster. See Controlling Access to the Cluster (page 158) for more information.
IMPORTANT: The following parameters are used only by legacy packages. Do not try to use
them in modular packages. See Creating the Legacy Package Configuration (page 233) for more
information.
PATH: Specifies the path to be used by the script.
SUBNET: Specifies the IP subnets that are to be monitored for the package.
RUN_SCRIPT and HALT_SCRIPT: Use the full pathname of each script. These two parameters
allow you to separate package run instructions and package halt instructions for legacy
packages into separate scripts if you need to. In this case, make sure you include identical
configuration information (such as node names, IP addresses, etc.) in both scripts. In most
cases, though, HP recommends that you use the same script for both run and halt instructions.
(When the package starts, the script is passed the parameter start; when it halts, it is passed
the parameter stop.)
LV: The name of a logical volume hosting a file system that will be mounted by the package.
FS: The name of the mount point for a file system to be mounted by the package.
VGCHANGE: As vgchange_cmd (page 186).
NOTE: If you do not include a base module (or default or all) on the cmmakepkg command
line, cmmakepkg will ignore the modules you specify and generate a default configuration file
containing all the parameters.
For a complex package, or if you are not yet sure which parameters you will need to set, the
default may be the best choice; see the first example below.
You can use the -v option with cmmakepkg to control how much information is displayed online
or included in the configuration file. Valid values are 0, 1 and 2. -v 0 removes all comments; -v
1 includes a brief heading for each parameter; -v 2 provides a full description of each parameter.
The default is level 2.
To generate a configuration file for a failover package that uses relocatable IP addresses and
runs an application that requires file systems to be mounted at run time (enter the command
all on one line):
cmmakepkg -m sg/failover -m sg/package_ip -m sg/service -m
sg/filesystem -m sg/volume_group $SGCONF/pkg1/pkg1.conf
To generate a configuration file for a failover package that runs an application that requires
another package to be up (enter the command all on one line):
cmmakepkg -m sg/failover -m sg/dependency -m sg/service
$SGCONF/pkg1/pkg1.conf
To generate a configuration file adding the services module to an existing package (enter
the command all on one line):
cmmakepkg -i $SGCONF/pkg1/pkg1.conf -m sg/service
$SGCONF/pkg1/pkg1_v2.conf
NOTE: cmcheckconf and cmapplyconf check for missing mount points, volume groups,
etc.
NOTE: Optional parameters are commented out in the configuration file (with a # at the beginning
of the line). In some cases these parameters have default values that will take effect unless you
uncomment the parameter (remove the #) and enter a valid value different from the default. Read
the surrounding comments in the file, and the explanations in this chapter, to make sure you
understand the implications both of accepting and of changing a given default.
In all cases, be careful to uncomment each parameter you intend to use and assign it the value
you want it to have.
package_name. Enter a unique name for this package. Note that there are stricter formal
requirements for the name as of A.11.18.
package_type. Enter failover or multi_node. ( system_multi_node is reserved
for special-purpose packages supplied by HP.) Note that there are restrictions
if another package depends on this package; see About Package Dependencies (page 113).
See Types of Package: Failover, Multi-Node, System Multi-Node (page 170) for more
information.
NOTE: The package(s) this package depends on must already be part of the cluster
configuration by the time you validate this package (via cmcheckconf; see Verifying and
Applying the Package Configuration (page 196)); otherwise validation will fail.
enter the service_cmd (for example, the command that starts the process)
If the package needs to activate LVM volume groups, configure vgchange_cmd, or leave the
default.
If the package needs to mount LVM volumes to file systems (other than Red Hat GFS; see
fs_type (page 188)), use the vg parameters to specify the names of the volume groups to be
activated, and select the appropriate vgchange_cmd.
Use the fs_ parameters (page 187) to specify the characteristics of file systems and how and
where to mount them. See the comments in the FILESYSTEMS section of the configuration
file for more information and examples.
Enter each volume group on a separate line, for example:
vg vg01
vg vg02
If your package mounts a large number of file systems, consider increasing the values of the
following parameters:
concurrent_fsck_operations specifies the number of parallel fsck operations
that will be allowed at package startup (not used for Red Hat GFS).
Red Hat GFS is not supported in Serviceguard A.11.20.00.
Specify the filesystem mount and unmount retry options. For Red Hat GFS (see fs_type
(page 188)), use the default (zero).
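For example, a single file system might be described by a group of fs_ entries like the following sketch (the device, mount point, and option values are hypothetical):
fs_name /dev/vg01/lvol1
fs_directory /mnt1
fs_type ext3
fs_mount_opt "-o rw"
fs_umount_opt ""
fs_fsck_opt ""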
You can use the pev_ parameter to specify a variable to be passed to external scripts. Make
sure the variable name begins with the upper-case or lower-case letters pev and an underscore
( _). You can specify more than one variable. See About External Scripts (page 127), and
the comments in the configuration file, for more information.
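For example, an entry such as the following (the variable name and value are hypothetical) makes a value available to the package's external scripts, which can then read a variable of the same name:
pev_backup_dir /var/backup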
If you want the package to run an external pre-script during startup and shutdown, use the
external_pre_script parameter (see (page 190)) to specify the full pathname of the script,
for example, $SGCONF/pkg1/pre_script1.
NOTE: For modular packages, you now need to distribute any external scripts identified by the
external_pre_script and external_script parameters.
But, if you are accustomed to configuring legacy packages, note that you do not have to create a
separate package control script for a modular package, or distribute it manually. (You do still have
to do this for legacy packages; see Configuring a Legacy Package (page 233).)
NOTE: This feature is supported only for modular-style packages; it is not supported for
legacy-style packages.
serviceguard-xdc Environment
By default, this parameter is commented out; it is present in the package configuration file for
serviceguard-xdc packages.
The email_id parameter must be used to provide e-mail addresses of the serviceguard-xdc alert
notification recipients. Each email_id parameter can have one of the following values:
A complete e-mail address
An alias
A distribution list
You can also include multiple recipients by repeating the email_id address.
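For example (the addresses are hypothetical):
email_id [email protected]
email_id [email protected]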
The serviceguard-xdc package can send an alert e-mail:
when a mirror half of the MD device becomes inaccessible
when raid_monitor service cannot add back a mirror half of the MD device after the mirror
half becomes accessible.
For example, consider the following scenario:
Suppose the xdcpkg package is running on node1, the MD device configured in the xdcpkg package
is /dev/md0, /dev/hpdev/my_disk1 and /dev/hpdev/my_disk2 are the mirror halves of
the MD /dev/md0, and for some reason /dev/hpdev/my_disk2 becomes inaccessible. In that case,
an alert e-mail similar to the following is sent:
Hi,
The mirror half /dev/hpdev/my_disk2 of MD device /dev/md0, which is configured in package xdcpkg, is not
accessible from node node1. Please rectify the issue.
Thanks.
7.1.1 Reviewing Cluster and Package Status with the cmviewcl Command
Information about cluster status is stored in the status database, which is maintained on each
individual node in the cluster. You can display information contained in this database by means
of the cmviewcl command:
cmviewcl -v
You can use the cmviewcl command without root access; in clusters running Serviceguard version
A.11.16 or later, grant access by assigning the Monitor role to the users in question. In earlier
versions, allow access by adding <nodename> <nonrootuser> to the cmclnodelist file.
cmviewcl -v displays information about all the nodes and packages in a running cluster, together
with the settings of parameters that determine failover behavior.
TIP: Some commands take longer to complete in large configurations. In particular, you can
expect Serviceguard's CPU usage to increase during cmviewcl -v as the number of packages
and services increases.
See the manpage for a detailed description of other cmviewcl options.
Switching Enabled for a Node: For failover packages, enabled means that the package can
switch to the specified node. disabled means that the package cannot switch to the specified
node until the node is enabled to run the package via the cmmodpkg command.
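For example, a command along the following lines (the package and node names are hypothetical) re-enables node switching for pkg1 on ftsys9:
cmmodpkg -e -n ftsys9 pkg1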
Every failover package is marked enabled or disabled for each node that is either a
primary or adoptive node for the package.
For multi-node packages, node switching disabled means the package cannot start on that
node.
Network_Parameters:
INTERFACE STATUS NAME
PRIMARY up eth0
PRIMARY up eth1
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 service1
Service up 0 0 sfm_disk_monitor
Subnet up 0 0 15.13.168.0
Generic Resource up sfm_disk
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys9 (current)
Alternate up enabled ftsys10
Network_Parameters:
INTERFACE STATUS NAME
PRIMARY up eth0
PRIMARY up eth1
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 service2
Service up 0 0 sfm_disk_monitor 1
Subnet up 0 0 15.13.168.0
Generic Resource up sfm_disk1
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys10 (current)
Alternate up enabled ftsys9
Network_Parameters:
INTERFACE STATUS NAME
PRIMARY up eth0
PRIMARY up eth1
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 service1
Service up 0 0 sfm_disk_monitor
Subnet up 0 0 15.13.168.0
Generic Resource up sfm_disk
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys9 (current)
Alternate up enabled ftsys10
Network_Parameters:
INTERFACE STATUS NAME
PRIMARY up eth0
PRIMARY up eth1
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS NODE_NAME NAME
Service down service2
Generic Resource up ftsys9 sfm_disk1
Subnet up 15.13.168.0
Generic Resource up ftsys10 sfm_disk1
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys10
Alternate up enabled ftsys9
pkg2 now has the status down, and it is shown as unowned, with package switching disabled.
Note that switching is enabled for both nodes, however. This means that once global switching is
re-enabled for the package, it will attempt to start up on the primary node.
NOTE: If you halt pkg2 with the cmhaltpkg command, and the package contains non-native
Serviceguard modules that failed during the normal halt process, then the package is moved to
the partially_down status and halt_aborted state. The command exits at this point. For
more information, see Handling Failures During Package Halt (page 218).
Network_Parameters:
INTERFACE STATUS NAME
PRIMARY up eth0
PRIMARY up eth1
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 service1
Service up 0 0 sfm_disk_monitor
Subnet up 0 0 15.13.168.0
Generic Resource up sfm_disk
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys9 (current)
Alternate up enabled ftsys10
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 service2
Service up 0 0 sfm_disk_monitor
Subnet up 0 0 15.13.168.0
Generic Resource up sfm_disk
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys10
Alternate up enabled ftsys9 (current)
Network_Parameters:
INTERFACE STATUS NAME
PRIMARY up eth0
PRIMARY up eth1
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover min_package_node
Failback automatic
Script_Parameters:
ITEM STATUS NODE_NAME NAME
Subnet up manx 192.8.15.0
Generic Resource unknown manx sfm_disk
Subnet up burmese 192.8.15.0
Generic Resource unknown burmese sfm_disk
Subnet up tabby 192.8.15.0
Generic Resource unknown tabby sfm_disk
Subnet up persian 192.8.15.0
Generic Resource unknown persian sfm_disk
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled manx
Alternate up enabled burmese
Alternate up enabled tabby
Alternate up enabled persian
Volume groups (package)
Commands: cmcheckconf (1m), cmapplyconf (1m). See also Verifying the Cluster Configuration (page 163).
What is verified: existence of the volume groups; availability across all the nodes where the package
is configured to run; the same physical volumes across all the nodes where the package is configured
to run; and the same volume group across all the nodes where the package is configured to run.
NOTE: The volume group verifications are ignored in serviceguard-xdc and Metrocluster environments.

Volume group activation protection (cluster)
Commands: cmcheckconf (1m), cmapplyconf (1m). See also Verifying the Package Configuration (page 238).
What is verified: whether the volume group activation protection is enabled in the lvm.conf file. For
more information, see Enabling Volume Group Activation Protection (page 148).

LVM physical volumes (package)
Commands: cmcheckconf (1m), cmapplyconf (1m).
What is verified: the consistency of the volume groups, and of the physical volumes of each volume
group, across all the nodes where the package is configured to run.

Quorum Server (cluster)
Commands: cmcheckconf (1m), cmapplyconf (1m).
What is verified: that the quorum server, if used, is running and all nodes are authorized to access it;
and, if more than one IP address is specified, that the quorum server is reachable from all nodes through
both IP addresses.

Lock LUN (cluster)
Commands: cmcheckconf (1m), cmapplyconf (1m).
What is verified: that all the cluster nodes are configured to use the same device as the lock LUN, and
that the lock LUN device file is a block device file.

File consistency (cluster)
Commands: cmcheckconf (1m), cmcompare (1m). IMPORTANT: See the manpage for differences in
return codes from cmcheckconf without options versus cmcheckconf -C.
What is verified: file consistency across all the nodes in the cluster. To verify:
1. Customize the $SGCONF/cmclfiles2check file.
2. Distribute it to all the nodes using the cmsync (1m) command.
3. Run the cmcheckconf, cmcheckconf -C, or cmcheckconf -v {1|2} command.
For a subset of nodes, or to verify only specific characteristics such as ownership, content, and so on,
use the cmcompare (1m) command.

Mount points (package)
Commands: cmcheckconf (1m), cmapplyconf (1m). See also Verifying the Package Configuration (page 238).
What is verified: that the mount-point directories specified in the package configuration file exist on
all nodes that can run the package.

Service commands (package)
Commands: cmcheckconf (1m), cmapplyconf (1m). See also Verifying the Package Configuration (page 238).
What is verified: that files specified by service commands exist and are executable. Service commands
whose paths are nested within an unmounted shared file system are not checked.

File systems (package)
Commands: cmcheckconf (1m), cmapplyconf (1m). See also Verifying the Package Configuration (page 238).
What is verified: for LVM only, that file systems are on the logical volumes identified by the fs_name
parameter (page 187).

External scripts and pre-scripts (modular package)
Commands: cmcheckconf (1m), cmapplyconf (1m).
What is verified: a non-zero return value from any script causes the commands to fail.

NFS server connectivity (package)
Commands: cmcheckconf (1m), cmapplyconf (1m).
What is verified: if the package configuration file contains an NFS file system, the commands validate
connectivity to the NFS server from all the package nodes, export of the share by the NFS server, and
the status of the NFS daemons on the NFS server.
NOTE: For the NFS file system mount to be successful, the NFS daemon must be running on the NFS
server.
NOTE: The job must run on one of the nodes in the cluster. The crontab -e command is used
to edit the crontab file. This must be run as the root user, because only the root user can run
cluster verification. The cron (1m) command sets the job's user and group IDs to those of the
user who submitted the job.
For example, the following script runs cluster verification and sends an email to
[email protected] when verification fails.
#!/bin/sh
# Run cluster verification and capture the output.
cmcheckconf -v > /tmp/cmcheckconf.output
if [ $? -ne 0 ]
then
    # Mail the captured output to the administrator if verification failed.
    mailx -s "Cluster verification failed" [email protected] 2>&1 < /tmp/cmcheckconf.output
fi
To run this script from cron, use the crontab -e command and create an entry in the crontab
file. For example, the following entry runs the script at 8:00 a.m. and 8:00 p.m. every day:
0 8,20 * * * verification.sh
For more information, see the crontab (1) manpage.
7.1.12.3 Limitations
Serviceguard does not check for the following conditions:
Proper configuration of Access Control Policies. For more information about Access Control
Policies, see Controlling Access to the Cluster (page 158).
File systems configured to mount automatically on boot (that is, Serviceguard does not check
/etc/fstab)
Uniqueness of volume group major and minor numbers.
Proper functioning of redundant storage paths.
Consistency of Kernel parameters and driver configurations across nodes.
Mount point overlaps (such that one file system is obscured when another is mounted).
NOTE: Manually starting or halting the cluster or individual nodes does not require access to the
quorum server, if one is configured. The quorum server is only used when tie-breaking is needed
following a cluster partition.
CAUTION: HP Serviceguard cannot guarantee data integrity if you try to start a cluster with the
cmruncl -n command while a subset of the cluster's nodes are already running a cluster. If the
network connection is down between nodes, using cmruncl -n might result in a second cluster
forming, and this second cluster might start up the same applications that are already running on
the other cluster. The result could be two applications overwriting each other's data on the disks.
NOTE: HP recommends that you remove a node from participation in the cluster (by running
cmhaltnode as shown below, or Halt Node in Serviceguard Manager) before running the Linux
shutdown command, especially in cases in which a packaged application might have trouble
during shutdown and not halt cleanly.
7.2.3.1 Using Serviceguard Commands to Remove a Node from Participation in a Running Cluster
Use the cmhaltnode command to halt one or more nodes in a cluster. The cluster daemon on
the specified node stops, and the node is removed from active participation in the cluster.
To halt a node with a running package, use the -f option. If a package was running that can be
switched to an adoptive node, the switch takes place and the package starts on the adoptive node.
For example, the following command causes the Serviceguard daemon running on node ftsys9
in the sample configuration to halt and the package running on ftsys9 to move to ftsys10:
cmhaltnode -f -v ftsys9
NOTE: Keep in mind that the purpose of the LAD capabilities is to allow you do maintenance
on one or more nodes, or the entire cluster. If you want to do maintenance on individual packages,
or on elements of the cluster configuration that affect only one package, or a few packages, you
should probably use package maintenance mode; see Maintaining a Package: Maintenance
Mode (page 220).
7.3 Halting a Node or the Cluster while Keeping Packages Running 213
Restart normal package monitoring by restarting the node (cmrunnode) or the cluster (cmruncl).
You can forcefully halt a detached node (cmhaltnode (1m)) with the -f option.
In preview mode (-t) cmrunnode and cmruncl can provide only a partial assessment of
the effect of re-attaching packages.
The assessment may not accurately predict the placement of packages that depend on the
packages that will be re-attached. For more information about preview mode, see Previewing
the Effect of Cluster Changes (page 226).
cmmodpkg -e -t is not supported for a detached package.
You cannot run a package that has been detached.
This could come up if you detect that a package has failed while detached (and hence not
being monitored by Serviceguard). Before you can restart the package on another node, you must
first halt the detached package.
IMPORTANT: This means that you will need to detect any errors that occur while the package
is detached, and take corrective action by running cmhaltpkg to halt the detached package
and cmrunpkg (1m) to restart the package on another node.
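A sketch of that recovery (the package and node names are hypothetical):
cmhaltpkg pkg1
cmrunpkg -n node2 pkg1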
When you restart a node or cluster whose packages have been detached, the packages are
re-attached; that is, Serviceguard begins monitoring them again.
At this point, Serviceguard checks the health of the packages that were detached and takes
any necessary corrective action; for example, if a failover package has in fact failed while
it was detached, Serviceguard will halt it and restart it on another eligible node.
CAUTION: Serviceguard does not check LVM volume groups, mount points, and relocatable
IP addresses when re-attaching packages.
cmviewcl (1m) reports the status and state of detached packages as detached.
This is true even if a problem has occurred since the package was detached and some or all
of the package components are not healthy or not running.
Because Serviceguard assumes that a detached package has remained healthy, the package
is considered to be UP for dependency purposes.
This means, for example, that if you halt node1, detaching pkgA, and pkgB depends on
pkgA to be UP on ANY_NODE, pkgB on node2 will continue to run (or can start) while pkgA
is detached. See About Package Dependencies (page 113) for more information about
dependencies.
As always, packages cannot start on a halted node or in a halted cluster.
7.3 Halting a Node or the Cluster while Keeping Packages Running 215
When a node that has detached packages comes back up after a reboot, the node can:
Rejoin the cluster, in which case the detached packages can move to the "running" or "failed"
state. If the detached packages move to the running state, they must be halted and rerun,
because they may have several inconsistencies after the reboot.
Not rejoin the cluster, in which case the detached packages remain detached. Such packages
must be halted and rerun to avoid any inconsistencies that may have been caused by the reboot.
If you halt a package and disable it before running cmhaltcl -d to detach other packages
running in the cluster, auto_run will be automatically re-enabled for this package when the
cluster is started again, forcing the package to start.
To prevent this behavior and keep the package halted and disabled after the cluster restarts,
change auto_run to no in the package configuration file (page 176), and re-apply the
package, before running cmhaltcl -d.
NOTE: If you do not do this, the cmhaltnode in the next step will fail.
NOTE: -d and -f are mutually exclusive. See cmhaltnode (1m) for more information.
NOTE: -d and -f are mutually exclusive. See cmhaltcl (1m) for more information.
7.3.7 Example: Halting the Cluster for Maintenance on the Heartbeat Subnets
Suppose that you need to do networking maintenance that will disrupt all the cluster's heartbeat
subnets, but it is essential that the packages continue to run while you do it. In this example we'll
assume that packages pkg1 through pkg5 are unsupported for Live Application Detach, and pkg6
through pkgn are supported.
Proceed as follows:
1. Halt all the unsupported packages:
cmhaltpkg pkg1 pkg2 pkg3 pkg4 pkg5
2. Halt the cluster, detaching the remaining packages:
cmhaltcl -d
3. Upgrade the heartbeat networks as needed.
4. Restart the cluster, automatically re-attaching pkg6 through pkgn and starting any other
packages that have auto_run (page 176) set to yes in their package configuration file:
cmruncl
5. Start the remaining packages; for example:
cmmodpkg -e pkg1 pkg2 pkg3 pkg4 pkg5
NOTE: This error handling mechanism is applicable only for failover packages and not for
multi-node or system multi-node packages.
It is applicable only for modular packages and not for legacy packages.
If a package is in the detached or maintenance mode, the package cannot be in halt_aborted
state.
NOTE: If you need to do maintenance that requires halting a node, or the entire cluster, you
should consider Live Application Detach; see Halting a Node or the Cluster while Keeping Packages
Running (page 213).
NOTE: In order to run a package in partial-startup maintenance mode, you must first put it in
maintenance mode. This means that packages in partial-startup maintenance mode share the
characteristics described below for packages in maintenance mode, and the same rules and
dependency rules apply. Additional rules apply to partial-startup maintenance mode, and the
procedure involves more steps, as explained under Performing Maintenance Using Partial-Startup
Maintenance Mode.
NOTE: But a failure in the package control script will cause the package to fail. The package
will also fail if an external script (or pre-script) cannot be executed or does not exist.
IMPORTANT: See the latest Serviceguard release notes for important information about version
requirements for package maintenance.
The package must have package switching disabled before you can put it in maintenance
mode.
You can put a package in maintenance mode only on one node.
The node must be active in the cluster and must be eligible to run the package (on the
package's node_name list).
If the package is not running, you must specify the node name when you run cmmodpkg
(1m) to put the package in maintenance mode.
If the package is running, you can put it into maintenance only on the node on which it
is running.
While the package is in maintenance mode on a node, you can run the package only
on that node.
You cannot put a package in maintenance mode, or take it out of maintenance mode, if doing
so will cause another running package to halt.
Since package failures are ignored while in maintenance mode, you can take a running
package out of maintenance mode only if the package is healthy.
Serviceguard checks the state of the package's services and subnets to determine if the package
is healthy. If it is not, you must halt the package before taking it out of maintenance mode.
Generic resources configured in a package must be available (status 'up') before taking the
package out of maintenance mode.
You cannot do online configuration as described under Reconfiguring a Package (page 240).
You cannot configure new dependencies involving this package; that is, you cannot make it
dependent on another package, or make another package depend on it. See also Dependency
Rules for a Package in Maintenance Mode or Partial-Startup Maintenance Mode (page 223).
You cannot use the -t option of any command that operates on a package that is in
maintenance mode; see Previewing the Effect of Cluster Changes (page 226) for information
about the -t option.
You cannot run a package that depends on pkgA, unless the dependent package itself
is in maintenance mode.
Dependency rules governing packages that pkgA depends on to be UP are bypassed so that
these packages can halt and fail over as necessary while pkgA is in maintenance mode.
If both packages in a dependency relationship are in maintenance mode, dependency rules
are ignored for those two packages.
For example, both packages in an exclusionary dependency can be run and halted in
maintenance mode at the same time.
NOTE: If you have a package configured with generic resources and you attempt to take it out
of maintenance mode back to the running state, the status of the generic resources is evaluated.
If any of the generic resources is 'down', the package cannot be taken out of maintenance
mode.
7.5.2.1 Procedure
Follow these steps to perform maintenance on a package's networking components.
In this example, we'll call the package pkg1 and assume it is running on node1.
1. Place the package in maintenance mode:
cmmodpkg -m on -n node1 pkg1
2. Perform maintenance on the networks or resources and test manually that they are working
correctly.
NOTE: If you now run cmviewcl, you'll see that the STATUS of pkg1 is up and its STATE
is maintenance.
7.5.3.1 Procedure
Follow this procedure to perform maintenance on a package. In this example, we'll assume a
package pkg1 is running on node1, and that we want to do maintenance on the package's
services.
1. Halt the package:
cmhaltpkg pkg1
2. Place the package in maintenance mode:
cmmodpkg -m on -n node1 pkg1
NOTE: If you now run cmviewcl, you'll see that the STATUS of pkg1 is up and its STATE
is maintenance.
NOTE: You can also use cmhaltpkg -s, which stops the modules started by cmrunpkg
-m; in this case, all the modules up to and including package_ip.
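As a sketch (assuming a package pkg1 running on node1, and assuming the module-name form of cmrunpkg -m implied above), the pair of commands might look like:
cmrunpkg -m sg/package_ip -n node1 pkg1
cmhaltpkg -s pkg1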
Change Quorum Server Configuration: Cluster can be running; see What Happens when You Change
the Quorum Configuration Online (page 43).
Change Cluster Lock Configuration (lock LUN): Cluster can be running. See Updating the Cluster Lock
LUN Configuration Online (page 233) and What Happens when You Change the Quorum Configuration
Online (page 43).
Add NICs and their IP addresses to the cluster configuration: Cluster can be running. See Changing
the Cluster Networking Configuration while the Cluster Is Running (page 230).
Delete NICs and their IP addresses from the cluster configuration: Cluster can be running. See Changing
the Cluster Networking Configuration while the Cluster Is Running (page 230).
Change the designation of an existing interface from HEARTBEAT_IP to STATIONARY_IP, or vice versa:
Cluster can be running. See Changing the Cluster Networking Configuration while the Cluster Is Running
(page 230).
Change an interface from IPv4 to IPv6, or vice versa: Cluster can be running. See Changing the Cluster
Networking Configuration while the Cluster Is Running (page 230).
Reconfigure IP addresses for a NIC used by the cluster: Must delete the interface from the cluster
configuration, reconfigure it, then add it back into the cluster configuration. See What You Must Keep
in Mind (page 230). Cluster can be running throughout.
Change IP Monitor parameters (SUBNET, IP_MONITOR, POLLING_TARGET): Cluster can be running.
See the entries for these parameters under Cluster Configuration Parameters (page 90) for more
information.
NOTE: You cannot use the -t option with any command operating on a package in maintenance
mode; see Maintaining a Package: Maintenance Mode (page 220).
For more information about these commands, see their respective manpages. You can also perform
these preview functions in Serviceguard Manager: select the Preview check box for the
action on the respective pages.
When you use the -t option, the command, rather than executing as usual, predicts the results
that would occur, sending a summary to $stdout. For example, assume that pkg1 is a high-priority
package whose primary node is node1, and which depends on pkg2 and pkg3 to run on the
same node. These are lower-priority packages which are currently running on node2. pkg1 is
down and disabled, and you want to see the effect of enabling it:
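A preview invocation for that case might look like the following (the resulting summary output is not reproduced here):
cmmodpkg -e -t pkg1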
NOTE: The preview cannot predict run and halt script failures.
For more information about package dependencies and priorities, see About Package
Dependencies (page 113).
IMPORTANT: For detailed information and examples, see the cmeval (1m) manpage.
NOTE: Before you start, make sure you have configured access to ftsys10 as described under
Configuring Root-Level Access (page 136).
7.6.3.2 Removing Nodes from the Cluster while the Cluster Is Running
You can use Serviceguard Manager to delete nodes, or Serviceguard commands as shown below.
The following restrictions apply:
The node must be halted. See Removing Nodes from Participation in a Running Cluster
(page 212).
If the node you want to delete is unreachable (disconnected from the LAN, for example), you
can delete the node only if there are no packages which specify the unreachable node. If
there are packages that depend on the unreachable node, halt the cluster; see Halting the
Entire Cluster (page 213).
Use the following procedure to delete a node with Serviceguard commands. In this example, nodes
ftsys8, ftsys9 and ftsys10 are already configured in a running cluster named cluster1,
and you are deleting node ftsys10.
NOTE: If you want to remove a node from the cluster, run the cmapplyconf command from
another node in the same cluster. If you try to issue the command on the node you want removed,
you will get an error message.
1. Use the following command to store a current copy of the existing cluster configuration in a
temporary file:
cmgetconf -c cluster1 temp.conf
2. Specify the new set of nodes to be configured (omitting ftsys10) and generate a template
of the new configuration:
cmquerycl -C clconfig.conf -c cluster1 -n ftsys8 -n ftsys9
3. Edit the file clconfig.conf to check the information about the nodes that remain in the
cluster.
4. Halt the node you are going to remove (ftsys10 in this example):
cmhaltnode -f -v ftsys10
5. Verify the new configuration:
cmcheckconf -C clconfig.conf
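If the verification succeeds, the change would then be applied (run from a node other than ftsys10, as noted above); a sketch using the file generated in step 2:
cmapplyconf -C clconfig.conf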
NOTE: If you are trying to remove an unreachable node on which many packages are configured
to run, you may see the following message:
The configuration change is too large to process while the cluster is running.
Split the configuration change into multiple requests or halt the cluster.
In this situation, you must halt the cluster to remove the node.
7.6.4 Changing the Cluster Networking Configuration while the Cluster Is Running
7.6.4.1 What You Can Do
Online operations you can perform include:
Add a network interface and its HEARTBEAT_IP or STATIONARY_IP.
Delete a network interface and its HEARTBEAT_IP or STATIONARY_IP.
Change a HEARTBEAT_IP or STATIONARY_IP interface from IPv4 to IPv6, or vice versa.
Change the designation of an existing interface from HEARTBEAT_IP to STATIONARY_IP,
or vice versa.
Change the NETWORK_POLLING_INTERVAL.
Change IP Monitor parameters: SUBNET, IP_MONITOR, POLLING_TARGET; see the entries
for these parameters under Cluster Configuration Parameters (page 90) for more information.
A combination of any of these in one transaction (cmapplyconf), given the restrictions below.
You cannot change the IP configuration of an interface (NIC) used by the cluster in a single
transaction (cmapplyconf).
You must first delete the NIC from the cluster configuration, then reconfigure the NIC (using
ifconfig, for example), then add the NIC back into the cluster.
Examples of when you must do this include:
CAUTION: Do not add IP addresses to network interfaces that are configured into the Serviceguard
cluster, unless those IP addresses themselves will be immediately configured into the cluster as
stationary IP addresses. If you configure any address other than a stationary IP address on a
Serviceguard network interface, it could collide with a relocatable package address assigned by
Serviceguard.
Some sample procedures follow.
IMPORTANT: See What Happens when You Change the Quorum Configuration Online
(page 43) for important information.
1. In the cluster configuration file, modify the value of CLUSTER_LOCK_LUN for each node.
2. Run cmcheckconf to check the configuration.
3. Run cmapplyconf to apply the configuration.
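For example, assuming the edited cluster configuration file is named clconfig.conf (a hypothetical name), steps 2 and 3 might be:
cmcheckconf -C clconfig.conf
cmapplyconf -C clconfig.conf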
If you need to replace the physical device, see Replacing a Lock LUN (page 257).
NOTE: For modular packages, the default form for parameter names and literal values in
the package configuration file is lower case; for legacy packages the default is upper case.
There are no compatibility issues; Serviceguard is case-insensitive as far as the parameter
names are concerned.
Because this section is intended to be used primarily when you are reconfiguring an existing
legacy package, we are using the legacy parameter names (in upper case) for the sake of
continuity. But if you generate the configuration file using cmmakepkg or cmgetconf, you
will see the parameter names as they appear in modular packages; see the notes below and
the Package Parameter Explanations (page 174) for details of the name changes.
IMPORTANT: Each subnet specified here must already be specified in the cluster configuration
file via the NETWORK_INTERFACE parameter and either the HEARTBEAT_IP or
STATIONARY_IP parameter. See Cluster Configuration Parameters (page 90) for more
information.
See also Stationary and Relocatable IP Addresses and Monitored Subnets (page 62) and
monitored_subnet (page 181).
IMPORTANT: For cross-subnet configurations, see Configuring Cross-Subnet Failover
(page 239).
If your package runs services, enter the SERVICE_NAME as described under service_name
(page 183) and values for SERVICE_FAIL_FAST_ENABLED as described under
service_fail_fast_enabled (page 184) and SERVICE_HALT_TIMEOUT as described
under service_halt_timeout (page 184). Enter a group of these three for each service.
IMPORTANT: Note that the rules for valid SERVICE_NAMEs are more restrictive as of
Serviceguard A.11.18.
CAUTION: If you are not using the serviceguard-xdc or CLX products, do not modify the
REMOTE DATA REPLICATION DEFINITION section. If you are using one of these products,
consult the product's documentation.
If you are using LVM, enter the names of volume groups to be activated using the VG[] array
parameters, and select the appropriate options for the storage activation command, including
options for mounting and unmounting file systems, if necessary. See the fs_ parameter
descriptions starting with fs_mount_retry_count (page 187) for more information.
NOTE: Red Hat GFS and reiserfs are not supported in Serviceguard A.11.20.00.
Add the names of logical volumes and the file system that will be mounted on them.
Specify the filesystem mount and unmount retry options.
If your package uses a large number of volume groups or disk groups or mounts a large
number of file systems, consider increasing the number of concurrent vgchange,
mount/umount, and fsck operations;
Define IP subnet and IP address pairs for your package. IPv4 or IPv6 addresses are allowed.
Add service name(s).
Add service command(s)
Add a service restart parameter, if you so decide.
For more information about services, see the discussion of the service_ parameters starting
with service_name (page 183).
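For example, the corresponding entries in a legacy control script might look like the following sketch (all of the volume group, logical volume, mount point, address, and service values are hypothetical):
VG[0]="vg01"
LV[0]="/dev/vg01/lvol1"
FS[0]="/mnt1"
IP[0]="192.10.25.12"
SUBNET[0]="192.10.25.0"
SERVICE_NAME[0]="pkg1_srv"
SERVICE_CMD[0]="/usr/local/bin/pkg1_daemon"
SERVICE_RESTART[0]="-r 2"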
function customer_defined_run_cmds
{
# ADD customer defined run commands.
: # do nothing instruction, because a function must contain some command.
date >> /tmp/pkg1.datelog
echo 'Starting pkg1' >> /tmp/pkg1.datelog
test_return 51
}
function customer_defined_halt_cmds
{
# ADD customer defined halt commands.
: # do nothing instruction, because a function must contain some command.
date >> /tmp/pkg1.datelog
echo 'Halting pkg1' >> /tmp/pkg1.datelog
test_return 52
}
7.7.4.1 Distributing the Configuration And Control Script with Serviceguard Manager
When you have finished creating a package in Serviceguard Manager, click Apply
Configuration. If the package configuration has no errors, it is converted to a binary file and
distributed to the cluster nodes.
IMPORTANT: In a cross-subnet configuration, you cannot use the same package control script
on all nodes if the package uses relocatable IP addresses. See Configuring Cross-Subnet Failover
(page 239).
Use Linux commands to copy package control scripts from the node where you created the files,
to the same pathname on all nodes which can possibly run the package. Use your favorite method
of file transfer, for example scp or ftp. From ftsys9, for instance, you can issue the scp
command to copy the package control script to ftsys10:
scp $SGCONF/pkg1/control.sh ftsys10:$SGCONF/pkg1/control.sh
Generate the binary configuration file and distribute it across the nodes.
cmapplyconf -v -C $SGCONF/cmcl.conf -P $SGCONF/pkg1/pkg1.conf
The cmapplyconf command creates a binary version of the cluster configuration file and distributes
it to all nodes in the cluster. This action ensures that the contents of the file are consistent across
all nodes.
NOTE: You must use cmcheckconf and cmapplyconf again any time you make changes to
the cluster and package configuration files.
NOTE: If you are using a Metrocluster, you can configure a site-aware cluster using the SITE and
SITE_NAME parameters. For more information about the SITE and SITE_NAME parameters, see
Cluster Configuration Parameters (page 90).
Assuming nodeA is pkg1's primary node (where it normally starts), create node_name entries in
the package configuration file as follows:
node_name nodeA
node_name nodeB
node_name nodeC
node_name nodeD
IMPORTANT: In a cross-subnet configuration, you cannot share a single package control script
among nodes on different subnets if you are using relocatable IP addresses. In this case you will
need to create a separate control script to be used by the nodes on each subnet.
In our example, you would create two copies of pkg1's package control script, add entries to
customize it for subnet 15.244.65.0 or 15.244.56.0, and copy one of the resulting scripts to
each node, as follows.
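As a sketch (the copy names below are hypothetical, and which nodes sit on which subnet depends on your configuration): keep one copy with SUBNET[0]="15.244.65.0", the other with SUBNET[0]="15.244.56.0", and copy each version to the nodes on the matching subnet, for example:
scp $SGCONF/pkg1/control.sh.sub1 nodeA:$SGCONF/pkg1/control.sh
scp $SGCONF/pkg1/control.sh.sub2 nodeC:$SGCONF/pkg1/control.sh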
NOTE: The cmmigratepkg command requires Perl version 5.8.3 or higher on the system on
which you run the command.
IMPORTANT: Restrictions on package names, dependency names, and service names have
become more stringent as of A.11.18. Packages that have or contain names that do not
conform to the new rules (spelled out under package_name (page 175)) will continue to
run, but if you reconfigure these packages, you will need to change the names that do not
conform; cmcheckconf and cmapplyconf will enforce the new rules.
CAUTION: If cmcheckconf fails, do not proceed to the next step until you have corrected
all the errors.
In general, you have greater scope for online changes to a modular than to a legacy package. In
some cases, though, the capability of legacy packages has been upgraded to match that of modular
packages as far as possible; these cases are shown in the table. For more information about legacy
and modular packages, see Chapter 6 (page 169).
NOTE: If neither legacy nor modular is called out under Change to the Package, the Required
Package State applies to both types of package. Changes that are allowed, but which HP does
not recommend, are labeled "should not be running."
IMPORTANT: Actions not listed in the table can be performed for both types of package while
the package is running.
In all cases the cluster can be running, and packages other than the one being reconfigured can
be running. You can make changes to package configuration files at any time; but do not apply
them (using cmapplyconf or Serviceguard Manager) to a running package in the cases indicated
in the table.
NOTE: All the nodes in the cluster must be powered up and accessible when you make package
configuration changes.
Table 14 Types of Changes to Packages

Change to the Package: Change run script contents (legacy package).
Required Package State: Package can be running, but should not be starting. Timing problems may
occur if the script is changed while the package is starting.

Change to the Package: Change halt script contents (legacy package).
Required Package State: Package can be running, but should not be halting. Timing problems may
occur if the script is changed while the package is halting.

Change to the Package: Add or remove a SUBNET in the control script (legacy package).
Required Package State: Package must not be running. Subnet must already be configured into the
cluster. (Also applies to cross-subnet configurations.)

Change to the Package: Add or remove an IP in the control script (legacy package).
Required Package State: Package must not be running. (Also applies to cross-subnet configurations.)

Change to the Package: Change a file system (modular package).
Required Package State: Package should not be running (unless you are only changing fs_umount_opt).
Changing file-system options other than fs_umount_opt may cause problems because the file system
must be unmounted (using the existing fs_umount_opt) and remounted with the new options; the
CAUTION under Remove a file system: modular package applies in this case as well. If only
fs_umount_opt is being changed, the file system will not be unmounted; the new option will take
effect when the package is halted or the file system is unmounted for some other reason.

Change to the Package: Add a generic resource of evaluation type during_package_start.
Required Package State: Package can be running provided the status of the generic resource is not
'down'. For information on online changes to generic resources, see Online Reconfiguration of Generic
Resources (page 112).

Change to the Package: Add a generic resource of evaluation type before_package_start.
Required Package State: Package can be running if the status of the generic resource is 'up'; otherwise
the package must be halted.

Change to the Package: Change the generic_resource_evaluation_type.
Required Package State: Package can be running if the status of the generic resource is 'up'. Not
allowed if changing the generic_resource_evaluation_type causes the package to fail. For information
on online changes to generic resources, see Online Reconfiguration of Generic Resources (page 112).

Change to the Package: Change the generic_resource_up_criteria.
Required Package State: Package can be running for resources of evaluation type before_package_start
or during_package_start provided the new up criteria do not cause the resource status to evaluate to
'down'. Not allowed if changing the generic_resource_up_criteria causes the package to fail. For
information on online changes to generic resources, see Online Reconfiguration of Generic Resources
(page 112).

Change to the Package: Change modular serviceguard-xdc package parameters (xdc/xdc/rpo_target,
xdc/xdc/raid_monitor_interval, xdc/xdc/raid_device, xdc/xdc/device_0, xdc/xdc/device_1).
Required Package State: Package can be running. See Online Reconfiguration of serviceguard-xdc
Modular Package Parameters (page 112).
NOTE: Consider a configuration in which the volume group and the corresponding filesystem
are present in two different packages. To perform online reconfiguration of such packages, the
package with the volume group must be reconfigured before you reconfigure the filesystem package.
HP recommends that you do not perform online reconfiguration for both these packages in a single
command as it might cause one or more packages to fail.
NOTE: You will not be able to cancel if you use cmapplyconf -f.
Package nodes
Package dependencies
Package weights (and also node capacity, defined in the cluster configuration file)
Package priority
auto_run
failback_policy
Recommendations
HP recommends that you make modifications to one module at a time.
Reconfigure only one package at a time online.
If you are adding a new module or a parameter when the package is UP, make the changes
in the Serviceguard package and later configure the application to use the changes.
For example, to add a mount point:
a. Edit the package configuration file and add the mount point.
b. Verify the package configuration file:
#cmcheckconf -P <pkg_name>
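The verified file would then typically be applied; a sketch (any further steps and output are omitted here):
#cmapplyconf -P <pkg_name>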
************************
syslog during this time:
************************
Nov 28 23:41:22 test1 cmserviced[18979]: Package Script for pkg1 failed with an exit(18).
Nov 28 23:41:22 test1 cmcld[18900]: Reconfigured package pkg1 on node test1.
Nov 28 23:41:22 test1 cmcld[18900]: Online reconfiguration of package pkg1 on node test1 failed. Check the
package log file for complete information.
Nov 28 23:41:22 test1 cmcld[18900]: Request from node test1 to disable global switching for package pkg1.
Adding an external pre-script to the package (module sg/external_pre_script, script external.sh):
If an external pre-script that is added to the package configuration failed to start, run the script
manually with the start option. For example, to start the external pre-script:
#extern_pre_script.sh start
(that is, script_name start)

Adding a service to the package (module sg/service, script service.sh):
If a service that is added to the package configuration failed to start, or the start was not attempted,
use the cmrunserv command to start the service. For more information, see the cmrunserv (1m)
manpage. For example, to run the process as a service:
#cmrunserv db1 /var/opt/db/database1
Table 16 describes how to fix the errors in the affected modules that are encountered during online
deletion of a package.
Removing an IP address from the package (module sg/package_ip, script package_ip.sh):
If an IP address that is deleted from the package configuration failed to be removed, or removal was
not attempted, use the cmmodnet command to remove the IP address. For more information, see the
cmmodnet (1m) manpage. For example, to remove the IP 10.149.2.5 (on subnet 10.149.2.0) from the
package:
#cmmodnet -r -i 10.149.2.5 10.149.2.0

Removing storage from the package (modules sg/filesystem (filesystem.sh), sg/volume_group
(volume_group.sh), and sg/pr_cntl (pr_cntl.sh)):
If storage deleted from the package has failed to be removed, or removal was not attempted, ensure
the following: the mount point is unmounted; the hosttags are deleted from the disk; the volume group
is de-activated; and the persistent reservation is removed from the disk. For more information, see the
sg_persist(1m), vgchange(1m), pr_cleanup(1m), multipath(1m), and mount(1m) manpages.
For example, to unmount the mount point mnt1:
#umount /mnt1
To delete the hosttags from the vg_dd0 on node test1.ind.hp.com:
#vgchange --deltag test1.ind.hp.com vg_dd1
To remove the persistent reservation from the disk /dev/sde:
#pr_cleanup -lun /dev/sde

Removing an MD from the package (for XDC packages; module ext/xdc, script xdc.sh):
If an MD removed from the package configuration has not been stopped, use the mdadm command
to stop the MD; for more information, see the mdadm (1m) manpage. Restart the raid monitor service
manually after stopping the MD. For example, to stop the MD /dev/md1:
#mdadm -S /dev/md1
To stop and start the raid monitor service:
#cmhaltserv <raid_monitor_service>
#cmrunserv <raid_monitor_service_name> <raid_monitor_service_cmd>
CAUTION: Remove the node from the cluster first. If you run the rpm -e command on a server
that is still a member of a cluster, it will cause that cluster to halt, and the cluster to be deleted.
To remove Serviceguard:
1. If the node is an active member of a cluster, halt the node first.
2. If the node is included in a cluster configuration, remove the node from the configuration.
3. If you are removing Serviceguard from more than one node, run rpm -e on one node at a
time.
CAUTION: In testing the cluster in the following procedures, be aware that you are causing
various components of the cluster to fail, so that you can determine that the cluster responds correctly
to failure situations. As a result, the availability of nodes and applications may be disrupted.
NOTE: If there was a monitoring script configured for this generic resource, then the monitoring
script would also be attempting to set the status of the generic resource.
CAUTION: Before you start, make sure that all nodes have logged a message such as the following
in syslog:
WARNING: Cluster lock LUN /dev/sda1 is corrupt: bad label. Until this
situation is corrected, a single failure could cause all nodes in the
cluster to crash.
Once all nodes have logged this message, use a command such as the following to specify the
new cluster lock LUN:
cmdisklock reset /dev/sda1
CAUTION: You are responsible for determining that the device is not being used by LVM or any
other subsystem on any node connected to the device before using cmdisklock. If you use
cmdisklock without taking this precaution, you could lose data.
NOTE: cmdisklock is needed only when you are repairing or replacing a lock LUN; see the
cmdisklock (1m) manpage for more information.
Serviceguard checks the lock LUN every 75 seconds. After using the cmdisklock command,
review the syslog file of an active cluster node; within 75 seconds you should see a message
showing that the lock disk is healthy again.
8.4.1 Examples
The following command will clear all the PR reservations registered with the key abc12 on the set
of LUNs listed in the file /tmp/pr_device_list:
pr_cleanup -k abc12 lun -f /tmp/pr_device_list
pr_device_list contains entries such as the following:
/dev/sdb1
/dev/sdb2
Alternatively you could enter the device-file names on the command line:
pr_cleanup -k abc12 lun /dev/sdb1 /dev/sdb2
The next command clears all the PR reservations registered with the PR key abcde on the underlying
LUNs of the volume group vg01:
pr_cleanup -k abcde vg01
NOTE: Because the keyword lun is not included, the device is assumed to be a volume group.
2. Use the cmapplyconf command to apply the configuration and copy the new binary file to
all cluster nodes:
cmapplyconf -C config.conf
This procedure updates the binary file with the new MAC address and thus avoids data inconsistency
between the outputs of the cmviewconf and ifconfig commands.
IMPORTANT: Make sure you read the latest version of the HP Serviceguard Quorum Server
Release Notes before you proceed. You can find them at http://www.hp.com/go/
hpux-serviceguard-docs (Select HP Serviceguard Quorum Server Software). You should also consult
the Quorum Server white papers at the same location.
1. Remove the old quorum server system from the network.
2. Set up the new system and configure it with the old quorum server's IP address and hostname.
3. Install and configure the quorum server software on the new system. Be sure to include in the
new QS authorization file (for example, /usr/local/qs/conf/qs_authfile) all of
the nodes that were configured for the old quorum server. Refer to the qs(1) manpage for
details about configuring the QS authorization file.
NOTE: The quorum server reads the authorization file at startup. Whenever you modify the
file qs_authfile, run the following command to force a re-read of the file. For example,
on a Red Hat distribution:
/usr/local/qs/bin/qs -update
On a SUSE distribution:
/opt/qs/bin/qs -update
CAUTION: Make sure that the old system does not rejoin the network with the old IP address.
NOTE: While the old quorum server is down and the new one is being set up:
The cmquerycl, cmcheckconf and cmapplyconf commands will not work
The cmruncl, cmhaltcl, cmrunnode, and cmhaltnode commands will work
If there is a node or network failure that creates a 50-50 membership split, the quorum server
will not be available as a tie-breaker, and the cluster will fail.
NOTE: Many other products running on Linux in addition to Serviceguard use the syslog file to
save messages. Refer to your Linux documentation for additional information on using the system
log.
The default Serviceguard control scripts are designed to take the straightforward steps needed to
get an application running or stopped. If the package administrator specifies a time limit within
which these steps need to occur and that limit is subsequently exceeded for any reason, Serviceguard
takes the conservative approach that the control script logic must either be hung or defective in
some way. At that point the control script cannot be trusted to perform cleanup actions correctly,
thus the script is terminated and the package administrator is given the opportunity to assess what
cleanup steps must be taken.
If you want the package to switch automatically in the event of a control script timeout, set the
node_fail_fast_enabled parameter (page 176) to YES. In this case, Serviceguard will cause
a reboot on the node where the control script timed out. This effectively cleans up any side effects
of the package's run or halt attempt. In this case the package will be automatically restarted on
any available alternate node for which it is configured.
8.8.8.3 Messages
The coordinator node in Serviceguard sometimes sends a request to the quorum server to set the
lock state. (This is different from a request to obtain the lock in tie-breaking.) If the quorum server's
connection to one of the cluster nodes has not completed, the request to set may fail with a two-line
message like the following in the quorum server's log file:
Oct 008 16:10:05:0: There is no connection to the applicant
2 for lock /sg/lockTest1
Oct 08 16:10:05:0:Request for lock /sg/lockTest1 from
applicant 1 failed: not connected to all applicants.
This condition can be ignored. The request will be retried a few seconds later and will succeed.
The following message is logged:
Oct 008 16:10:06:0: Request for lock /sg/lockTest1
succeeded. New lock owners: 1,2.
Problem: Service Temporarily Unavailable when trying to launch Serviceguard Manager.
Solution: Ensure that a loopback address is mentioned in the /etc/hosts file: 127.0.0.1
localhost.localdomain localhost

Problem: The Tomcat process has not started.
Solution: Run the Tomcat startup command /opt/hp/hpsmh/tomcat/bin/startup.sh

Problem: One of the following messages is displayed when Serviceguard Manager is launched from
the HP System Management Homepage (SMH) web page, if the Java and/or Tomcat versions do not
meet the prerequisites: Service Temporarily Unavailable, or Http status 500 proxy error, or SMH can't
find the requested page.
Solution:
1. Install the prerequisite version of Java (1.6 or later) and Tomcat (5.x or 6.x).
2. Ensure /usr/bin/java is pointing to a supported version of Java ("ll /usr/bin/java").
Otherwise, unlink it (unlink /usr/bin/java) and create a new link: ln -s <full path of
JAVA installation directory> /usr/bin/java
3. Ensure /usr/share/sgmgr-tomcat points to the Tomcat installation directory, that is,
catalina_home ("ll /usr/share/sgmgr-tomcat"). Otherwise, unlink it (unlink
/usr/share/sgmgr-tomcat) and create a new link: ln -s <full path of Tomcat
installation directory> /usr/share/sgmgr-tomcat
4. Ensure the serviceguard-manager-tomcat rpm is installed ("serviceguard-manager-tomcat-01.00-0"
should be listed in the output of the command "rpm -qa | grep serviceguard").
5. Run the /opt/hp/hpsmh/tomcat/bin/tomcat_cfg script.
Move it back.
Fail one of the systems. For example, turn off the power on node 1. Make sure the package
starts up on node 2.
Repeat failover from node 2 back to node 1.
2. Be sure to test all combinations of application load during the testing. Repeat the failover
process under different application states, such as heavy user load versus no user load and batch
jobs versus online transactions.
3. Record timelines of the amount of time spent during the failover for each application state. A
sample timeline might be 45 seconds to reconfigure the cluster, 15 seconds to run fsck on
the filesystems, 30 seconds to start the application and 3 minutes to recover the database.
===============================================================================
=============================================================================
Bus Type ______ Slot Number ____ Address ____ Disk Device File _________
Bus Type ______ Slot Number ___ Address ____ Disk Device File __________
Bus Type ______ Slot Number ___ Address ____ Disk Device File _________
Bus Type ______ Slot Number ___ Address ____ Disk Device File _________
============================================================================
Disk Power:
============================================================================
Tape Backup Power:
============================================================================
Other Power:
OR
==============================================================================
=============================================================================
PATH______________________________________________________________
VGCHANGE_________________________________
VG[0]__________________LV[0]______________________FS[0]____________________
VG[1]__________________LV[1]______________________FS[1]____________________
VG[2]__________________LV[2]______________________FS[2]____________________
NOTE: MD, RAIDTAB, and RAIDSTART are deprecated and should not be used. See Multipath
for Storage (page 82).
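To illustrate how the VG/LV/FS worksheet entries above are used, a filled-in fragment of a legacy
package control script might look like the following; the volume group, logical volume, and mount
point names are hypothetical, and the exact variable set depends on your control script template:
VGCHANGE="vgchange -a y"                 # volume group activation command (verify against your template)
VG[0]="vgdatabase"                       # hypothetical volume group
LV[0]="/dev/vgdatabase/lvol1"            # hypothetical logical volume
FS[0]="/mnt/db"                          # hypothetical mount point for LV[0]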
Anycast An address for a set of interfaces. In most cases these interfaces belong to different nodes. A
packet sent to an anycast address is delivered to one of these interfaces identified by the address.
Since the standards for using anycast addresses are still evolving, they are not supported in Linux
at present.
Multicast An address for a set of interfaces (typically belonging to different nodes). A packet sent to a
multicast address will be delivered to all interfaces identified by that address.
Unlike IPv4, IPv6 has no broadcast addresses; their functions are superseded by multicast.
Interface identifiers in an IPv6 unicast address are used to identify the interfaces on a link. Interface
identifiers are required to be unique on that link. The link is generally identified by the subnet
prefix.
A unicast address is called an unspecified address if all the bits in the address are zero. Textually
it is represented as ::.
The unicast address ::1 or 0:0:0:0:0:0:0:1 is called the loopback address. It is used by a
node to send packets to itself.
Example (IPv4-compatible IPv6 address):
::192.168.0.1
Example (IPv4-mapped IPv6 address):
::ffff:192.168.0.1
where
FP = Format prefix. Value of this is 001 for Aggregatable Global unicast addresses.
TLA ID = Top-level Aggregation Identifier.
RES = Reserved for future use.
NLA ID = Next-Level Aggregation Identifier.
SLA ID = Site-Level Aggregation Identifier.
Interface ID = Interface Identifier.
Link-local address format: 1111111010 (10-bit prefix), followed by 54 zero bits, followed by the 64-bit interface ID.
Link-local addresses are used for addressing nodes on a single link. Packets originating from or
destined to a link-local address are not forwarded by a router.
Site-local addresses are used within a site. Routers do not forward any packet with a site-local
source or destination address outside the site.
TIP: To prevent an Out of Memory error reported by Tomcat (Exception in thread "main"
java.lang.OutOfMemoryError: Java heap space), which may occur especially if the
server is under heavy load or Serviceguard Manager is managing a large cluster (4 nodes with
300 packages), do the following from the command line:
1. Stop hpsmhd
/etc/init.d/hpsmhd -stop
2. Modify the /opt/hp/hpsmh/tomcat/bin/startup.sh file, and add the following line
in the export statements section:
export CATALINA_OPTS="-Xms512m -Xmx512m"
3. Save the file and restart hpsmhd
/etc/init.d/hpsmhd -start
NOTE: If a cluster is not yet configured, you will not see the Serviceguard Cluster section on this
screen. To create a cluster, from the SMH Tools menu, click the Serviceguard Manager link in the
Serviceguard box first, then click Create Cluster.
The figure below shows a browser session at the HP Serviceguard Manager Main Page.
1 Cluster and overall status and alerts: Displays information about the cluster status, alerts, and
general information.
NOTE: The System Tools menu item is not available in this version of Serviceguard Manager.
2 Menu tool bar: The menu tool bar is available from the HP Serviceguard Manager Homepage, and from
any cluster, node, or package view-only property page. Menu option availability depends on which
type of property page (cluster, node, or package) you are currently viewing.
3 Tab bar: The default Tab bar allows you to view additional cluster-related information. The Tab
bar displays different content when you click on a specific node or package.
4 Node information: Displays information about the node status, alerts, and general information.
5 Package information: Displays information about the package status, alerts, and general information.
NOTE: If you click on a cluster running an earlier Serviceguard release, the page will display a
link that will launch Serviceguard Manager A.05.01 (if installed) via Java Webstart.
G Monitoring Script for Generic Resources
Monitoring scripts are written by the end user and must contain the core logic to monitor a resource
and set the status of the corresponding generic resource. These scripts are started as part of
package startup.
You can set the status/value of a simple/extended resource respectively using the
cmsetresource(1m) command.
You can define the monitoring interval in the script.
The monitoring scripts can be launched within the Serviceguard environment by configuring
them as services, or outside of the Serviceguard environment. HP recommends launching the
monitoring scripts by configuring them as services.
For more information, see Launching Monitoring Scripts (page 303).
Template Scripts
HP provides a monitoring script template. The template provided by HP is:
generic_resource_monitor.template
This is located in the /usr/local/cmcluster/conf/examples/ directory.
See the template (page 305) to get an idea about how to write a monitoring script.
How to monitor a resource is at the discretion of the end user, and the script logic must be written
accordingly. HP does not prescribe the content of the monitoring script; however, the following
recommendations may be useful:
Choose the monitoring interval based on how quickly failures must be detected by the
application packages configured with a generic resource.
Get the status/value of a generic resource using cmgetresource before setting its
status/value.
Set the status/value only if it has changed.
See Getting and Setting the Status/Value of a Simple/Extended Generic Resource (page 111)
and the cmgetresource(1m) and cmsetresource(1m) manpages.
See Using the Generic Resources Monitoring Service (page 53).
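The get-before-set recommendation can be implemented with a shell fragment like the following.
This is only a sketch, not text from the manual: the resource name 'lan1' is illustrative, and the
-r and -s options and the output format of cmgetresource are assumptions that should be verified
against the cmgetresource(1m) and cmsetresource(1m) manpages.
new_status=up                            # result of the user-defined monitoring check
current=$(cmgetresource -r lan1)         # assumed to print the current status of the resource
if [ "$current" != "$new_status" ]
then
    cmsetresource -r lan1 -s "$new_status"   # set the status only when it has changed
fi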
service_name lan1_monitor
service_cmd $SGCONF/generic_resource_monitors/lan1.sh
service_name cpu_monitor
service_cmd $SGCONF/generic_resource_monitors/cpu_monitor.sh
The above example shows a sample multi-node package named generic_resource_monitors
with two monitoring scripts configured, one to monitor a LAN interface and one to monitor the CPU.
These scripts monitor the LAN interface and the CPU and set the status of the generic resources
defined in them accordingly.
Consider a package pkg1 that has the LAN resource configured as before_package_start,
with the monitoring script for this resource running in the multi-node package
generic_resource_monitors. A dependency is created such that the multi-node package
must be UP in order to start the package pkg1. Once the multi-node package is started, monitoring
of the resource 'lan1' begins as part of the monitoring script 'lan1.sh'. The script sets the status
of the generic resource 'lan1', and once the resource is UP, the package pkg1 is eligible to be
started.
package_name pkg1
package_type failover
generic_resource_name lan1
generic_resource_evaluation_type before_package_start
dependency_name generic_resource_monitors
dependency_condition generic_resource_monitors = up
dependency_location same_node
Similarly, consider another package pkg2 that requires the 'cpu' resource to be configured as
before_package_start.
package_name pkg2
package_type failover
generic_resource_name cpu
generic_resource_name lan1
generic_resource_evaluation_type before_package_start
dependency_name generic_resource_monitors
dependency_condition generic_resource_monitors = up
dependency_location same_node
Thus, the monitoring scripts for all generic resources of type before_package_start are
configured in a single multi-node package, and any package that requires one of these resources
needs only to configure the generic resource name.
If a common resource must be monitored by multiple packages, the monitoring scripts can be
configured in the multi-node package described above, and multiple packages can define the same
generic resource name in their package configuration files, as shown for the generic resource 'lan1'
in the above example.
Figure 30 depicts a multi-node package containing two monitoring scripts, one configured to
monitor a LAN and the other to monitor a CPU. The two packages are configured with the generic
resource names and are dependent on the multi-node package.
Figure 30 Multi-node package configured with all the monitoring scripts for generic resources of
type before_package_start
# **********************************************************************
# * *
# * This script is a template that can be used as a service when *
# * creating a customer defined sample monitor script for *
# * generic resource(s). *
# * *
# * Once created, this script can be configured into the package *
# * configuration file as a service with the "service_name", *
# * "service_cmd" and "service_halt_timeout" parameters. *
# * Note that the respective "sg/service" and the *
# * "sg/generic_resource" modules need to be specified in the package *
# * configuration file in order to configure these parameters. *
# * *
# * *
# * --------------------------------- *
# * U T I L I T Y F U N C T I O N S *
# * --------------------------------- *
# **********************************************************************
###########################################
# Initialize the variables & command paths
###########################################
# Set the path for the rm command (placeholder; substitute the actual path)
RM=<PATH>
###########################
# Source utility functions.
###########################
if [[ -z $SG_UTILS ]]
then
    . /etc/cmcluster.conf
    SG_UTILS=$SGCONF/scripts/mscripts/utils.sh
fi

if [[ -f ${SG_UTILS} ]]
then
    . ${SG_UTILS}
    if (( $? != 0 ))
    then
        echo "ERROR: Unable to source package utility functions file: ${SG_UTILS}"
        exit 1
    fi
else
    echo "ERROR: Unable to find package utility functions file: ${SG_UTILS}"
    exit 1
fi
###########################################
# Source the package environment variables.
###########################################
#########################################################################
#
# start_command
#
# This function should define actions to take when the package starts
#
#########################################################################
function start_command
{
    sg_log 5 "start_command"

    return 0
}
#########################################################################
#
# stop_command
#
# This function should define actions to take when the package halts
#
#
#########################################################################
function stop_command
{
    sg_log 5 "stop_command"

    exit 1
}
################
# main routine
################
#########################################################################
#
# The customer-defined monitor script should provide the following
# functionality:
#
# When the package is halting, cmhaltpkg will issue a SIGTERM signal to
# the service(s) configured in the package. Use a SIGTERM handler to stop
# the monitor script.
#
# Monitor the generic resource configured in the package using customer-
# defined tools, and set the status or value of the generic resource using
# the "cmsetresource" command. When setting the status or value, get the
# current status or value using "cmgetresource" and set it only if they differ.
#
#########################################################################
start_command $*
while [ 1 ]
do
    sleep 60    # placeholder: the user-defined monitoring check and interval go here
done
Index
bridged net
A defined, 25
Access Control Policies, 158 broadcast storm
active node, 20 and possible TOC, 99
adding a package to a running cluster, 242 building a cluster
adding cluster nodes identifying heartbeat subnets, 157
advance planning, 132 identifying quorum server, 155
adding nodes to a running cluster, 212 logical volume infrastructure, 145
adding packages on a running cluster, 198 verifying the cluster configuration, 163
administration bus type
adding nodes to a running cluster, 212 hardware planning, 83
halting a package, 218
halting the entire cluster, 213 C
moving a package, 219 CAPACITY_NAME
of packages and services, 217 defined, 97
of the cluster, 211 CAPACITY_VALUE
reconfiguring a package while the cluster is running, defined, 97
240 changes in cluster membership, 40
reconfiguring a package with the cluster offline, 241 changes to cluster allowed while the cluster is running,
reconfiguring the cluster, 228 228
removing nodes from operation in a running cluster, changes to packages allowed while the cluster is running,
212 243
responding to cluster events, 253 checkpoints, 274
reviewing configuration files, 262 client connections
starting a package, 217 restoring in applications, 278
troubleshooting, 260 cluster
adoptive node, 20 configuring with commands, 153
alert notification redundancy of components, 25
oracle and nfs toolkits environment, 197 Serviceguard, 19
serviceguard-xdc environment, 197 typical configuration, 19
applications understanding components, 25
automating, 271 cluster administration, 211
checklist of steps for integrating with Serviceguard, 283 solving problems, 263
handling failures, 279 cluster and package maintenance, 199
writing HA services for networks, 272 cluster configuration
ARP messages file on all nodes, 38
after switching, 70 identifying cluster-aware volume groups, 157
AUTO_START planning, 86
effect of default value, 76 planning worksheet, 104
AUTO_START_TIMEOUT verifying the cluster configuration, 163
parameter in the cluster configuration file, 100 cluster configuration file
AUTO_START_TIMEOUT (autostart delay) Autostart Delay parameter (AUTO_START_TIMEOUT),
parameter in cluster manager configuration, 100 100
automatic failback cluster coordinator
configuring with failover policies, 50 defined, 38
automatic restart of cluster, 39 cluster lock
automatically restarting the cluster, 213 4 or more nodes, 42
automating application operation, 271 and cluster reformation, example, 76
autostart delay and power supplies, 30
parameter in the cluster configuration file, 100 identifying in configuration file, 155
autostart for clusters no lock, 42
setting up, 165 two nodes, 40, 41
use in re-forming a cluster, 40, 41
B cluster manager
binding automatic restart of cluster, 39
in network applications, 277 blank planning worksheet, 289
cluster node parameter, 91, 92, 93 in package configuration, 236
defined, 38 pathname parameter in package configuration, 191
dynamic re-formation, 40 support for additional products, 237
heartbeat subnet parameter, 95 troubleshooting, 262
initial configuration of the cluster, 38 controlling the speed of application failover, 272
main functions, 38 creating the package configuration, 233
maximum configured packages parameter, 104 customer defined functions
member timeout parameter, 99 adding to the control script, 237
monitored non-heartbeat subnet, 96
network polling interval parameter, 100, 104 D
planning the configuration, 90 data
quorum server parameter, 92 disks, 29
testing, 256 data congestion, 39
cluster node deciding when and where to run packages, 44
parameter in cluster manager configuration, 91, 92, deleting a package configuration
93 using cmdeleteconf, 242
cluster parameters deleting a package from a running cluster, 242
initial configuration, 38 deleting nodes while the cluster is running, 229
cluster re-formation deleting the cluster configuration
scenario, 75 using cmdeleteconf, 167
cluster startup dependencies
manual, 39 configuring, 113
cmapplyconf, 228, 239 designing applications to run on multiple systems, 275
cmapplyconf command, 196 disk
cmcheckconf, 163, 196, 238 data, 29
troubleshooting, 262 interfaces, 29
cmcheckconf command, 196 root, 29
cmcld daemon sample configurations, 30
and node reboot, 34 disk I/O
and node TOC, 34 hardware planning, 83
and safety timer, 34 disk layout
cmclnodelist bootstrap file, 136 planning, 85
cmdeleteconf disk logical units
deleting a package configuration, 242 hardware planning, 83
deleting the cluster configuration, 167 disk monitoring
cmmakepkg configuring, 198
examples, 192 disks
cmmodnet in Serviceguard, 29
assigning IP addresses in control scripts, 62 replacing, 257
cmnetassist daemon, 35 supported types in Serviceguard, 29
cmnetd daemon, 33 distributing the cluster and package configuration, 196,
cmquerycl 238
troubleshooting, 262 DNS services, 139
cmsnmpd daemon, 34 down time
configuration minimizing planned, 280
basic tasks and steps, 23 dynamic cluster re-formation, 40
cluster planning, 86
of the cluster, 38 E
package, 169 Easy deployment
package planning, 104 cmpreparecl, 86
service, 169 enclosure for disks
configuration file replacing a faulty mechanism, 257
for cluster manager, 38 error handling during package halt, 218
troubleshooting, 262 Ethernet
CONFIGURED_IO_TIMEOUT_EXTENSION redundant configuration, 26
defined, 101 exclusive access
configuring packages and their services, 169 relinquishing via TOC, 76
control script expanding the cluster
adding customer defined functions, 237 planning ahead, 79
expansion H
planning for, 107 HALT_SCRIPT
explanations parameter in package configuration, 191
package parameters, 174 HALT_SCRIPT_TIMEOUT (halt script timeout)
parameter in package configuration, 191
F halting a cluster, 213
failback policy halting a package, 218
used by package manager, 50 halting the entire cluster, 213
FAILBACK_POLICY parameter handling application failures, 279
used by package manager, 50 hardware
failover monitoring, 256
controlling the speed in applications, 272 power supplies, 30
defined, 20 hardware failures
failover behavior response to, 76
in packages, 107 hardware planning
failover package, 43, 170 blank planning worksheet, 287
failover policy Disk I/O Bus Type, 83
used by package manager, 47 disk I/O information for shared disks, 83
FAILOVER_POLICY parameter host IP address, 82, 85
used by package manager, 47 host name, 81
failure I/O bus addresses, 83
kinds of responses, 75 I/O slot numbers, 83
network communication, 78 LAN interface name, 82, 85
response to hardware failures, 76 LAN traffic type, 82
responses to package and service failures, 77 memory capacity, 81
restarting a service after failure, 78 number of I/O slots, 81
failures planning the configuration, 81
of applications, 279 S800 series number, 81
FibreChannel, 29 SPU information, 81
figures subnet, 82, 85
mirrored disks connected for high availability, 30 worksheet, 83
redundant LANs, 27 heartbeat messages, 20
Serviceguard software components, 33 defined, 39
tasks in configuring an Serviceguard cluster, 23 heartbeat subnet address
typical cluster after failover, 21 parameter in cluster configuration, 95
typical cluster configuration, 20 HEARTBEAT_IP
file locking, 278 parameter in cluster configuration, 95
file system name parameter in package control script, 191 high availability, 19
file systems HA cluster defined, 25
planning, 85 objectives in planning, 79
floating IP address host IP address
defined, 62 hardware planning, 82, 85
floating IP addresses host name
in Serviceguard packages, 62 hardware planning, 81
FS, 191 HOSTNAME_ADDRESS_FAMILY
in sample package control script, 236 defined, 91
FS_MOUNT_OPT discussion and restrictions, 88
in sample package control script, 236 how the cluster manager works, 38
how the network manager works, 62
G
general planning, 79 I
Generic Resources I/O bus addresses
monitoring package resources with, 53 hardware planning, 83
monitoring script, 303 I/O slots
Generic resources hardware planning, 81, 83
sample monitoring script, 305 identifying cluster-aware volume groups, 157
generic resources monitoring service Installing Serviceguard, 135
using, 53 installing software
gethostbyname(), 276 quorum server, 145
integrating HA applications with Serviceguard, 283 planning, 85
introduction
Serviceguard at a glance, 19 M
understanding Serviceguard hardware, 25 MAC addresses, 276
understanding Serviceguard software, 33 managing the cluster and nodes, 211
IP manual cluster startup, 39
in sample package control script, 236 MAX_CONFIGURED_PACKAGES
IP address parameter in cluster manager configuration, 104
adding and deleting in packages, 63 maximum number of nodes, 25
for nodes and packages, 62 MEMBER_TIMEOUT
hardware planning, 82, 85 and safety timer, 34
portable, 62 configuring, 99
reviewing for packages, 260 defined, 98
switching, 45, 46, 70 maximum and minimum values , 98
IP_MONITOR membership change
defined, 102 reasons for, 40
iSCSI, 29 memory capacity
hardware planning, 81
J memory requirements
JFS, 273 lockable memory for Serviceguard, 79
minimizing planned down time, 280
K mirrored disks connected for high availability
kernel figure, 30
hang, and TOC, 75 monitor cluster with Serviceguard commands, 164
safety timer, 34 monitored non-heartbeat subnet
kernel consistency parameter in cluster configuration, 96
in cluster configuration, 140 monitored resource failure
kernel interrupts Serviceguard behavior, 25
and possible TOC, 99 monitoring disks, 198
monitoring hardware, 256
L Monitoring script
LAN launching, 303
heartbeat, 39 template, sample, 305
interface name, 82, 85 Monitoring Script for Generic Resources, 303
LAN failure moving a package, 219
Serviceguard behavior, 25 multi-node package, 44, 170
LAN interfaces multiple systems
primary and secondary, 25 designing applications for, 275
LAN planning
host IP address, 82, 85 N
traffic type, 82 name resolution services, 139
Launching Monitoring Scripts, 303 network
link-level addresses, 276 adding and deleting package IP addresses, 63
load sharing with IP addresses, 63 load sharing with IP addresses, 63
local switching, 63 local interface switching, 63
lock OTS/9000 support, 301
cluster locks and power supplies, 30 redundancy, 26
use of the cluster lock, 41 remote system switching, 69
use of the cluster lock disk, 40 network communication failure, 78
lock volume group, reconfiguring, 228 network components
logical volume parameter in package control script, 191 in Serviceguard, 25
logical volumes network manager
creating the infrastructure, 145 adding and deleting package IP addresses, 63
planning, 85 main functions, 62
LV, 191 network planning
in sample package control script, 236 subnet, 82, 85
LVM network polling interval (NETWORK_POLLING_INTERVAL)
commands for cluster use, 145 parameter in cluster manager configuration, 100, 104
disks, 29 network time protocol (NTP)
for clusters, 140 planning, 104
networking run and halt script timeout parameters, 191
redundant subnets, 81 step by step, 169
networks subnet parameter, 191
binding to IP addresses, 277 using Serviceguard commands, 234
binding to port addresses, 277 verifying, 196
IP addresses and naming, 275 verifying the configuration, 196, 238
node and package IP addresses, 62 writing the package control script, 236
packages using IP addresses, 276 package configuration file, 174
supported types in Serviceguard, 25 editing, 193
writing network applications as HA services, 272 generating, 191
no cluster lock package dependency parameters, 180
choosing, 42 successor_halt_timeout, 178
node package configuration parameters, 174
basic concepts, 25 package control script
halt (TOC), 75 FS parameter, 191
in Serviceguard cluster, 19 LV parameter, 191
IP addresses, 62 package coordinator
timeout and TOC example, 76 defined, 39
node types package dependency
active, 20 parameters, 180
primary, 20 successor_halt_timeout, 178
NODE_FAIL_FAST_ENABLED package failover behavior, 107
effect of setting, 77 package failures
NODE_NAME responses, 77
parameter in cluster configuration, 93 package halt administration
parameter in cluster manager configuration, 91, 92, error handling, 218
93 package IP address
nodetypes defined, 62
primary, 20 package IP addresses
NTP defined, 62
time protocol for clusters, 140 reviewing, 260
package manager
O blank planning worksheet, 289, 290
OTS/9000 support, 301 testing, 255
outages package modules, 171
insulating users from, 271 base, 171
optional, 172
P package switching behavior
package changing, 220
adding and deleting package IP addresses, 63 packages
basic concepts, 25 deciding where and when to run, 44
blank planning worksheet, 289, 290 managed by cmcld, 34
changes allowed while the cluster is running, 243 parameter explanations, 174
error handling, 218 parameters, 174
halting, 218 types, 170
in Serviceguard cluster, 19 parameters
local interface switching, 63 for failover, 107
moving, 219 package configuration, 174
reconfiguring while the cluster is running, 240 parameters for cluster manager
reconfiguring with the cluster offline, 241 initial configuration, 38
remote switching, 69 PATH, 191
starting, 217 physical volume
package administration, 217 for cluster lock, 40, 41
solving problems, 263 physical volumes
package and cluster maintenance, 199 blank planning worksheet, 288
package configuration planning, 85
applying, 196 planning
distributing the configuration file, 196, 238 cluster configuration, 86
cluster lock and cluster expansion, 85 redundancy in network interfaces, 25
cluster manager configuration, 90 redundant Ethernet configuration, 26
disk I/O information, 83 redundant LANS
for expansion, 107 figure, 27
hardware configuration, 81 redundant networks
high availability objectives, 79 for heartbeat, 20
overview, 79 relocatable IP address
package configuration, 104 defined, 62
power, 84 relocatable IP addresses
quorum server, 85 in Serviceguard packages, 62
SPU information, 81 remote switching, 69
volume groups and physical volumes, 85 removing nodes from operation in a running cluster, 212
worksheets, 83 removing packages on a running cluster, 198
planning and documenting an HA cluster, 79 removing Serviceguard from a system, 254
planning for cluster expansion, 79 replacing disks, 257
planning worksheets resources
blanks, 287 disks, 29
point of failure responses
in networking, 26 to cluster events, 253
POLLING_TARGET to package and service failures, 77
defined, 103 responses to failures, 75
ports responses to hardware failures, 76
dual and single aggregated, 65 restart
power planning automatic restart of cluster, 39
power sources, 84 following failure, 78
worksheet, 84, 288 restartable transactions, 273
power supplies restarting the cluster automatically, 213
blank planning worksheet, 287 restoring client connections in applications, 278
power supply rotating standby
and cluster lock, 30 configuring with failover policies, 47
UPS, 30 setting package policies, 47
primary LAN interfaces RUN_SCRIPT
defined, 25 parameter in package configuration, 191
primary node, 20 RUN_SCRIPT_TIMEOUT (run script timeout)
parameter in package configuration, 191
Q running cluster
QS_ADDR adding or removing packages, 198
parameter in cluster manager configuration, 92
quorum S
and cluster reformation, 75 S800 series number
quorum server hardware planning, 81
and safety timer, 34 safety timer
installing, 145 and node TOC, 34
parameters in cluster manager configuration, 92 and syslog, 34
planning, 85 duration, 34
sample disk configurations, 30
R Sample monitoring script for generic resources, 305
re-formation service administration, 217
of cluster, 40 service configuration
reconfiguring a package step by step, 169
while the cluster is running, 240 service failures
reconfiguring a package with the cluster offline, 241 responses, 77
reconfiguring a running cluster, 228 service restarts, 78
reconfiguring the entire cluster, 228 SERVICE_CMD
reconfiguring the lock volume group, 228 in sample package control script, 236
recovery time, 86 SERVICE_FAIL_FAST_ENABLED
redundancy and node TOC, 77
in networking, 26 SERVICE_NAME
of cluster components, 25 in sample package control script, 236
SERVICE_RESTART supported disks in Serviceguard, 29
in sample package control script, 236 supported networks in Serviceguard, 25
Serviceguard switching
install, 135 ARP messages after switching, 70
introduction, 19 local interface switching, 63
Serviceguard at a Glance, 19 remote system switching, 69
Serviceguard behavior switching IP addresses, 45, 46, 70
in LAN failure, 25 system log, 257
in monitored resource failure, 25 system log file
in software failure, 25 troubleshooting, 261
Serviceguard commands system message
to configure a package, 234 changing for clusters, 166
Serviceguard Manager, 22 system multi-node package, 43, 170
overview, 22
Serviceguard software components T
figure, 33 tasks in Serviceguard configuration
serviceguard WBEM provider, 37 figure, 23
shared disks testing
planning, 83 cluster manager, 256
shutdown and startup package manager, 255
defined for applications, 272 testing cluster operation, 255
single point of failure time protocol (NTP)
avoiding, 19 for clusters, 140
single-node operation, 166, 253 TOC
size of cluster and package availability, 76
preparing for changes, 132 and safety timer, 99
SMN package, 43 and the safety timer, 34
SNA applications, 278 when a node fails, 75
software failure traffic type
Serviceguard behavior, 25 LAN hardware planning, 82
software planning troubleshooting
LVM, 85 approaches, 260
solving problems, 263 monitoring hardware, 256
SPU information replacing disks, 257
planning, 81 reviewing control scripts, 262
standby LAN interfaces reviewing package IP addresses, 260
defined, 25 reviewing system log file, 261
starting a package, 217 using cmquerycl and cmcheckconf, 262
startup and shutdown troubleshooting your cluster, 255
defined for applications, 272 typical cluster after failover
startup of cluster figure, 21
manual, 39 typical cluster configuration
stationary IP addresses, 62 figure, 20
STATIONARY_IP
parameter in cluster configuration, 96 U
status uname(2), 277
cmviewcl, 199 understanding network components in Serviceguard, 25
package IP address, 260 UPS
system log file, 261 in power planning, 84
stopping a cluster, 213 power supply, 30
SUBNET use of the cluster lock, 40, 41
in sample package control script, 236
parameter in package configuration, 191 V
subnet verifying cluster configuration, 163
hardware planning, 82, 85 verifying the cluster and package configuration, 196, 238
parameter in package configuration, 191 VG
SUBNET (for IP Monitor) in sample package control script, 236
defined, 102 vgcfgbackup
successor_halt_timeout parameter, 178 using to back up volume group configuration, 152
VGCHANGE
in package control script, 236
VGChange, 191
volume group
for cluster lock, 40, 41
planning, 85
volume group and physical volume planning, 85
W
WEIGHT_DEFAULT
defined, 103
WEIGHT_NAME
defined, 103
What is Serviceguard?, 19
worksheet
blanks, 287
cluster configuration, 104, 289
hardware configuration, 83, 287
package configuration, 289, 290
power supply configuration, 84, 287, 288
use in planning, 79