HA Admin Tasks for HA Administrators
Session ID: 41CO
Michael Herrera ([email protected])
ATS Certified IT Specialist, PowerHA SystemMirror (HACMP) for AIX
Agenda
Also useful:
# lssrc -ls clstrmgrES | grep fix
cluster fix level is "3"
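For scripting, the fix level can be extracted from that output; a minimal sketch that parses a captured sample line (so the example is self-contained rather than run against a live node):

```shell
# Extract the quoted fix level from `lssrc -ls clstrmgrES` output.
# On a live node you would pipe the real command output in; here a
# captured sample line keeps the sketch self-contained.
sample='cluster fix level is "3"'

fix_level=$(printf '%s\n' "$sample" | grep 'fix level' | sed 's/.*"\([^"]*\)".*/\1/')
echo "cluster fix level: $fix_level"
```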
Attention:
Be aware that HA 7.1.1 SP2 or SP3 is not reported back properly. The halevel command
probes with the wrong option, and since the "server.rte" fileset is not updated it will not catch
the updates to the cluster.cspoc.rte filesets.
Upgrade Considerations
There are two main areas that you need to consider – OS & HA Software
Change Controls: what is your ability to apply and test the updates?
Consider things like Interim Fixes locking down the system
– Will they need to be reapplied?
– Will they need to be rebuilt?
You can start the upgrade on either node, but obviously an update to the node hosting the
application would cause a disruption to operations.

OS update (one node at a time):
- Stop Cluster Services
- OS update: TL1 & SPs
- Reboot
- Reintegrate into cluster with AIX 7.1.1.5

Common Question: Can the cluster run with the nodes running different levels?

HA level & patches (one node at a time):
- UNMANAGE resources (the application is still running)
- smit update_all (be mindful of new base filesets)
- smit clstart (start scripts will get reinvoked)

We advise against stopping the cluster with the UNMANAGE option on more than one node
at a time. Note that it can be done, but there are various factors to consider.
Common Question: How long can the cluster run in a mixed mode? What operations are supported?
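The per-node update sequences can also be driven from the command line; a hedged sketch as clmgr calls ("nodeA" is a placeholder node name, and the commands are only echoed so the sketch reads as an inert checklist):

```shell
# Per-node rolling-update checklist expressed as clmgr calls.
# "nodeA" is a placeholder; echo keeps this sketch from executing anything.
node=nodeA

# OS update pass - node taken fully offline:
echo "clmgr offline node $node WHEN=now MANAGE=offline"    # stop cluster services
echo "(apply TL/SPs with smit update_all, then reboot)"
echo "clmgr online node $node WHEN=now MANAGE=auto"        # reintegrate into cluster

# HA fileset update pass - application keeps running:
echo "clmgr offline node $node WHEN=now MANAGE=unmanage"   # UNMANAGE resources
echo "(smit update_all for the cluster filesets)"
echo "clmgr online node $node WHEN=now MANAGE=auto"        # restart cluster services
```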
6 © 2010 IBM Corporation
IBM Power Systems
Scenario:
- Client had an environment running independent Oracle databases in a mutual takeover cluster
configuration. They wanted to update the Oracle binaries one node at a time and wanted to avoid
an unexpected fallover during the process. They wished to UNMANAGE cluster resources on all
nodes at the same time.
Lessons Learned:
Do not do an upgrade of the cluster filesets while unmanaged on all nodes
– This would recycle the clstrmgrES daemon and the cluster would lose its internal state
Application monitors are not suspended when you UNMANAGE the resources
– If you manually stop the application and forget about the monitors, existing application
monitors could auto-restart it or initiate a takeover, depending on your configuration
Application Start scripts will get invoked again on restart of cluster services
– Be aware of what happens when you invoke your start script while already running, or
comment out the scripts prior to restarting cluster services
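One common defense against re-invoked start scripts is to make them idempotent; a minimal sketch (the process name "myappd" and the start command are placeholders, and the process table is simulated so the example is self-contained):

```shell
# Sketch of an idempotent start script: it checks the process table
# before starting, so re-invocation by cluster services is harmless.
# "myappd" and the commented-out start command are placeholders.
is_running() {
    # $1 = process name; stdin = one process name per line
    # (on a live node: ps -e -o comm= | is_running myappd)
    grep -qx "$1"
}

start_app() {
    # $1 = simulated process table (space-separated names);
    # intentionally unquoted below so each name lands on its own line
    if printf '%s\n' $1 | is_running myappd; then
        echo "myappd already running - start script is a no-op"
    else
        echo "starting myappd"
        # nohup /usr/local/bin/myappd &   # real start would go here
    fi
}

start_app "init sshd myappd"   # app already up: no-op
start_app "init sshd"          # app down: start it
```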
Note: Application monitors will continue to run. Depending on the implementation it might
be wise to suspend monitors prior to this operation.
smitty cl_admin
Most of this is old news, but the use of dependencies can affect where and how the
resources get acquired. More importantly, it can affect the steps required to move
resource groups, so more familiarity with the configuration is required.
Note: Be aware of the clcomd changes for version 7 clusters.
The clutils.log file should show the results of the nightly check
Custom Verification Methods may be defined to run during the Verify / Sync operations
Note: Automatic verify & sync on node start up does not include any custom verification methods
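A custom verification method is simply an executable whose non-zero exit is reported as a failure during Verify / Sync; a hypothetical example (the start-script path it would verify in production is an assumption, and the demo call uses /bin/sh so it runs anywhere):

```shell
#!/bin/sh
# Hypothetical custom verification method. PowerHA invokes it during
# Verify/Sync; a non-zero exit is reported as a verification failure.
check_script() {
    if [ -x "$1" ]; then
        echo "PASSED: $1 present and executable"
        return 0
    fi
    echo "FAILED: $1 missing or not executable"
    return 1
}

# In production you might check e.g. /usr/local/hascripts/app_start.sh
# (assumed path); demonstrated here with a path that exists everywhere:
check_script /bin/sh
```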
NODE mutiny.dfw.ibm.com
PACKAGE INSTALLER LABEL
======================================================== =========== ==========
bos.rte.security installp passwdLock
NODE munited.dfw.ibm.com
PACKAGE INSTALLER LABEL
======================================================== =========== ==========
bos.rte.security installp passwdLock
The snapshot upgrade migration path requires the entire cluster to be down.
* This is a restriction currently under evaluation by the CAA development team and may
be lifted in a future update
– ksh restrictions were removed to allow the use of a "-" in service IP labels, so
both V6.1 and V7.X support their use in the name
Common Questions:
– Will the number of disks or volume groups affect my fallover time?
– Should I configure fewer larger LUNs, or more smaller LUNs?
Versions 6.1 and earlier allowed Standard VGs or Enhanced Concurrent VGs
– Version 7.X requires the use of ECM volume groups
Your Answers:
Standard VGs would require an openx call against each physical volume
– Processing could take several seconds to minutes depending on the number of LUNs
ECM VGs are varied on all nodes (ACTIVE / PASSIVE)
– It takes seconds per VG
Best Practice:
Always try to keep it simple, but stay current with new features and take advantage
of existing functionality to avoid added manual customization.
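Since V7.X requires ECM volume groups, pre-existing standard VGs have to be converted; a sketch with the usual AIX commands ("datavg" is a placeholder, and the exact prerequisites, such as bos.clvm being installed and the VG being varied off clusterwide, depend on your AIX level, so verify against your documentation):

```shell
# Convert a standard VG to an Enhanced Concurrent Capable VG.
# "datavg" is a placeholder VG name; prerequisites vary by AIX level.
varyoffvg datavg
chvg -C datavg                      # -C: make Enhanced Concurrent Capable
varyonvg datavg
lsvg datavg | grep -i concurrent    # confirm "Enhanced-Capable"
```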
* Be mindful of this with the implementation of Pre/Post Events
Configuration Files:
– /etc/hosts
– /etc/services
– /etc/snmpd.conf
– /etc/snmpdv3.conf
– /etc/rc.net
– /etc/inetd.conf
– /usr/es/sbin/cluster/netmon.cf
– /usr/es/sbin/cluster/etc/clhosts
– /usr/es/sbin/cluster/etc/rhosts
– /usr/es/sbin/cluster/etc/clinfo.rc

SystemMirror Files:
– Pre, Post & Notification scripts
– Start & Stop scripts
– Scripts specified in monitors
– Custom pager text messages
– SNA scripts
– Scripts for tape support
– Custom snapshot methods
– User defined events
Node A and Node B each hold /usr/local/hascripts/app* scripts (#!/bin/ksh) containing the
same Application Start Logic.
– Can select Local (files) or LDAP
– Select nodes by Resource Group (no selection means all nodes)
– Users will be propagated to all applicable cluster nodes
– The password command can be altered to ensure consistency across all nodes
– Optional list of users whose passwords will be propagated to all cluster nodes
(the passwd command is aliased to clpasswd)
– Functionality available since HACMP 5.2 (Fall 2004)
Sample Email:
# cat /usr/es/sbin/cluster/samples/pager/sample.txt
Node %n: Event %e occurred at %d, object = %o
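The escapes in that template (%n node, %e event, %d date, %o object) expand at event time; a small simulation of the substitution (the node, event, date, and object values are placeholders, not real cluster output):

```shell
# Simulate how the pager template's escapes might expand; the values
# (nodeA, node_down, rgA) are placeholders, not real cluster output.
template='Node %n: Event %e occurred at %d, object = %o'
expanded=$(printf '%s\n' "$template" | sed \
    -e 's/%n/nodeA/' \
    -e 's/%e/node_down/' \
    -e 's/%d/Mon Jul 1 12:00:00 2013/' \
    -e 's/%o/rgA/')
echo "$expanded"
```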
Attention:
Sendmail must be working and allowed through the firewall to receive notifications
There is a push to leverage IBM Systems Director, which will guide you through the
step-by-step configuration of the cluster.
The cluster is easy to set up, but what about changes going forward?
Attributes are stored in the HACMPcluster object class.
Grace period is the waiting time period after detecting the failure before it is reported.

Startup Monitor:
– Only invoked on application startup
– Confirms the startup of the application
– New Application Startup Mode in HA 7.1.1

Long-Running Monitor:
– Continues to run locally with the running application

Both monitor types run on a 60 sec interval; process monitors check the process table,
while custom monitors invoke the custom logic.
Resource Group A:
– Service IP
– Volume Group / filesystems
– Application Controller (start.sh / stop.sh)
– Startup Monitor
– Long-Running Monitor

Enhancement introduced in HA Version 7.1.1: the application start may be set to run
in the foreground.
– There was no SDMC support; this is no longer much of an issue
– Information is stored in HA ODM object classes
– Multiple HMC IPs may be defined, separated by a space
Food for Thought: How many DLPAR operations can be handled at once?
Start Cluster Services:
# clmgr online cluster WHEN=now MANAGE=auto BROADCAST=true CLINFO=true
Summary
There are some notable differences between V7 and HA 6.1 and earlier
– Pay careful attention to where some of the options are available
– A summary chart of new features is appended to the presentation
SG24-8030
Summary Chart
New Functionality & Changes

New CAA Infrastructure 7.1.X:
• IP Multicast based Heartbeat Protocol
• HBA Based SAN Heartbeating
• Private Network Support
• Tunable Failure Detection Rate
• New Service IP Distribution Policies
• Full IPv6 Support 7.1.2
– Disk Fencing Enhancements 7.1.0
– Rootvg System Event 7.1.0
– Disk rename Function 7.1.0
– Repository Disk Resilience 7.1.1
• Backup Repository Disks 7.1.2
– New Application Startup Mode 7.1.1
– Exploitation of JFS2 Mount Guard 7.1.1
– Adaptive Fallover 7.1.0
– New RG Dependencies 7.1.0 (Start After, Stop After)
– Federated Security 7.1.1 (RBAC, EFS & Security System Administration)

Smart Assistants (Application Integration):
– SAP Live Cache with DS or SVC 7.1.1
– MQ Series 7.1.1

DR Capabilities:
– Stretch & Linked Clusters 7.1.2
– DS8000 Hyperswap 7.1.2

Management:
– New Command Line Interface 7.1.0 (clcmd, clmgr utility, lscluster)
– IBM Systems Director Management 7.1.0

Extended Distance Clusters:
– XIV Replication Integration (12/16/2011)
– XP12000, XP24000 (11/18/2011)
– HP9500 (8/19/2011)
– Storwize v7000 (9/30/2011)
– SVC 6.2 (9/30/2011)
Questions?
Additional Resources
RedGuide: High Availability and Disaster Recovery Planning: Next-Generation Solutions
for Multiserver IBM Power Systems Environments
http://www.redbooks.ibm.com/abstracts/redp4669.html?Open