IBM VM Recovery Manager HA for Power Systems
Version 1.5
Deployment Guide
IBM
Note
Before using this information and the product it supports, read the information in “Notices” on page
95.
This is the latest edition for IBM® VM Recovery Manager HA Version 1.5 for Power Systems until otherwise indicated in a
newer edition.
© Copyright International Business Machines Corporation 2020, 2021.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with
IBM Corp.
Contents
FAQ....................................................................................................................... 1
Overview...............................................................................................................5
Concepts............................................................................................................... 7
Planning................................................................................................................9
Requirements.............................................................................................................................................10
Limitations..................................................................................................................................................14
Installing.............................................................................................................19
Upgrading................................................................................................................................................... 23
Uninstalling................................................................................................................................................ 24
Configuring..........................................................................................................27
Setting up the KSYS subsystem.................................................................................................................27
Setting up HA policies................................................................................................................................33
Modifying the KSYS configuration............................................................................................................. 37
Setting up the VM agent.............................................................................................................................38
Recovering hosts, VMs, and applications..................................................................................................40
Configuring the sudo command ................................................................................................................42
VM agents...................................................................................................................................................44
Setting up KSYS high availability through PowerHA SystemMirror.......................................................... 49
Local database mode.................................................................................................................................51
Commands.......................................................................................................... 53
ksysmgr command.....................................................................................................................................53
ksysvmmgr command................................................................................................................................65
Troubleshooting...................................................................................................77
Log files and trace files.............................................................................................................................. 77
Error notification for the KSYS events....................................................................................................... 79
Solving common problems........................................................................................................................ 80
Collecting diagnostic data to contact IBM Support.................................................................................. 93
Notices................................................................................................................95
Privacy policy considerations.................................................................................................................... 96
Trademarks................................................................................................................................................ 97
About this document
The VM Recovery Manager HA solution is a set of software components that together provide a high
availability mechanism for virtual machines that run on POWER7® processor-based servers, or later. This
document describes various components, subsystems, and tasks that are associated with the VM
Recovery Manager HA solution.
This document provides system administrators with complete information about the following topics:
• Concepts that are used in the VM Recovery Manager HA solution.
• Planning the VM Recovery Manager HA implementation in your production environment and the
minimum software requirements.
• Installing the VM Recovery Manager HA filesets.
• Configuring your environment to use the VM Recovery Manager HA solution.
• Troubleshooting any issues associated with the VM Recovery Manager HA solution.
• Using the VM Recovery Manager HA commands.
Highlighting
The following highlighting conventions are used in this document:
Bold
Identifies commands, subroutines, keywords, files, structures, directories, and other items whose
names are predefined by the system. Bold highlighting also identifies graphical objects, such as
buttons, labels, and icons that you select.
Italics
Identifies parameters for actual names or values that you supply.
Monospace
Identifies examples of specific data values, examples of text similar to what you might see displayed,
examples of portions of program code similar to what you might write as a programmer, messages
from the system, or text that you must type.
ISO 9000
ISO 9000 registered quality systems were used in the development and manufacturing of this product.
Do I need to perform LPM only through VM Recovery Manager HA or can I do it through the HMC,
PowerVC, or LPM tool?
You can use the VM Recovery Manager HA solution to perform planned HA activities by using the LPM
operation. However, you can use any other tool to do LPM that works best for you. If you are using
another tool, ensure that the VM moves to another host that is within the host group.
Can VM Recovery Manager HA co-exist with other solutions such as PowerVC and PowerHA®
SystemMirror?
Yes. For more information, see “Coexistence with other products” on page 13.
Does VM Recovery Manager HA support NovaLink?
Not directly. However, in PowerVM NovaLink-based environments that also have HMCs, you can
register scripts that can be plugged into VM Recovery Manager HA. The sample scripts can be
customized based on your environment. These scripts change the HMC settings to be in master mode
for brief periods so that the KSYS subsystem can work with the HMC to monitor the environment for
high availability.
Which storage systems does VM Recovery Manager HA support?
The VM Recovery Manager HA solution can support any storage systems that are certified with the
VIOS, except internet Small Computer Systems Interface (iSCSI) storage devices. Storage disks that
are related to VMs must be accessible across all the hosts within the host group so that a VM can move
from a host to any other host within the host group.
Can I use VM Recovery Manager HA with SVC HyperSwap®?
SAN Volume Controller (SVC) and Storwize® HyperSwap technology perform transparent synchronous
mirroring across short distances. The VM Recovery Manager HA solution can be used across that
distance if the storage system fulfills all the requirements of shared storage and is certified with the
VIOS.
Which types of networks does VM Recovery Manager HA support?
The VM Recovery Manager HA solution can support any network that is certified with the VIOS that
supports the Live Partition Mobility operation.
Which types of VIOS storage configurations does VM Recovery Manager HA support?
The VM Recovery Manager HA solution supports virtual SCSI (vSCSI), N_Port ID virtualization (NPIV),
Shared Storage Pool (SSP), and any storage configuration that is supported for the LPM operation.
What can I manage from the VM Recovery Manager HA GUI?
The VM Recovery Manager HA GUI offers deployment, health monitoring, and administrative
experiences.
Does VM Recovery Manager HA relocate a VM if I shut down or stop the VM manually?
No. The VM Recovery Manager HA solution checks the HMC and firmware resources to verify whether
the lack of heartbeats from a VM is caused by an administrator-initiated operation. In those cases, the
VM is not relocated.
How is fencing performed during failures?
The VM Recovery Manager HA solution stops the failed VM through the HMC before starting the VM on
another host. If the KSYS subsystem cannot connect to the HMC or if the HMC cannot stop the VM
successfully, the KSYS subsystem cancels the automated restart operation and instead sends a
critical event to the administrator about the VM failure.
Can I configure the alerts in KSYS to be sent as text messages instead of emails?
Yes. Contact your phone company to obtain the phone number that can be represented as an email
address, and register the updated email address with the KSYS subsystem. For more information, see
Setting contacts for event notification.
Can I register my own method or script to be invoked when specific events occur?
Yes. Use the ksysmgr command to register scripts to be called by the KSYS subsystem when specific
events occur. For more information, see “Event notification script management” on page 61.
How do I save the KSYS configuration?
Use the ksysmgr command to save the KSYS configuration as a snapshot that can be restored later.
For more information, see Backing up the configuration data.
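For example, a command of the following form (the file path is illustrative) saves a detailed snapshot
that can be restored later:
ksysmgr add snapshot filepath=/home/ksys/snapshots/ksys_backup type=DETAILED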
VM Recovery Manager HA overview
High availability (HA) management is a critical feature of business continuity plans. Any downtime to the
software stack can result in loss of revenues and disruption of services. IBM VM Recovery Manager HA for
Power Systems is a high availability solution that is easy to deploy and provides an automated solution to
recover the virtual machines (VMs), also known as logical partitions (LPARs).
The VM Recovery Manager HA solution implements recovery of the virtual machines based on the VM
restart technology. The VM restart technology relies on an out-of-band monitoring and management
component that restarts the VMs on another server when the host infrastructure fails. The VM restart
technology is different from the conventional cluster-based technology that deploys redundant hardware
and software components for a near real-time failover operation when a component fails.
The VM Recovery Manager HA solution is ideal to ensure high availability for many VMs. Additionally, the
VM Recovery Manager HA solution is easier to manage because it does not have clustering complexities.
The following figure shows the architecture of the VM Recovery Manager HA solution. A set of hosts is
grouped to be backup for each other. When failures are detected, VMs are relocated and restarted on
other healthy hosts within the group.
Figure (vmrmha002): Architecture of the VM Recovery Manager HA solution. The controller system
(KSYS) communicates through the HMC with Host 1, Host 2, and Host 3, which form Host group 1.
VM Recovery Manager HA concepts
The VM Recovery Manager HA solution provides a highly available environment by identifying a set of
resources that are required for processing virtual machines in a server.
The VM Recovery Manager HA solution uses the following concepts:
Controller system (KSYS)
The controlling system, also called KSYS, is a fundamental component that monitors the production
environment for any unplanned outage. If an unplanned outage occurs, the KSYS analyzes the
situation, notifies the administrator about the failure, and can automatically move the failed virtual
machines to another host in the host group. The KSYS interacts with the Hardware Management
Console (HMC) to collect configuration information of managed systems. The KSYS subsystem also
collects VIOS health information through the HMC.
The KSYS subsystem runs in an AIX logical partition (LPAR). You can customize the security level for
the KSYS LPAR according to the AIX security requirements of your organization. In addition, the KSYS
LPAR can be protected for failure by using other products such as PowerHA SystemMirror® for AIX.
The KSYS subsystem must remain operational even if the site fails. Ensure that you periodically
receive KSYS health reports. You can also check the KSYS subsystem health in the VM Recovery
Manager HA GUI dashboard.
Host group
Hosts are grouped together to be backup for each other. When failures in any of the hosts are
detected, VMs in the failed host are relocated and restarted on other healthy hosts within the group of
hosts. This group of hosts is called a host group.
Host monitor
The host monitor daemon is shipped with the Virtual I/O Server (VIOS) and is deployed during the
VIOS installation. When you initialize the KSYS subsystem for high-availability feature, the host
monitor module becomes active. The KSYS subsystem communicates with the host monitor daemon
through the HMC to monitor the hosts for high availability. For information about the VIOS version that
contains the host monitor daemon, see the Requirements section.
VM agent
You can optionally install the VM agent filesets, which are shipped along with the KSYS filesets, in the
guest virtual machines. The VM agent subsystem provides high-availability feature at the VM and
application level. The VM agent monitors the following issues in the production environment:
• VM failures: If the operating system of a VM is not working correctly, or if the VM has stopped
working because of an error, the VM is restarted on another host within the host group. The KSYS
subsystem uses the VM monitor module to monitor the heartbeat from the VM to the host monitor
subsystem in a VIOS.
• Application failures: Optionally, you can register the applications in the VM agent to enable
application monitoring. The VM agent uses the Application HA monitoring framework to monitor the
health of the application periodically by running the application-specific monitor scripts, by
identifying whether the application has failed, and by identifying whether the VM must be restarted
in the same host or another host. This framework can also manage the sequence in which
applications are started and stopped within a VM.
Note: The VM agent is supported on AIX and Linux (RHEL and SLES) guest VMs only. Currently, the VM
agent subsystem is not supported for the IBM i and Ubuntu VMs. Therefore, IBM i and Ubuntu VMs are
relocated from one host to another host within the host group only after a host failure.
The following figure shows the detailed architecture of the VM Recovery Manager HA solution:
Figure (vmrmha003): Detailed architecture of the VM Recovery Manager HA solution. A VM agent, which
includes the Application HA monitoring framework and the VM monitor, runs in each VM and sends
heartbeats over the VLAN to the host monitors in VIOS1 and VIOS2 on Host 1 and Host 2. The HMC
connects the KSYS subsystem to the hosts, and health is checked across all hosts through the shared
disks.
Planning VM Recovery Manager HA
To implement the VM Recovery Manager HA solution, you must review your current high availability (HA)
recovery plan and consider how the VM Recovery Manager HA solution can be integrated into your current
environment.
The VM Recovery Manager HA package consists of filesets for the installation of KSYS, GUI, and VM agent.
The following table describes the key components of the VM Recovery Manager HA solution:
Software requirements
• The KSYS logical partition must be running IBM AIX 7.2 with Technology Level 2
• You must install the OpenSSL software version 1.0.2.800, or later for the AIX operating system. The
latest version of the OpenSSL software is also included on the AIX base media.
• Each LPAR in the host must have one of the following operating systems:
– AIX Version 6.1, or later
– PowerLinux
- Red Hat Enterprise Linux (little endian) Version 7.4, or later (kernel version: 3.10.0-693)
- SUSE Linux Enterprise Server (little endian) Version 12.3, or later (kernel version - 4.4.126-94.22)
- Ubuntu Linux distribution Version 16.04
– IBM i Version 7.1, or later
• You can install the VM agent to monitor the virtual machine and applications on the LPARs that run only
the following operating systems:
– AIX Version 6.1, or later
– PowerLinux
- Red Hat Enterprise Linux (little endian) Version 7.4, or later (kernel version: 3.10.0-693)
- SUSE Linux Enterprise Server Version 12.3, or later (kernel version - 4.4.126-94.22)
• This release requires the IJ29125m0a.201110.epkg.Z KSYS efix. You can download the efix from the
following location and install it as shown in the example after this list:
https://aix.software.ibm.com/aix/efixes/IJ29125m/IJ29125m0a.201110.epkg.Z
• For VIOS version 3.1.1.0, the following VIOS efix is required. You must download the efix and install it
before installing the VM Recovery Manager HA for Power Systems Version 1.5:
https://aix.software.ibm.com/aix/ifixes/IJ21043/IJ21043m1b.200218.epkg.Z
• For VIOS versions 3.1.1.21 and 3.1.1.25, the following VIOS efix is required. You must download the
efix and install it before installing the VM Recovery Manager HA for Power Systems Version 1.5:
https://aix.software.ibm.com/aix/efixes/ij25165/IJ25165m2c.200727.epkg.Z
• For VIOS version 3.1.2.10, the following VIOS efix is required. You must download the efix and install it
before installing the VM Recovery Manager HA for Power Systems Version 1.5:
https://aix.software.ibm.com/aix/efixes/IJ28933/IJ28933m1a.201106.epkg.Z
• PowerVM® Enterprise Edition must be deployed on all hosts to use the high-availability feature.
• VIOS Version 3.1.1.10, or later, with all the subsequent patches must be installed in VIOS partitions.
Also, your production environment must have two Virtual I/O Servers per host. You can have a
maximum of 24 Virtual I/O Servers in a single host group. If more than two Virtual I/O Servers exist in a
host, you can exclude the additional Virtual I/O Servers from the KSYS configuration settings. For more
information about setting up
dual VIOS in your environment, see Setting up a dual VIOS by using the HMC.
• As a best practice, you must deploy AIX rules in the VIOS. The VIOS must have enough free space in
the /(root), /var, and /usr file systems. Additional CPU and memory resources are needed in each
VIOS for VM Recovery Manager HA management. You must add at least 0.5 core CPU and 2 GB memory
apart from the VIOS sizing that you are planning to deploy for your production environment.
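For example, after you download an efix package such as the KSYS efix that is listed above, you can
install it by using the AIX interim fix manager (emgr) command and then list the installed fixes to
confirm the installation:
emgr -e IJ29125m0a.201110.epkg.Z
emgr -l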
Firmware requirements
• Minimum required levels of IBM Power Systems servers follow:
– POWER7+™ Systems that have one of the following firmware levels:
- FW770.90, or later
- FW780.70, or later except MMB systems (9117-MMB models)
- FW783.50, or later
– POWER8® Systems that have one of the following firmware levels:
- FW840.60, or later
- FW860.30, or later
– POWER9™ Systems that have the following firmware levels:
- FW910, or later
Network requirements
• All virtual machines (VMs) that are managed by the VM Recovery Manager HA solution must use virtual
I/O resources through VIOS. The VMs must not be connected to a physical network adapter or any
dedicated devices.
• Storage area network (SAN) connectivity and zoning must be configured so that VIOS can access the
disks that are relevant to the hosts.
• Ensure that independent, redundant SAN and network connections are established across the Virtual
I/O Servers in each host in the host group.
• Ensure that the KSYS LPAR has HTTP Secure (HTTPS) connectivity to all the HMCs that can manage the
hosts in the host group.
• The same virtual LAN (VLAN) must be configured across the site.
• Ensure redundant connections are established from the KSYS LPAR to HMC and from HMC to VIOS
logical partitions. Any connectivity issues between KSYS, HMC, and VIOS logical partitions can lead to
disruption in the regular data collection activity and disaster recovery operations.
• Ensure a proper RMC connection between the VMs and the HMC. If the RMC connection between a VM
and the HMC has issues, the Live Partition Mobility (LPM) operation cannot work and, therefore, the VM
cannot be recovered. A way to verify this connection is shown in the example after this list.
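For example, to check the RMC connection from within a managed VM, you can run the following RSCT
commands in the VM as the root user; the output varies by environment:
lsrsrc IBM.ManagementServer
/usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc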
GUI requirements
• The logical partition (LPAR), in which you want to install the GUI filesets, must be running IBM AIX 7.2
with Technology Level 2 Service Pack 1 (7200-02-01), or later. You can choose to install the GUI server
fileset on one of the KSYS nodes.
• The LPAR in which you are installing the GUI server must run the enhanced Korn shell that uses
the /usr/bin/ksh93 shell.
• The LPAR in which you are installing the GUI server fileset must have at least 1 core CPU and 8 GB
memory.
• Google Chrome and Mozilla Firefox web browsers are supported to access the GUI for the VM Recovery
Manager HA solution.
Licensing considerations
• With the VM Recovery Manager HA solution, licensing is based on processor cores rather than on the
number of managed virtual machines (VMs). The license count is determined by the whole number of
processor cores that host the VMs that are replicated or managed by the VM Recovery Manager HA
solution.
• The VM Recovery Manager HA licenses are installed on the AIX partition that is designated to host the
KSYS orchestrator. The VM Recovery Manager HA license enables the KSYS orchestrator.
KSYS limitations
• If a user-defined Shared Storage Pool (SSP) cluster name is the same as the KSYS subsystem-defined
cluster name, or is in the KSYS_<KSYS_CLUSTER_NAME>_1_<siteID> format, the user-defined SSP
cluster and the KSYS subsystem-defined cluster are considered to be the same. For example, if both
cluster names are the same, the KSYS subsystem removes the user-defined SSP cluster automatically
while removing the KSYS-defined cluster.
• The following commands can run without considering the policies:
Therefore, successful completion of the verification and validation operations does not mean that the
virtual machines can be relocated successfully.
• In user-defined cluster, the following commands might remove all the data that you added:
• The KSYS subsystem follows the KSYS_peer domain_HG_ID format to name an HA SSP. The KSYS
subsystem uses this format to differentiate between an HA SSP and a user-defined SSP. Therefore, you
must not use this format for any user-defined SSP.
• If access to the repository disk and the pool disk across the cluster or across some of the Virtual I/O
Servers in the SSP cluster is lost, the status reporting operations and failover operations might be
delayed. The discovery operation might also fail. Contact IBM Support to check whether any fixes are
available for these issues.
• When the database node (DBN) loses network connectivity or loses access to the pool disks for a long
time, all Virtual I/O Servers operate in the local database mode.
• When you remove the KSYS cluster, the KSYS subsystem fails to delete HA-specific VM and VIOS
adapters if the cleanup operation continues for a long time. You must delete the VIOS adapters
manually to avoid inconsistencies across the Virtual I/O Servers. If you create the KSYS cluster again,
the KSYS subsystem can reuse the previous HA-specific adapters.
• The KSYS subsystem supports the Shared Storage Pool (SSP) cluster's high availability disk only when
the Shared Storage Pool (SSP) is created from the KSYS subsystem.
• The KSYS subsystem does not display the high availability disk in any query when you use a user-
defined SSP cluster.
• You cannot modify a KSYS subsystem's high availability disk after creating the SSP cluster from the
KSYS node.
• After configuring a KSYS cluster and after configuring applications on that cluster, if you shut down a
virtual machine (VM) or a logical partition (LPAR) of the cluster, the KSYS subsystem does not change
the status of the VM or LPAR to red. The status remains green. However, if you shut down the same VM
or LPAR from the HMC, the KSYS subsystem changes the status of the VM or LPAR to red.
• A maximum of 10 scripts can be added in the KSYS subsystem by using the add notify command.
• The ksysmgr command fails to start or stop applications that are part of a dependency setup if Virtual
I/O Servers are in local database mode.
– Workaround: You must run the resume or suspend command for the VM.
• On VIOS nodes, if the disks of a shared storage pool (SSP) are not accessible after the system is
reactivated because of a shutdown or reboot, the disk state continues to be down. This state impacts
the start of the pool, which requires a quorum to come back online. As a workaround, choose one of the
following options. If you do not want to reboot your VIOS, use workaround option 1.
– Workaround option 1: Complete the following procedure:
1. Restore the disk connectivity.
2. Run the cfgmgr command as a root user to make the system aware of the disks.
3. Run the command padmin: clstartstop -stop -m <node>.
4. Run the command padmin: clstartstop -start -m <node>.
– Workaround option 2: Complete the following procedure:
1. Restore the disk connectivity.
2. Reboot the VIOS node.
• For a VM with vSCSI disk, the cleanup operation fails in the local database mode.
– Workaround: You must bring the SSP cluster back to the global mode.
• The KSYS subsystem does not handle application dependencies if the VM that contains the dependent
application is shut down manually.
• VM Recovery Manager HA does not work if the Live Partition Mobility (LPM) feature is disabled at the
firmware level.
• If the current repository disk is down, automatic replacement does not occur on a previously used
repository disk that has the same cluster signature. In this case, a free backup repository disk might not
be available, and therefore, the automatic replacement operation fails.
– Workaround: Run the following command to clear the previous cluster signatures:
cleandisk -r <diskname>
• The ksysmgr -t remove cec command is not supported on a user-defined KSYS cluster.
– Workaround: Reconfigure the KSYS cluster. Otherwise, use the KSYS controlled VIOS cluster.
• In a scalability environment where the VMs are spread across the hosts of a host group and the LPM
verification operation is run on the host group, depending on the type of configuration, many requests
might go to one host at some point in time. If the number of requests is more than the maximum
number of requests that the host can handle, the verification operation might fail with the following
error:
HSCLB401 The maximum number of partition migration commands allowed are already in progress.
Migration limitations
• Disable the quick discovery feature before running the Live Partition Mobility (LPM) and restart
operations on virtual machines.
• You cannot run the Live Partition Mobility (LPM) operation simultaneously on multiple hosts by using the
ksysmgr command. You must specify multiple virtual machines, in a comma-separated list, in the
VM agent limitations
• The ksysvmmgr start|stop app command supports only one application at a time.
• The ksysvmmgr suspend|resume command does not support the application dependencies.
• For all applications that are installed on non-rootvg disks, you must enable the automatic varyon
option for the volume groups and the automatic mount option for the file systems so that they are
available after the virtual machine is restarted on the AIX operating system (see the example after this
list).
• If the application is in any of the failure states, for example, NOT_STOPPABLE, NOT_STARTABLE,
ABNORMAL, or FAILURE, you must fix the failure issue, and then use the ksysvmmgr start|resume
application command to start and monitor the application.
• If the KSYS cluster is deleted, or if a virtual machine is not included for the HA management, the VM
agent daemon becomes inoperative. You must manually re-start the VM agent daemon in the virtual
machine to bring the VM agent daemon to operative state.
• On a critical application failure, the KSYS subsystem continues to relocate the virtual machine from one
host to another host even after the virtual machine is relocated back to its home host. For example, if a
host group contains two hosts (host1 and host2) and the registered critical application in the vm1_host1
virtual machine fails, the KSYS subsystem relocates the vm1_host1 virtual machine to host2. If the
application does not start in the NORMAL state, the KSYS subsystem again moves the vm1_host1 virtual
machine to host1, which is the home host for the application. The KSYS subsystem continues this
relocation process until the application status becomes NORMAL.
• For the VMs that run the Linux VM agent, the reboot operation might take longer than expected, and
the discovery operation might fail and display the following message: 'Restart has encountered
error for VM VM_Name'.
– Workaround: Re-run the discovery operation.
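For example, on an AIX VM, the following commands enable the automatic varyon option for a volume
group and the automatic mount option for a file system, as mentioned in the limitation about
non-rootvg disks. The volume group and file system names are illustrative:
chvg -a y appvg
chfs -A yes /appdata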
GUI limitations
• The VM Recovery Manager HA GUI does not support multiple sessions that are originating from the
same computer.
• The VM Recovery Manager HA GUI does not support duplicate names for host group, HMC, host, VIOS,
and VMs. If a duplicate name exists in the KSYS configuration, the GUI might have issues during host
group creation or in displaying the dashboard data.
• The VM Recovery Manager HA GUI refreshes automatically after each topology change (for example, VM
migration operation and host migration operation). After the refresh operation is complete, the default
KSYS dashboard is displayed. You must expand the topology to view the log information in the Activity
window for a specific entity.
• Any operation performed by a user from the command-line interface of VM Recovery Manager HA is not
displayed in the activity window of the VM Recovery Manager HA GUI.
Miscellaneous
• The VM Recovery Manager HA solution does not support internet Small Computer Systems Interface
(iSCSI) disk type. Only N_Port ID virtualization (NPIV) and virtual Small Computer System Interface
(vSCSI) disk types are supported.
• In a user-defined cluster, if you want to add a host or VIOS to the environment, you must add it in the
shared storage pool (SSP) cluster first. Then, you can add the host or VIOS to the KSYS cluster. Also, if
you want to remove a host or VIOS from the environment, you must first remove it from the KSYS cluster
and then remove it from the SSP cluster.
• VM Recovery Manager HA supports only the cluster and detailed snapshot types.
• After each manage VIOS operation and unmanage VIOS operation, you must perform the discovery
operation.
Figure (vmrmha001): An administrator manages the KSYS subsystem through the GUI or through the
command line (ksysmgr command). The KSYS subsystem communicates through the HMC with Host1,
where the VM agents in the virtual machines and the host monitors in VIOS1 and VIOS2 exchange
heartbeats over the VLAN.
To install the VM Recovery Manager HA solution, you must first install the KSYS filesets. After the KSYS
software is installed, the KSYS subsystem automatically monitors the health of hosts by enabling the host
monitors in the VIOS partitions of each host that is part of the VM Recovery Manager HA management.
You can optionally install the VM agents in the virtual machines that run AIX or Linux operating systems to
monitor health of an individual virtual machine and applications that run in the virtual machines. You can
also install the GUI server for the VM Recovery Manager HA solution to use the GUI by using a browser.
Complete the following procedures to install the VM Recovery Manager HA solution.
1. Install the VIOS interim fix.
2. Install the KSYS software.
3. Optional: Install the GUI server.
4. Optional: Install VM agents in the virtual machines.
Note: You must have root authority to perform any installation tasks.
Follow the on-screen instructions. You might need to restart the system.
4. Verify whether the installation of the interim fix is successful by running the following command:
lssw
5. If the cluster services were stopped, start the cluster services by running the following command:
The -V2 flag enables the verbose mode of installation. Alternatively, you can use the smit installp
command with the all_latest option to install all filesets in the directory.
3. Verify whether the installation of filesets is successful by running the following command:
4. Run the /opt/IBM/ksys/ksysmgr command to check the command-line utility of the KSYS
subsystem. The KSYS subsystem might take a few minutes to run the command for the first time. You
can add the /opt/IBM/ksys directory to your PATH environment variable so that you can access the
ksysmgr command easily, as shown in the example after these steps.
5. After successful installation of KSYS filesets, enter the following command to check whether the class
IDs are reserved:
cat /usr/sbin/rsct/cfg/ct_class_ids
IBM.VMR_HMC 510
IBM.VMR_CEC 511
IBM.VMR_LPAR 512
IBM.VMR_VIOS 513
IBM.VMR_SSP 514
IBM.VMR_SITE 515
IBM.VMR_SA 516
IBM.VMR_DP 517
IBM.VMR_DG 518
IBM.VMR_KNODE 519
IBM.VMR_KCLUSTER 520
IBM.VMR_HG 521
IBM.VMR_APP 522
IBM.VMR_CLOUD 523
6. If the IBM.VMR_APP or IBM.VMR_CLOUD class and its ID are not available in the output, contact IBM
Support to obtain a fix for APAR IJ29360.
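For example, to access the ksysmgr command without specifying the full path (as mentioned in step 4),
append the directory to the PATH environment variable in your shell profile:
export PATH=$PATH:/opt/IBM/ksys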
• If you are installing the GUI server filesets on a separate system that manages all the KSYS nodes,
run the following command:
2. Install the open source software packages, which are not included in the installed filesets, based on
the following scenarios:
• If the GUI server LPAR is connected to the internet, run the following command in the GUI server
LPAR:
/opt/IBM/ksys/ui/server/dist/server/bin/vmruiinst.ksh
This command downloads and installs the remaining files that are not included in the filesets
because these files are licensed under the General Public License (GPL).
• If the GUI server LPAR is configured to use an HTTP proxy to access the internet, run the following
command in the GUI server LPAR to specify the proxy information:
/opt/IBM/ksys/ui/server/dist/server/bin/vmruiinst.ksh -p
You can also specify the proxy information by using the http_proxy environment variable.
• If the GUI server LPAR is not connected to the internet, complete the following steps:
a. Copy the vmruiinst.ksh file from the GUI server LPAR to a system that is running the AIX
operating system and that has internet access.
b. Run the vmruiinst.ksh -d /directory command where /directory is the location where you
want to download the remaining files. For example, /vmruiinst.ksh -d /tmp/vmrui_rpms.
c. Download the following packages that are prerequisite packages for GUI server:
Installing VM agents
VM agents are components that are installed in virtual machines (VMs) or logical partitions (LPARs). These
optional agents offer robust monitoring of the VMs and applications that are running in VMs. You can
manage HA applications in VMs through a lightweight application monitoring framework.
To install a VM agent in an AIX VM, go to Installing a VM agent in an AIX VM. For setting up a VM agent in
Linux, see Installing a VM agent in a Linux VM.
Installing a VM agent in an AIX VM
1. Ensure all the prerequisites that are specified in the Requirements topic are complete.
2. Run the following command in the AIX virtual machine:
3. Perform one of the following steps to verify whether the installation of VM agent is successful:
a. Run the lslpp command.
b. Ensure that the ksysvmmgr command and the binary file for the VM agent daemon exist in the
following directories:
• /usr/sbin/ksysvmmgr
• /usr/sbin/ksys_vmmd
c. Run the lssrc -s ksys_vmm command to verify whether the VM agent daemon is enabled. The
status of the ksys_vmm subsystem must be Active in the output of this command.
Installing a VM agent in a Linux VM
To install the VM agent Red Hat Package Manager (RPM) packages in a Linux virtual machine, complete
the following steps:
1. Ensure that the following Reliable Scalable Cluster Technology (RSCT) packages are installed in the
Linux VM:
• rsct.core
• rsct.opt.storagerm
• rsct.core.utils
• rsct.basic
• DynamicRM
You can download the packages from the following link: http://www14.software.ibm.com/webapp/
set2/sas/f/lopdiags/redhat/hmcmanaged/rhel7.html. For information about configuring the repository
to easily install those packages, see Updating RSCT packages for PowerVM NovaLink.
2. Install the VM agent RPM packages based on the following Linux distributions in the virtual machine.
In Red Hat Enterprise Linux (RHEL) (little endian) virtual machines, run the following command:
In SUSE Linux Enterprise Server (SLES) (little endian) virtual machines, run the following command:
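The exact RPM file names depend on the VM agent version that you download. Assuming the package
file names that are shown in the Upgrading VM agents section, the installation commands might look
like the following examples, where the first command applies to RHEL and the second command applies
to SLES:
rpm -ivh vmagent-1.5.0-2.0.el7.ppc64le.rpm
rpm -ivh vmagent-1.5.0-2.0.suse123.ppc64le.rpm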
3. Ensure an RMC connection between the VMs and the HMC. If the firewall is enabled on the RHEL VM,
the RMC connection might be broken. Modify the firewall on the VMs to allow the RMC connection with
the HMC. For details about modifying the firewall, see the PowerLinux forum topic and the Installing the
PowerVM NovaLink software on a Red Hat Enterprise Linux partition topic.
smit install
b. In the Install and Update Software screen, select Update Installed Software to Latest Level
(Update All), and press Enter.
Install Software
Update Installed Software to Latest Level (Update All)
Install Software Bundle
Update Software by Fix (APAR)
Install and Update from ALL Available Software
Upgrading VM agents
Upgrade the VM agent RPM packages based on the following Linux distributions in the virtual machine:
• In Red Hat Enterprise Linux (RHEL) (little endian) virtual machines, run the following command:
rpm -U vmagent-1.5.0-2.0.el7.ppc64le.rpm
• In SUSE Linux Enterprise Server (SLES) (little endian) virtual machines, run the following command:
rpm -U vmagent-1.5.0-2.0.suse123.ppc64le.rpm
These commands upgrade the VM agent software without modifying the current configuration of the
virtual machines.
This command removes the hosts from the host group and deletes the host group.
• Complete the following steps:
a. Remove all hosts from the existing host groups by running the following command:
All the corresponding virtual machines, Virtual I/O Servers, and Hardware Management Consoles
will also be removed from the host group configuration.
b. Delete the host group by running the following command:
c. Delete the host so that the VIOS remote copy programs (RCP), trunk adapters, and switches are
removed by running the following command:
Uninstalling VM agents
To uninstall the VM agent in an AIX VM, go to Uninstalling a VM agent in an AIX VM. To uninstall the VM
agent in a Linux VM, see Uninstalling a VM agent in a Linux VM.
Uninstalling a VM agent in an AIX VM
1. Stop the VM agent module in the AIX virtual machine by running the following command:
ksysvmmgr stop
2. Uninstall the VM agent filesets from the AIX virtual machine by running the following command:
installp -u ksys*
ksysvmmgr stop
2. Uninstall the VM agent package from the Linux virtual machine by running the following command:
rpm -e vmagent
You can perform steps “1” on page 27 - “3” on page 27 by running the following command:
4. Verify that the KSYS cluster is created successfully by running one of the following commands:
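For example, commands of the following general form verify and query the cluster; the cluster name is
illustrative and the exact syntax might differ in your release:
ksysmgr verify ksyscluster ksyscluster_name
ksysmgr query ksyscluster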
For example, to add an HMC user with user name hscroot and an IP address, run the following
command:
To add an HMC user with user name hscroot and host name hmc1.testlab.ibm.com, run the
following command:
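Commands of the following general form add an HMC by IP address or by host name; the password
value is illustrative and the exact attribute names might differ in your release:
ksysmgr add hmc hmc1 login=hscroot password=abc123 ip=x.x.x.x
ksysmgr add hmc hmc1 login=hscroot password=abc123 hostname=hmc1.testlab.ibm.com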
If the host is connected to more than one HMC, you must specify the Universally Unique Identifier
(UUID) of the host. Hosts are identified by their UUIDs as tracked in the HMC. You can also use the
ksysmgr query hmc command to identify the host name and the host UUID.
2. Repeat step “1” on page 28 for all hosts that you want to add to the KSYS subsystem.
3. Verify the hosts that you added by running the following command:
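For example, commands of the following general form add a host (with the UUID when the host is
managed by more than one HMC) and list the hosts that are known to the KSYS subsystem; the host
name and UUID are illustrative:
ksysmgr add host Host1
ksysmgr add host Host1 uuid=<host_uuid>
ksysmgr query host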
The KSYS subsystem creates a health monitoring Shared Storage Pool (SSP) cluster across the virtual I/O
servers that are part of the host group. The health cluster monitors health of all virtual I/O servers across
the cluster and retains the health data that is available to the KSYS subsystem by using a VIOS in the host
group. The SSP cluster is used only by the KSYS. You must not use this SSP cluster for any other purpose.
You can continue to use virtual Small Computer System Interface (vSCSI) or N_Port ID Virtualization
(NPIV) modes of the cluster. However, if an SSP cluster exists in your environment, the KSYS subsystem
does not deploy any new SSP clusters and instead, uses the existing SSP cluster for health management.
However, if an existing SSP cluster is used, the KSYS subsystem might not support VIOS management.
The KSYS subsystem requires two disks to create the health monitoring SSP cluster across the Virtual I/O
Servers in the host group. A disk of at least 10 GB, called a repository disk, is required to monitor the
health of all hosts, and another disk of at least 10 GB, called an HA disk, is required to track the health
data for each host group. These disks must be accessible to all the managed Virtual I/O Servers on each of the
hosts in the host group. You must specify the disk details when you create the host group or before you
run the first discovery operation. You cannot modify the HA disk after the discovery operation is run
successfully. If you want to modify the HA disk, you must delete the host group and re-create the host
group with the HA disk details.
VM Recovery Manager HA supports automatic replacement of the repository disk. To automatically
replace the repository disk, you must provide the details about backup repository disk. A maximum of six
backup repository disks can be added for automatic replacement. When the storage framework detects
failure of a repository disk, the KSYS subsystem sends an event notification. Then, the KSYS subsystem
searches each disk in the backup repository disk list, locates a valid and active backup repository disk,
and replaces the failed repository disk with the backup repository disk without any interruption. The
failed repository disk is placed as the last disk in the backup repository disk list and can be reused after
the disk failure is fixed and the disk becomes valid and active again.
The backup repository disk must meet all the VM Recovery Manager HA requirements for the automatic
replacement of the repository disk. For more information, see “VM Recovery Manager HA requirements”
on page 10.
If the backup repository disk is not specified, the automatic replacement feature is disabled. However, a
failed repository disk can be replaced manually from the KSYS subsystem. For more information, see
Troubleshooting repository disk failure.
To create a host group in the KSYS subsystem, complete the following steps in the KSYS LPAR:
1. Identify the available disks that you can designate as the repository disk and the HA disk for the SSP
cluster and run one of the following commands:
2. Create a host group and add the hosts and disks that you want in this host group by running the
following command:
For repository disk failure issues, see Troubleshooting repository disk failure topic.
3. Repeat step “1” on page 29 for all host groups that you want to create in the KSYS subsystem.
4. Verify the host groups that you created by running the following command:
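For example, a host group creation and verification sequence might look like the following commands;
the host group name, host names, and disk names are illustrative, and the exact attribute names might
differ in your release:
ksysmgr add host_group HG1 hosts=Host1,Host2 repo_disk=hdisk10 ha_disk=hdisk11
ksysmgr query host_group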
You can include the VM back in the HA management at any time by using the ksysmgr manage vm
command.
If you installed the VM agent for HA monitoring at the VM and application level, you can enable the HA
monitoring by running the ksysvmmgr start command in the virtual machine. For more information
about configuring the VM agent, see the “Setting up the VM agent” on page 38 topic.
You can include the VIOS partition for the HA management at any time by using the ksysmgr manage
vios viosname command.
2. Verify the existing Virtual I/O Servers by running the following command:
You can configure a specific LPAR and VIOS such that during each discovery operation, the KSYS
subsystem fetches the size of the VIOS file system and the current file system usage in the VIOS. When
the percentage of file system usage reaches the threshold value of 80%, the KSYS subsystem notifies you
with a warning message so that you can make necessary updates to the VIOS file system.
The host monitor monitors the following file systems: /, /tmp, /usr, /var, and /home. When the KSYS
subsystem requests the file system usage details, the host monitor responds with details about each
file system and its usage. An event is generated when the file system usage surpasses the threshold
value, and another event is generated when the file system usage falls below the threshold value again.
VM auto-discovery
VM auto-discovery is a system-level property. You can enable or disable this property. By default, this
property is enabled.
By default, the KSYS subsystem manages all VMs automatically. The VM auto-discovery property allows
the KSYS subsystem to manage or unmanage the newly created VMs and undiscovered VMs.
If the VM auto-discovery property is enabled, all VMs are managed automatically. If the auto-discovery
property is disabled, all the newly created VMs on the KSYS managed hosts and the undiscovered VMs
(existing VMs that are not yet discovered by the KSYS subsystem) will not be managed by the KSYS
subsystem.
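For example, assuming that the property is exposed as a system-level attribute of the ksysmgr
command (the attribute name is illustrative), the property might be disabled as follows:
ksysmgr modify system vm_auto_discovery=disable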
• To check whether the VM auto-discovery property is enabled or disabled to discover the resources
across the site, run the following command:
An output that is similar to the following example is displayed:
For example,
You can add multiple email addresses for a specific user. However, you cannot add multiple email
addresses simultaneously. You must run the command multiple times to add multiple email addresses.
• To add a specific user to receive an SMS notification, enter the following command:
For example,
You must specify the phone number along with the email address of the phone carrier to receive a short
message service (SMS) notification. To determine the email address of your phone carrier, contact the
phone service provider.
• To add a specific user to receive a pager notification, enter the following command:
For example,
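Registrations for the preceding notification types typically take the following general form; the user
name, email address, and phone details are illustrative:
ksysmgr add notify user=admin1 contact=admin1@example.com
ksysmgr add notify user=admin1 contact=5551234567@txt.example.com
ksysmgr add notify user=admin1 contact=5551234567@pager.example.com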
You must run the discovery and verification commands each time you modify the resources in the KSYS
subsystem. To perform both the discovery and verification operations, run the following command:
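A command of the following general form is typically used; the host group name is illustrative and the
exact syntax might differ in your release:
ksysmgr discover host_group HG1 verify=yes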
ksysmgr add snapshot filepath=full_file_prefix_path|file_prefix
type=CLUSTER|BASIC|DETAILED
You can restore the saved configuration snapshot by using the ksysmgr restore snapshot
filepath=filepath command.
Setting up HA policies
After you set up the KSYS subsystem successfully, set up recovery policies to customize the default
configuration settings to suit your high availability preferences.
Note: You must run the discovery and verification command after you set any policy.
The VM Recovery Manager HA solution provides the following options that you can customize:
HA monitoring
Turns on or turns off HA monitoring for the associated entity. The specified policy at the lowest
resource level is considered first for HA monitoring. If you do not specify this policy for a resource, the
policy of the parent resource is applied to the resource. For example, if you enable HA monitoring
for the host group, HA monitoring is enabled for all virtual machines within the host group unless you
disable HA monitoring for specific virtual machines.
You can enable HA monitoring for virtual machines only after you install the VM agent on each VM and
start the VM agent successfully. For details, see Setting up the VM agent topic. If you do not set up the
VM agent, the KSYS subsystem might return error messages for HA monitoring at VM-level.
To set the HA monitoring, run the following command:
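Commands of the following general form are typically used to enable or disable HA monitoring; the
attribute name and entity names are illustrative:
ksysmgr modify host_group HG1 ha_monitor=enable
ksysmgr modify vm vm_db01 ha_monitor=disable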
ProactiveHA monitoring
ProactiveHA monitors the CPU utilization of every managed VM in the host group and the network
packet loss during communication between the virtual machine and the host monitor. An event is
generated when the CPU utilization of a VM exceeds 90% or when network packet loss is detected on
any of the VM's adapters during this communication. The threshold for CPU utilization is predefined. By
default, the ProactiveHA option is enabled.
Configuring network isolation events
The KSYS subsystem uses the network isolation feature to configure the VIOS netmon file, which is
used by IBM Reliable Scalable Cluster Technology (RSCT) to monitor the network status. The KSYS
subsystem generates the NETWORK_ISOLATION_SUCESS and the NETWORK_ISOLATION_ERROR
events depending on whether the configuration of the VIOS netmon file succeeded. You can use the
ksysmgr command to configure the IP addresses for the VIOS netmon file. After the discovery
operation completes, the KSYS subsystem checks the configured IP addresses at site remote copy
program (RCP) and generates a put message request for the host monitor to configure the VIOS
netmon file. To add or delete the IP addresses for network isolation detection, run the following
command:
Restart policy
Indicates whether the KSYS subsystem restarts the virtual machines automatically during a failure. This
attribute can have the following values:
• auto: If you set this attribute to auto, the KSYS subsystem automatically restarts the virtual
machines on the destination hosts. The KSYS subsystem identifies the most suitable host based on
free CPUs, memory, and other specified policies. In this case, the KSYS subsystem also notifies the
registered contacts about the host or VM failure and the restart operations. This is the default value
of the restart_policy attribute.
• advisory_mode: If you set this attribute to advisory_mode, the virtual machines are not
restarted automatically after host or VM failures. In this case, the KSYS subsystem notifies the
Failover priority
Specifies the order in which multiple VM restart operations are processed. For example, if a host fails
and all the VMs must be relocated to other hosts in the host group, the priority of a VM determines
which VM is processed first. The supported values for this attribute are High, Medium, and Low. You can
set this attribute at the VM level only. You must specify the UUID of the VM if you have two or more VMs
with the same name. By default, all VMs in the host group have the priority of Medium.
To set the failover priority, run the following command:
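A command of the following general form is typically used; the VM name is illustrative and the attribute
name is assumed:
ksysmgr modify vm vm_db01 priority=High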
Home host
Specifies the home-host of the virtual machine. By default, the KSYS subsystem sets this value
initially to the host where the virtual machine was first discovered. You can change the home-host
value of a virtual machine even when the virtual machine is running on another host. In that case, the
specified home-host is used for all future operations. This attribute is useful when a host is repaired
after a failure and you want to restart the virtual machines on their home host.
To set the home-host value, run the following command:
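A command of the following general form is typically used; the VM and host names are illustrative and
the attribute name is assumed:
ksysmgr modify vm vm_db01 homehost=Host1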
Affinity policies
Specifies affinity rules for a set of VMs that defines how the VMs must be placed within a host group
during a relocation. The following affinity policies are supported:
• Collocation: Indicates that the set of VMs must always be placed on the same host after
relocation.
To set this option, run the following command:
• Anticollocation: Indicates that the set of VMs must never be placed on the same host after
relocation.
To set this option, run the following command:
• Workgroup: Indicates that the set of VMs must be prioritized first based on the assigned priority.
To set this option, run the following command:
• Host blacklist: Specifies the list of hosts that must not be used for relocating a specific virtual
machine during a failover operation. For a virtual machine, you can add hosts within the host group
to the blacklist based on performance and licensing preferences.
To set this option, run the following command:
Note: All the VMs in the Workgroup must have the same priority.
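For example, the host blacklist option can be set with the blacklist_hosts attribute, which is also
referenced later in this guide; the VM and host names are illustrative:
ksysmgr modify vm vm_db01 blacklist_hosts=Host3,Host4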
• Primary-secondary dependency
Defines dependency between applications, which have a hierarchical primary-secondary structure
across VMs.
To establish dependency between the primary application and the secondary application, run the
following command:
Note: The app_list attribute must have only two vmname:appname pairs for the primary-
secondary structure of applications across VMs.
You can verify the dependency between applications across VMs. You can also delete the dependency
between applications.
• To verify a dependency that you have created, run the following command:
• To delete a dependency that you have created, run the following command:
Limitations
• When you shut down or reboot a virtual machine manually, the dependent applications are not
affected. The recovery of dependent applications is considered only when a failure occurs in the
parent application, the virtual machine, or the host.
Fibre channel (FC) adapter failure detection
The KSYS subsystem monitors Fibre Channel (FC) adapter status. An event is generated if adapter
failure is detected. To use this feature, you must enable the ProactiveHA feature.
The following four events display the status of a Fibre Channel (FC) adapter:
• SFW_ADAP_DOWN
• SFW_ADAP_UP
• SFW_PORT_DOWN
• SFW_PORT_UP
• To update the HMC name, login credentials, or IP address, run the following command:
Note:
– The ksysmgr modify vm ALL host_group=<host_group_name> command does not work for the
blacklist_hosts attribute.
• To update the contact information to receive the KSYS notification, run the following command:
• If an HMC, which is included in the KSYS configuration, is not managing any hosts, you can remove the
HMC from the KSYS configuration by running the following command:
• To remove a host, you must first remove the host group that contains the host, and then remove the
host by running the following commands:
Note: The name attribute can have multiple applications, which must belong to the same VM. If you do
not provide the name of the application for the name attribute, the command starts or stops all the
applications of the VM.
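For example, the HMC update and removal operations that are described in this list typically take the
following general form; the HMC name and address are illustrative:
ksysmgr modify hmc hmc1 ip=x.x.x.x
ksysmgr remove hmc hmc1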
• Apart from any general application that you want the VM agent to monitor, the VM Recovery Manager HA
solution supports the following type of applications that can be monitored by using the in-built scripts in
the corresponding versions of operating systems:
Table 2. Version support matrix for application types and operating systems
Application type    AIX operating system              Linux operating system (RHEL and SLES)
ORACLE              Oracle Database 12.1, or later    Not supported
DB2®                IBM DB2 11.3, or later            IBM DB2 10.5, or later
SAP_HANA            Not supported                     SAP HANA 2.0, or later
POSTGRES            Postgres 9.2.23, or later         Postgres 9.2.23, or later
After you install the VM agent filesets successfully, complete the following steps to set up the VM monitor
in each guest virtual machine:
1. Start the VM monitor daemon in each virtual machine to start monitoring the virtual machines and
applications by running the following command:
For example,
To find more information about the attributes for the VM monitor daemon, see the ksysvmmgr
command topic.
2. Register applications in the VM monitor daemon that must be monitored for high availability by running
the following command:
For example,
After you add an application, if the application fails or stops working correctly, the VM agent attempts
to restart the application in the same virtual machine for the number of times that is specified in the
max_restart attribute for the VM, which is set to 3 by default. If the application is still not working
correctly, the KSYS subsystem notifies you about the issue. You can manually review the problem and
restart the application.
3. Mark important applications as critical by running the following command:
When you mark an application as critical, if the application fails or stops working correctly, the VM
agent attempts to restart the application as many times as specified in the max_restart attribute
for the VM. If the application is still not working correctly, the KSYS subsystem notifies you about the
issue and attempts to restart the virtual machine on the same host. If the application is still not
working correctly, that is, if the application status is displayed as RED when you run the ksysmgr
query app command, the KSYS subsystem restarts the VM on another host within the host group based
on the specified policies.
For example,
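(An illustrative sketch; the application name app1 is a placeholder, and the critical attribute is described
in the ksysvmmgr command topic.)
ksysvmmgr -s modify app app1 critical=yes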
When you add a dependency, if the virtual machines are restarted on the same or another host, the
applications in the virtual machine are started based on the specified dependency.
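Dependencies are registered with the ksysvmmgr add dependency command. An illustrative sketch (the
application names are placeholders; the attribute names follow the dependency class attributes that are
described in the ksysvmmgr command topic):
ksysvmmgr -s add dependency dependency_list=app1,app2 dependency_type=parent_child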
5. If you need to modify any attributes for the VM monitor daemon, applications, or application
dependency, enter one of the following commands:
You can restore the saved snapshot by using the ksysvmmgr -s restore vmm filepath=filepath
command.
Manual recovery of virtual machines
When a host, VM, or critical application fails and the restart_policy attribute is set to
advisory_mode, the KSYS subsystem notifies you about the issue. You can review the issue and manually
restart the virtual machines on other hosts.
If you have configured the VM agent in each of your virtual machines, the KSYS subsystem notifies you when
a virtual machine or a registered critical application fails or stops working correctly. In such cases, you can
also restart the virtual machines on other hosts based on the specified policies.
To restart the virtual machines manually on another host, complete the following steps:
1. Restart specific virtual machines or all virtual machines in a host by running the following command:
Or,
After the virtual machines are restarted successfully, the KSYS subsystem automatically cleans the
VMs on the source host and on the HMC by removing the LPAR profile from the HMC.
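Hedged sketches of the restart commands for this step (the VM name VM1 and host name Host1 are
placeholders; restart is the ksysmgr action that is referenced later in this topic):
ksysmgr restart vm VM1
ksysmgr restart host Host1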
2. If the output of the ksysmgr restart command indicates cleanup errors, clean up the VM details
manually in the source host by running the following command:
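An illustrative sketch of the cleanup command (the VM and host names are placeholders, and the exact
attribute names are assumptions based on the Cleanup operations section of the ksysmgr command reference):
ksysmgr cleanup vm VM1 host=Host1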
3. If the restart operations fail, recover the virtual machine in the same host where it is located currently
by running the following command:
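(An illustrative sketch; the VM name is a placeholder, and the recover action corresponds to the
"To recover a VM in the same host where the VM is located" entry in the ksysmgr command reference.)
ksysmgr recover vm VM1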
• To validate the LPM operation for all virtual machines in a specific host, run the following command:
If the output displays any errors, you must resolve those errors.
2. Migrate the virtual machines from the source host to another host by running one of the following
commands:
• To migrate specific virtual machines, run the following command:
When you run this command, the virtual machines are restarted on another host according to the
specified policies in the KSYS configuration settings. If you have not specified the destination host
where the virtual machines must be started, the KSYS subsystem identifies the most suitable host that
can be used to start each virtual machine.
If you have HMC Version 9 Release 9.3.0, or later, you can view the LPM progress as a percentage
value.
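Hedged sketches of these migration commands, based on the ksysmgr lpm host syntax that appears in the
ksysmgr command reference (the host and VM names are placeholders, and the VM-level form is an
assumption):
ksysmgr lpm host Host_A to=Host_B
ksysmgr lpm vm VM1 to=Host_B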
3. Run the discovery and verify operations after each LPM operation to update the LPM validation state by
running the following command:
4. After the maintenance or upgrade activities are complete in the source host, restore all virtual
machines by running the following command:
Prerequisites
The AIX operating system does not include the sudo feature by default. You must download the sudo RPM
package from the web and install it on the KSYS node.
##
## Host alias specification
##
##
## User alias specification
##
##
## Cmnd alias specification
##
#Cmnd_Alias SU = /usr/bin/su
##
## Uncomment to enable logging of a command's output, except for
## sudoreplay and reboot. Use sudoreplay to play back logged sessions.
# Defaults log_output
# Defaults!/usr/bin/sudoreplay !log_output
# Defaults!/usr/local/bin/sudoreplay !log_output
# Defaults!REBOOT !log_output
##
## Runas alias specification
##
##
## User privilege specification
##
root ALL=(ALL) ALL
<username> ALL=(ALL) /opt/IBM/ksys/ksysmgr q vm, /opt/IBM/ksys/ksysmgr q host
## Allows people in group wheel to run all commands
# %wheel ALL=(ALL) ALL
The command runs successfully and an output similar to the following example is displayed.
ERROR: KSYS subsystem is currently offline, please sync ksyscluster to start KSYS "ksysmgr
sync ksyscluster <name>"
# sudo /opt/IBM/ksys/ksysmgr q vm
The command runs successfully and an output similar to the following example is displayed.
ERROR: KSYS subsystem is currently offline, please sync ksyscluster to start KSYS "ksysmgr
sync ksyscluster <name>"
The command does not run successfully and an output similar to the following example is displayed.
Since you provided the execute permission to the user for the first two commands in the example sudoers
file (see the previous topic), the ksysmgr q vm and ksysmgr q host commands ran successfully.
Because you did not provide the execute permission to the user for the ksysmgr q vios command in
the example sudoers file, that command did not run successfully, and a message stating that the user is
not allowed to execute the command was displayed.
To resolve this error, export the library path LIBPATH=/opt/freeware/lib:$LIBPATH by running the
export command.
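For example:
export LIBPATH=/opt/freeware/lib:$LIBPATH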
VM agents
This section describes the VM agents that VM Recovery Manager HA supports.
DB2
Scripts:
The KSYS VM daemon uses the following scripts to start, stop, and monitor the DB2 application.
• /usr/sbin/agents/db2/startdb2
• /usr/sbin/agents/db2/stopdb2
• /usr/sbin/agents/db2/monitordb2
To add the DB2 VM agent, run the following command:
Example:
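(An illustrative sketch; the application name app_db2 is a placeholder, and the attributes follow the DB2
attribute descriptions below.)
ksysvmmgr -s add app app_db2 type=DB2 instancename=db2inst1 database=dbone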
In this example, db2inst1 is the database instance owner and dbone is the database to monitor.
Attributes:
• type: While creating the DB2 application, the type attribute must have the value, DB2.
• instancename: The instancename attribute must be specified with the DB2 instance owner.
The instancename attribute is also passed as a parameter to the start and stop scripts.
Oracle
Scripts:
The KSYS VM daemon uses the following scripts to start, stop, and monitor the Oracle application.
• /usr/sbin/agents/oracle/startoracle
• /usr/sbin/agents/oracle/stoporacle
• /usr/sbin/agents/oracle/monitororacle
To add the Oracle VM agent, run the following command:
Example:
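(An illustrative sketch; the application name app_ora is a placeholder, and the attributes follow the Oracle
attribute descriptions below.)
ksysvmmgr -s add app app_ora type=ORACLE instancename=orauser database=DBRESP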
In this example, orauser is the Oracle user name and DBRESP is the Oracle system identifier
(SID), that is, the database name.
Attributes:
• type: While creating the ORACLE application, the type attribute must have the value, ORACLE.
• instancename: The instancename attribute must be specified with the ORACLE username.
• database: The database name must be specified with the ORACLE system identifier (SID).
POSTGRES
Scripts:
The KSYS VM daemon uses the following scripts to start, stop, and monitor the POSTGRES application.
• /usr/sbin/agents/postgres/startpostgres
• /usr/sbin/agents/postgres/stoppostgres
• /usr/sbin/agents/postgres/monitorpostgres
To add the POSTGRES VM agent, run the following command:
Example:
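(An illustrative sketch; the application name app_pg is a placeholder, the configuration file path is a
placeholder for a copy of the sample file described below with updated values, and the attributes follow the
POSTGRES attribute descriptions below.)
ksysvmmgr -s add app app_pg type=POSTGRES instancename=postgres database=testdb configfile=/var/ksys/config/samples/POSTGRESconfig.XML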
In this example, postgres is the instance owner and testdb is the database name.
Attributes:
• type: While creating the POSTGRES application, the type attribute must have the value, POSTGRES.
• instancename: The instancename attribute must be specified with the POSTGRES instance
owner.
• database: The database name is optional.
Note: If the database name is specified, the script monitors only the specified database. If the
database name is not specified, the script monitors all databases of the POSTGRES instance.
• configfile: The configfile attribute specifies the file path of the configuration file that stores
settings of the application configuration. You must specify the path of the configuration file while
adding the POSTGRES application. A sample configuration file, POSTGRESconfig.XML is provided
in the /var/ksys/config/samples folder. You can use this sample file by updating the attribute
values.
If you do not specify appropriate values in the configuration file, you cannot add the POSTGRES
application agent.
The following table lists the description of the attributes present in the POSTGRES configuration file.
<!--POSTGRESinstance id="instOwner"-->
<!--data_directory>/var/lib/pgsql/data</data_directory-->
<!--/POSTGRESinstance-->
</POSTGRESConfig>
SAP_HANA
IBM VM Recovery Manager HA for Power Systems supports the following SAP HANA configurations:
1. SAP HANA scale-up configuration with host-based replication: You can create a replication
between two SAP HANA nodes and add them to a VM agent. IBM VM Recovery Manager HA for
Power Systems manages the SAP HANA nodes and the replication between the two nodes, such as
takeover when the primary node fails.
2. SAP HANA scale-up configuration without replication: You can install the SAP HANA database and add
it to a VM agent. The VM agent monitors the database status and manages any failures.
Scripts:
The KSYS VM daemon uses the following scripts to start, stop, and monitor the SAP HANA application.
• /usr/sbin/agents/saphana/startsaphana
• /usr/sbin/agents/saphana/stopsaphana
• /usr/sbin/agents/saphana/monitorsaphana
To add the SAPHANA VM agent, run the following command:
Examples:
• To add the SAP HANA application without replication:
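(An illustrative sketch; the application name app_hana is a placeholder, and the attributes follow the SAP
HANA attribute descriptions below.)
ksysvmmgr -s add app app_hana type=SAPHANA instancename=S01 database=HDB01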
In these examples, S01 is the SAP HANA system ID and HDB01 is the database name.
Attributes:
• type: While creating a SAP HANA application, the type attribute must have the value, SAPHANA.
• instancename: The instancename attribute must be specified with SAP HANA system ID.
• database: The database attribute must be specified with the SAP HANA database name.
• configfile: The configfile attribute specifies the file path of the configuration file, which stores
the application configuration settings. This attribute is not required for the SAP HANA application
without the replication configuration. However, you must specify the path of the configuration file
when you add the SAP HANA application with the replication configuration. A sample configuration
file, SAPHANAconfig.xml, is provided in the /var/ksys/config/samples folder. You can use this
sample file by updating the attribute values. If you do not specify the configuration file path or
appropriate values in the configuration file, the SAP HANA application is added without the
replication configuration.
The SAPHANAconfig.xml file contains the following attributes:
instanceId, replication, role, localsite, remotesite, secondarymode, virtualip,
interfacename, executabledir, timeout and remotenode.
If the replication attribute of the instance is set to yes, you must specify values for all mandatory
attributes (for example, replication, role, localsite, remotesite, secondarymode, and
remotenode). If you do not specify values for all mandatory attributes, the SAP HANA application
is not added to the virtual machine.
The following attributes are optional: virtualip, interfacename, executabledir, and timeout.
The following list describes the attributes that are present in the SAP HANA configuration file.
role
The value of the attribute can be primary, sync, syncmem, or async. For a primary node, the value of
this attribute is primary; for a secondary node, the value of this attribute is sync, syncmem, or async.
localsite
The site name of the current node (the value that is specified while configuring the SAP HANA
replication).
remotesite
The site name of the remote node (the value that is specified while configuring the SAP HANA
replication).
secondarymode
The mode of the secondary node. The value of the attribute can be sync, syncmem, or async.
remotenode
The host name of the remote node. This name must be the same as the name shown in the output of
the SAP command hdbnsutil -sr_state.
subnet mask
The subnet mask that can be used with the service or the virtualip address.
interfacename
The interface on which the specified virtual or service IP address is aliased.
executabledir
The directory path where the shared libraries and executable files of the SAP_HANA VM agent are
present.
timeout
The duration of time within which the SAP_HANA command is expected to complete. This value is
indicated in seconds. The default value is 120 seconds.
An example configuration file for the SAP HANA configuration without replication setup follows:
An example configuration file for the secondary node SAP HANA configuration with replication
follows:
Setting up KSYS high availability through PowerHA SystemMirror
The KSYS subsystem is a major component of VM Recovery Manager HA; it monitors and manages
the health of the complete environment. Therefore, setting up high availability for the KSYS subsystem
helps handle scenarios in which the KSYS daemon hangs or the KSYS node goes down. This high
availability can be set up by managing the KSYS daemon by using PowerHA SystemMirror software. To
manage the KSYS daemon through PowerHA SystemMirror software, PowerHA SystemMirror must be
configured to monitor and manage the KSYS daemon by using custom scripts.
Prerequisite
• PowerHA SystemMirror 7.2.1, or later must be installed
• VM Recovery Manager HA 1.5, or later must be installed
• The /etc/hosts and /etc/cluster/rhosts files must be modified to include all PowerHA nodes
• The variable CT_MANAGEMENT_SCOPE=2 must be defined in the .profile file for all nodes, and must
be exported by running the export command
• When a two-node PowerHA SystemMirror cluster is integrated with the KSYS subsystem, each node must
have a minimum of 30 GB of space.
Concept
When the PowerHA SystemMirror cluster is created, Reliable Scalable Cluster Technology (RSCT) cluster
is also created. The RSCT cluster creates the resource manager (RM) framework and allocates resources
(including KSYS RM resources) to the KSYS subsystem. Hence, the KSYS subsystem can use the RSCT
cluster and the resource manager framework instead of creating a new RSCT cluster. This ensures that
the configuration settings and any saved data or data modifications in one KSYS node are reflected in the
other KSYS node. The KSYS daemon can be monitored by custom scripts of the PowerHA SystemMirror
resource group. The resource group remains online on one node at a time. If the KSYS node goes down,
the resource group moves to a different node. The KSYS node, to which the resource group has moved,
starts monitoring and managing the environment. This ensures high availability between KSYS nodes.
Limitations
Consider the following limitations for the KSYS subsystem's high availability through PowerHA
SystemMirror:
• To sync configuration settings between KSYS nodes, the IBM.VMR daemon must be active on both KSYS
nodes.
• If a current group leader KSYS node fails, the next available KSYS node resumes control only after the
IBM.VMR daemon is stopped and restarted.
• Only two KSYS nodes are supported in a PowerHA SystemMirror cluster for the KSYS subsystem's high
availability.
Local database mode
You cannot update the HA configuration (for example, disable or enable the HA monitoring feature) when
the VIOS is in the local database mode.
ksysmgr command
Purpose
The ksysmgr command provides a consistent interface to configure the controller system (KSYS) and to
perform VM Recovery Manager HA operations. This command can be run from a terminal or a script.
Syntax
The basic format of the ksysmgr command follows:
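A sketch of the general form, based on the ACTION, CLASS, NAME, and ATTRIBUTE=VALUE elements that are
described in the Flags section (the bracketed elements are optional and illustrative):
ksysmgr [<flags>] <ACTION> <CLASS> [<NAME>] [<ATTRIBUTE#1>=<VALUE#1> <ATTRIBUTE#2>=<VALUE#2> ...]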
Alias usage
An alias is a shorthand definition for an operation and is defined by the most significant letters. The asterisk
(*) in the aliases signifies wildcard characters. For example, the alias value for the modify ACTION is mod*.
If you type modd, the command still works. Aliases are provided for convenience from the command line
and must not be used in scripts.
Log file
All ksysmgr command operations are logged in the /var/ksys/log/ksysmgr.oplog file, which
includes the name of the command that was executed, start time, process ID for the ksysmgr operation,
the command with arguments, and the overall return code. In addition, the /var/ksys/log/
ksysmgr.log file tracks the internal activities of the ksysmgr command. The amount of information that
is written to the ksysmgr.log file can be modified for each command by using the -l flag.
Notes:
• You must have root authority to run the ksysmgr command.
• Help information is available for the ksysmgr command from the command line. For example, when
you run the ksysmgr command without any flags or parameters, a list of the available ACTIONs is
displayed. If you enter ksysmgr ACTION in the command line without specifying any CLASS, the
command results in a list of all the available CLASSes for the specified ACTION. Entering ksysmgr
ACTION CLASS without specifying any NAME or ATTRIBUTES parameters might yield different results
because some ACTION and CLASS combinations do not require any additional parameters. To display
help information in this scenario, you can view the help information by appending the -h flag to the
ksysmgr ACTION CLASS command.
• You cannot display help information from the command line for each ATTRIBUTE of the ksysmgr
command.
Flags
You can use the following flags with the ksysmgr command:
ACTION
Describes the action to be performed. The ACTION flags are not case-sensitive. All ACTION flags
provide a shorter alias. The following ACTION flags are available:
ATTRIBUTE=VALUE
Specifies an optional flag that has attribute pairs and value pairs that are specific to the ACTION and
CLASS combination. Use these pairs to specify configuration settings or to run particular operations.
Both the ATTRIBUTE and VALUE flags are case-sensitive.
-a {<ATTR#1>,<ATTR#2>,...}
Displays only the specified attributes. This flag must be used with the query ACTION flag. For
example: ksysmgr -a name,sitetype query site.
-f
Overrides any interactive prompts and forces the current operation to be run.
-h
Displays help information.
-l low|med|high|max
Activates the following trace logging values:
low
Logs basic information for every ksysmgr operation. This is the default value of the -l flag.
med
Also logs warning messages.
high
Also logs basic informational messages that can be used for demonstrations.
max
Performs high tracing operations such as adding the routine function and the utility function. It
also adds a transaction ID to the entry messages of each function.
All trace data is written in the ksysmgr.log file. This flag is ideal for troubleshooting problems.
-v
Displays maximum verbosity in the output.
-i
Skips the interactive prompts from the ksysmgr command.
ksysmgr delete ksyscluster name
Note: When you delete the KSYS cluster, the virtual machine (VM) agent daemon becomes
inoperative. In such cases, you must manually start the VM agent daemon.
HMC management
• To add an HMC:
• To query an HMC:
• To delete an HMC:
Host management
• To add a host:
• To query hosts:
• To delete a host:
You must first delete the host group to which the host belongs before deleting a host.
• To sync updated host information:
Host group configuration
• To add a host group:
You can use the ksysmgr query viodisk command to identify the free disks that can be used in
this step.
• To modify host group details:
• To modify the repository disk and the HA disk that are associated with a host group:
You cannot modify the HA disk after the discovery operation, because the KSYS subsystem does not
support the HA disk modification after the SSP cluster is created.
• To discover a host group:
Workgroup
• To query a workgroup:
• To delete a workgroup:
Application dependency between virtual machines
• To establish dependency between applications of a virtual machine within a host group, run the
following command:
Note:
– The app_list attribute must have only two vmname:appname pairs for the primary-
secondary structure of applications across VMs.
– By default, the mode is sync for the primary-secondary type of application dependency.
• To query application dependency:
• To start an application:
Note:
– You can enter multiple application names as the value of the name attribute, but all the applications
must belong to the same virtual machine.
– If you do not provide any application name, all applications in the virtual machine start.
• To stop an application:
Note:
– You can enter multiple application names as the value of the name attribute, but all the applications
must belong to the same virtual machine.
– If you do not provide any application name, all applications in the virtual machine stop.
HA monitoring policies
• To enable or disable HA management for the entire host group:
• To specify the time that KSYS waits on a non-responsive host before declaring the host to be in an
inactive state:
The value of this attribute can be in the range 90 - 600 seconds. The default value is 90 seconds.
• To specify the time that KSYS will wait before declaring the failure of a VM:
You can select one of the following options: fast (140 seconds), normal (190 seconds), or slow
(240 seconds). The default value of the vm_failure_detection_speed attribute is normal.
• To specify whether the virtual machines must be restarted automatically after a failure:
The default value of the restart_policy attribute is auto. If you set the attribute to
advisory_mode, the virtual machines are not restarted automatically during failures. In this case,
the KSYS subsystem notifies the registered contacts about the failures and the administrator must
review the failure and manually restart the VMs on other hosts by using the ksysmgr commands.
• To modify the allocation of memory and CPU resources to a virtual machine when a virtual machine
is moved from a host to another host:
Capacity adjustment occurs only when the VM is moving to a host that is not its home-host. Also,
capacity adjustment is not possible when the VM is moved to another host by using the Live Partition
Mobility (LPM) operation. The capacity value is specified as a percentage. For example, a value of 89 or
890 means that 89.0% of the original capacity must be deployed on the backup host of the host group
when a VM is relocated.
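Hedged examples of some of the preceding HA monitoring policies, using the ksysmgr modify host_group
... options syntax that appears in the Examples section (the host group name HG1 is a placeholder; the
vm_failure_detection_speed and restart_policy forms are assumptions based on the attribute names in
this list):
ksysmgr modify host_group HG1 options ha_monitor=enable
ksysmgr modify host_group HG1 options vm_failure_detection_speed=normal
ksysmgr modify host_group HG1 options restart_policy=advisory_mode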
Affinity policies
• To specify that the set of VMs must always be placed on the same host after relocation:
• To specify that the set of VMs must not be placed on the same host after relocation:
• To specify a list of hosts that must not be used for relocating a specific virtual machine during a
failover operation:
Note: When you stop managing a virtual machine, the VM agent daemon becomes inoperative. In
such cases, you must manually start the VM agent daemon.
• To set the priority of a virtual machine or to specify the order of virtual machines for a specific
operation:
where the filepath parameter is an XML file that contains a list of virtual machines. An example
of the XML file follows:
<?xml version="1.0"?>
<KSYSMGR><VM><NAME>VM1</NAME></VM>
<VM><NAME>VM2</NAME></VM>
<VM><NAME>VM3</NAME></VM>
</KSYSMGR>
For example, when you relocate all the VMs to another host, the priority of the VMs determines
which VMs must be processed first.
• To set the home-host of a virtual machine:
When you specify the display_all attribute, the output of this command includes the virtual
machines of hosts that are not added to the KSYS subsystem but are registered in the added HMCs.
The State field in the output is a significant field that indicates the current status of the VM. If the
HA_monitor option is enabled, the Heartbeating_with field indicates the VIOS name to which
the VM is sending heartbeats. If the VM is not associated with any VIOS, this field is not
available.
• To manually clean up a specific virtual machine after the move operations:
• To query collocation:
• To query anticollocation:
VIOS management
• To query a specific VIOS or all virtual I/O servers:
When you specify the display_all attribute, the output of this command includes the VIOS of
hosts that are not added to the KSYS subsystem but are registered in the added HMCs.
• To include or exclude a specific VIOS from the HA management:
Notification contacts
• To register contact details for notification from the KSYS:
If you choose the advisory_mode option for the restart policy, you must register an email address to
get notifications about the failures.
• To modify the contact details for notification from the KSYS:
You can add a maximum of 10 event notification scripts to the KSYS configuration settings.
• To modify a specific notification script:
Script management
• To register scripts for specific KSYS operations such as discovery and verification:
ksysmgr add script entity=host_group
pre_offline|post_online|pre_verify|post_verify=script_file_path
System-wide attributes
• To query system status:
Configuration snapshots
• To create a backup of the KSYS environment:
• To query snapshots:
You cannot remove a snapshot by using the ksysmgr command. You must remove the snapshot
manually by using standard operating system commands.
Cleanup operations
• To clean up a specific virtual machine in a specific host:
ksysmgr [-f] lpm host hostname|uuid [to=hostname|uuid]
The manual restart commands must be used to relocate the VMs if you have set the
advisory_mode option for the restart policy. You cannot restart a VM on the same host in which
the VM is located by using the ksysmgr restart command. To restart a VM on the same host, use
the restart option in the HMC. If the restart operations fail, you can attempt to recover the virtual
machine in the same host where it is located.
• To recover a VM in the same host where the VM is located:
General queries
• To check the version of the KSYS software:
• To list the various types of events that are generated by the KSYS subsystem:
• To list the free disks that are attached to or shared among various VIOS:
You can use this command to check the status of the registered applications inside each virtual
machine. The application status value GREEN indicates a stable application. The YELLOW value
indicates an intermediate state in which an attempt is being made to restart the application.
The RED value indicates a permanent failure of an application.
The HA policies for an application can be configured in the VM only by using the ksysvmmgr
command.
• To review the relocation plan for a VM or a host:
Examples
The following examples show a scenario where you deploy a host group for HA management, enable HA
policies for the host group, and perform planned and unplanned relocation operations for the virtual
machines, if required:
1. Identify the servers or hosts that will be part of the host group. Complete all the prerequisites that
are specified in the Requirements topic.
2. Create, verify, and synchronize a KSYS cluster by running the following command:
3. Register all the HMCs in the environment by running the following command:
5. Identify the free disks that you can designate as repository disk and HA disk for the SSP cluster by
running the following command:
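For example, using the query command that is noted in the Host group configuration section:
ksysmgr query viodisk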
6. Modify the host group to add repository disk and HA disk by running the following command:
7. Configure the failure detection time and other policies according to your requirements by running the
following commands:
8. Configure an email address for event notifications by running the following command:
9. Enable HA monitoring for the host group by running the following command:
ksysmgr modify host_group HG1 options ha_monitor=enable
10. Discover and verify the added resources and policies by running the following command:
11. If you want to upgrade a host or if you plan a host maintenance, migrate all the virtual machines to
another host by running the following command:
12. After the upgrade operation or maintenance activity of the host is complete, restore the virtual
machines back to the host by running the following command:
13. If a host or a virtual machine fails, restart the virtual machines on another host by running the
following command:
ksysvmmgr command
Purpose
The ksysvmmgr command provides a consistent interface for the VM agent to manage the virtual
machine monitor (VMM) and the applications that are running in the virtual machine.
Syntax
The ksysvmmgr command uses the following basic format:
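A sketch of the general form, based on the ACTION, CLASS, NAME, and ATTRIBUTES elements that are
described in the Flags section (the bracketed elements are optional and illustrative):
ksysvmmgr [<flags>] <ACTION> [<CLASS>] [<NAME>] [<ATTRIBUTE#1>=<VALUE#1> ...]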
Description
The VM agent consists of the following subsystems:
VM monitor
This subsystem tracks the health of the VM by communicating periodically with the host monitor in
the VIOS.
Application monitoring framework
This subsystem is an application monitoring framework that registers, starts, stops, and monitors the
applications.
You can access the VM monitor and the application monitoring framework only through the command line
by using the ksysvmmgr command. Graphical user interface is not available for such operations.
Log file
All ksysvmmgr command operations are logged in the /var/ksys/log/ksysvmmgr.log file, including
the name of the command that was executed, the start and stop time of the command, and the user name
of the user who initiated the command. You can use the -l flag to change the amount of information that
is written to the log files. The ksysvmmgr command sends user messages that are translated by using
the message catalogs.
Flags
You can use the following flags with the ksysvmmgr command:
ACTION
Describes the action to be performed.
The ACTION flags are not case-sensitive. All ACTION flags provide synonyms, and each alias and
synonym has a shorter form. For example, remove is an alias for delete, and remove can be
abbreviated as rm. Aliases are provided for convenience from the command line and must not be
used in scripts. The following ACTION flags are available:
• help (alias: h*)
• add (alias: a*, register, reg*)
• query (aliases: q*, list, get, li*, g*)
• modify (aliases: mod*, change, set, ch* se*)
• delete (aliases: unregister, remove, de*, unr* re*, rm)
• sync (alias: syn*)
• start (alias: on, enable, star*, en*)
• stop (alias: off, disable, sto*, di*)
• backup (alias: bac*)
• restore (alias: rest*)
• suspend (alias: unmonitor, sus*, unm*)
• resume (alias: monitor, resu*, mon*)
• snap (alias: snap, sna*)
CLASS
Specifies the type of object on which the ACTION is performed. The following CLASS objects are
supported:
• vmm (alias: v*): The vmm CLASS is used by default. If you do not specify any CLASS, the action is
performed on the VM monitor by default.
• app (alias: a*): The app CLASS is used to perform ACTION on applications. You must specify a
NAME attribute to apply the ACTION on a specific application.
• dependency (alias:dep*): The dependency CLASS establishes a dependency relationship between
the applications. You must specify the list of applications and dependency type to apply a
dependency between applications.
• process (alias:none): The process class is used to perform ACTION on processes. You must specify
a NAME attribute to apply the ACTION on a specific process.
NAME
Specifies the particular object, of type CLASS, on which the ACTION must be performed. The NAME
flags are case-sensitive. You can use this flag only for the app CLASS.
ATTRIBUTES
Specifies an optional flag that has attribute pairs and value pairs that are specific to the ACTION and
CLASS combination. Use these pairs to specify configuration settings or to run particular operations.
Both ATTRIBUTES and VALUE flags are case-sensitive. You cannot use the asterisk (*) character in the
ACTION and CLASS names.
-a
Displays only the specified attributes. This flag is valid only with the query ACTION. Attribute names
are not case-sensitive.
-f
Overrides any interactive prompts, forcing the current operation to be attempted, if allowed.
-h/-?
Displays help information.
-l 0|1|2|3
Activates the following trace log values for serviceability:
• 0: Updates the log file when an error is detected. This level of trace logging is the default value.
• 1: Also logs warning messages.
• 2: Also logs basic informational messages that can be used for demonstrations.
• 3: Performs high tracing by logging the details about the routine function and the utility function.
Traces the entry and exits for various functions.
All trace data is written into the vmmgr.log file. This flag is used for troubleshooting problems.
-s
Synchronizes the VM monitor daemon immediately by sending a notification to the daemon. The VM
monitor daemon reloads the XML configuration when it receives this notification.
This flag is valid only with the add, modify, delete, resume, and suspend ACTIONS. By default, no
notification is sent. To send a notification, use the ksysvmmgr sync command.
Attributes
The ksysvmmgr command configures the following classes and attributes:
vmm
When you start the VM monitor, the VM monitor daemon sends heartbeats to the host monitor, when
requested by the host monitor, so that the KSYS subsystem can monitor the virtual machines. The VM
monitor can have the following attributes:
version
Specifies the version of XML. This mandatory attribute is set to 1.0 for the current version of VM
monitor and cannot be modified.
log
Specifies the log level of the VM monitor daemon. This attribute can have the following values:
• 0: Only errors are logged. It is the default value.
• 1: Warnings are also logged.
• 2: Informational messages are also logged.
• 3: Details of the operation are logged. This information is used for debugging.
period
Specifies the time duration in seconds between two consecutive occurrences of checks that are
performed by the Application Management Engine (AME). By default, the value of this attribute is
1 second. The value of this attribute must be in the range 0 - 6. For best monitoring performance,
do not modify the default value.
Application (app)
The application class contains the following mandatory attributes:
monitor_script
A mandatory script that is used by the VM agent to verify application health. This script is run
regularly (based on the monitor_period attribute value) and the result is checked for the
following values:
• 0: Application is working correctly.
• Any value other than 0: Application is not working correctly or has failed.
After several successive failures (based on the monitor_failure_threshold attribute value),
the application is declared as failed. Based on the specified policies, the KSYS subsystem
determines whether to restart the virtual machine.
stop_script
A mandatory script that is used by the VM agent to stop the application if the application must be
restarted. The application can be restarted by successively calling the stop_script and
start_script scripts.
start_script
A mandatory script that is used by the VM agent to start the application if the application must be
restarted.
Exception: These scripts are not mandatory for the following application types: ORACLE, DB2, and
SAPHANA.
The application class contains the following optional attributes:
monitored
Specifies whether the application is monitored by the KSYS subsystem. This attribute can have the
following values:
• 1 (default): The application monitoring is active.
• 0: The application monitoring is suspended.
monitor_period
Specifies the time in seconds after which the application monitoring must occur. The default value
of 30 seconds specifies that the monitor_script script is run by the VM agent every 30
seconds.
monitor_timeout
Specifies the waiting time in seconds to receive a response from the monitor_script script. The
default value is 10 seconds, which means that the VM monitor waits for 10 seconds to receive a
response from the monitor_script script after which the script is considered as failed.
monitor_failure_threshold
Specifies the number of successive failures of the monitor_script script that is necessary
before the VM monitor restarts the application. A restart operation is performed by successively
calling the stop_script and start_script scripts.
stop_stabilization_time
Specifies the waiting time in seconds to receive a response from the stop_script script. The
default value is 25 seconds, which means that the VM monitor waits for 25 seconds to receive a
response from the stop_script script after which the script is considered as failed.
stop_max_failures
Specifies the number of successive failures of the stop_script script that is necessary before
the VM monitor considers that it cannot stop the application. The default value is set to 3.
start_stabilization_time
Specifies the waiting time in seconds to receive a response from the start_script script. The
default value is 25 seconds, which means the VM monitor waits for 25 seconds to receive a
response from the start_script script after which the script is considered as failed.
start_max_failures
Specifies the number of successive failures of the start_script script that is necessary before
the VM monitor considers that it cannot start the application. The default value is set to 3.
max_restart
Specifies the number of cycles of successive VM restart operations that result in a monitoring
failure before the daemon pronounces that restarting at VM level is insufficient. By default, this
attribute is set to 3.
status
Specifies the dynamic status of application that is returned by Application Management Engine
(AME). This attribute cannot be modified.
The color codes signify the dynamic status of an application.
Table 5. Color code for dynamic application status

Color code    Application status
Red           Permanent application failure
Orange        Initial application state
Yellow        Application is in normal state but failed more than two times within the last 24 hours
Green         Application is in normal state
Gray          Application is not monitored by the VMM daemon
Blue          Application is in other intermittent states, for example, the starting, stopping, or failing state
version
Specifies the application version. This attribute does not have a default value.
critical
Marks the application as critical. The valid values are Yes and No (default). If you mark an
application as critical, failure of the application may lead the VM to be rebooted or relocated by
the KSYS subsystem.
type
Specifies the type of application. By default, the type attribute value is CUSTOM, which indicates
general applications. Other supported values are ORACLE, DB2, POSTGRES, and SAPHANA. This
attribute is case-sensitive and you must use uppercase characters. For these types of
applications, if you do not specify start, stop, and monitor scripts, the internal scripts of the VM
monitor are used.
instancename
Specifies the instance name for applications. This attribute is applicable only for agent
applications, which are internally supported by the VM agent. The supported agent applications
are ORACLE, DB2, SAPHANA, and POSTGRES. For example,
• If the application type is ORACLE, the instancename attribute must be specified with the
Oracle user name.
• If the application type is DB2, the instancename attribute must be specified with the DB2
instance owner.
• If the application type is SAPHANA, the instancename attribute must be specified with the
SAPHANA system id.
• If the application type is POSTGRES, the instancename attribute must be specified with the
POSTGRES instance id.
database
Specifies the database that the applications must use. This attribute is applicable only for agent
applications, which are internally supported by the VM agent. The supported agent applications
are ORACLE, DB2, SAPHANA, and POSTGRES. For example,
• If the application type is ORACLE, the database attribute must be specified with the Oracle
system identifier (SID).
• If the application type is DB2, the database attribute is not required.
• If the application type is SAPHANA, the database attribute must be specified with the SAP
HANA database.
• If the application type is POSTGRES, the database attribute can be specified with the database
name. If the database name is not specified, the script monitors all databases of the POSTGRES
instance.
appstarttype
Specifies the method in which the applications must be started. This attribute can have the
following values:
• VMM: Specifies that the VM agent must start and monitor the applications.
• OS: Specifies that the application must be started by the operating system or by a user.
• KSYS: Specifies that the application must be started or stopped by the KSYS subsystem. If the
application crashes after it is started by the KSYS subsystem, the VMM must restart the
application.
From the ksysvmmgr command-line-interface (CLI), you can modify the appstarttype attribute
of an application.
Further, if a VM daemon reboots, the VMM daemon starts all the VMM controlled applications, but
the VMM daemon cannot start the KSYS controlled applications. Instead, the VMM daemon sends
the status of the application to the KSYS subsystem. The KSYS subsystem determines whether to
start or stop the KSYS controlled applications. The KSYS subsystem has privilege to modify the
value of the appstarttype attribute of an application from KSYS to VMM, or vice versa.
The default value of the appstarttype attribute of an application is VMM.
Note: If the appstarttype attribute is modified from KSYS to VMM, then you must manually
delete all the related application dependencies in the KSYS subsystem.
groupname
Specifies the groupname to which the application belongs. The default value of the groupname
attribute is NULL. After a user creates a new group, the groupname of each application in the
group is updated.
configfile
Specifies the file that contains the application configuration settings for the supported agent
applications. This attribute is used only by the SAP HANA and POSTGRES agent applications. This
attribute is blank for other agent applications.
Application dependency (dependency)
The dependency class contains the following mandatory attributes:
dependency_type
Specifies the type of dependency between applications. This attribute can have the following
values:
• start_sequence: Specifies the order in which the applications must be started as mentioned
in the dependency_list attribute. The dependency_list attribute must have more than one
application for this dependency type.
• stop_sequence: Specifies the order in which the applications must be stopped as mentioned
in the dependency_list attribute. The dependency_list attribute must have more than one
application for this dependency type.
• parent_child: Specifies the parent-child relationship of the two specified applications in
which one application is parent and the other is child. The parent application must start first and
then the child application starts. You must stop the child application first and then stop the
parent application. If the parent application fails, the child application also stops automatically.
If the parent application recovers and starts, the child application is started automatically.
dependency_list
Specifies the list of applications that have a dependency between them.
The dependency class also contains the following optional attributes:
strict
Specifies whether to continue the script or command if the dependency policy cannot be followed.
If the strict attribute is set to Yes, the next application is not started until the previous
application starts and is in the normal state. If the strict attribute is set to No, the next
application is started immediately after the first application is started irrespective of the state of
the first application. This attribute is applicable only for the start_sequence dependency.
Process
The process class can have the following attributes:
start_script
This script is used by the process monitor to restart a process in a virtual machine. The restart
operation is performed by successively calling the stop_script and the start_script.
stop_script
This script is used by the process monitor to stop an application.
monitor_period
The duration between two successive process monitor operations. This duration is specified in
seconds. The default value is 30 seconds.
stop_stabilization_time
The period of time within which the stop_script script must respond; if no response is received
within this time, the script is considered timed out. The default value is 25 seconds.
stop_max_failures
The number of successive failures of the stop_script script after which it is considered that the
process cannot be stopped. The default value is 3 successive failures.
start_stabilization_time
The period of time within which the start_script script must respond; if no response is received
within this time, the script is considered timed out. This duration is specified in seconds. The
default value is 25 seconds.
start_max_failures
The number of successive failures of the start_script script after which it is considered that
the process cannot be started. The default value is 3 successive failures.
max_restart
The maximum number of restart cycles, after which the process monitor is considered as failed.
The default value is 3.
ksysvmmgr -h vmm
• To query the details about the VM monitor daemon, run the following command:
ksysvmmgr backup vmm [<ATTR#1>=<VALUE#1>]
• To notify the VM monitor daemon to synchronize with the contents of the XML configuration file, run
the following command:
This command creates a *.pax.gz file in the /var/ksys/log/snap directory. To read the
contents of the file, you can unzip the file by using the following commands:
unzip *.pax.gz
pax -r -f *.pax
Application operations
• To display help information about the app class, run the following command:
ksysvmmgr -h app
• To add a critical application that must be monitored, run the following command:
• To query the details about a registered application, run the following command:
• To delete specific or all applications from the VM agent configuration settings, run the following
command:
• To suspend the monitoring of an application or all applications, run the following command:
• To resume the monitoring of an application or all applications, run the following command:
• To stop an application, run the following command:
Dependency operations
• To display help information about the dependency class, run the following command:
ksysvmmgr -h dep
• To query the details about an existing dependency between applications, run the following
command:
• To modify a specific dependency relationship between applications, run the following command:
Process operations
• To add a process for monitoring, run the following command:
ksysvmmgr [-s] delete process process_name=<NAME>
An example scenario
The following examples show a scenario in which you start the VM monitor daemon, configure the VM
agent by adding applications and dependencies:
1. To display the help information about the vmm class, run one of the following commands:
1. ksysvmmgr -h vmm
2. ksysvmmgr help vmm
ksysvmmgr start
ksysvmmgr stop
ksysvmmgr snap
8. To notify the VM monitor daemon to synchronize with the contents of the XML configuration file, run
the following command:
13. To resume an application (say app1), run the following command:
18. To get the status of all applications, run the following command:
19. To get the status of an application (say app1), run the following command:
21. To modify the application list in the group, run the following command:
23. To display help information about the dependency class, run one of the following commands:
ksysvmmgr -h dep
ksysvmmgr help dep
24. To notify the VM monitor daemon to synchronize with the contents of the XML configuration file, run
the following command:
27. Create a start_sequence dependency with 3 applications app1, app2, app3 in the dependency list
by running the following command:
ksysvmmgr -s add dependency dependency_list=app1,app2,app3
dependency_type=start_sequence
28. To create a stop_sequence dependency with 3 applications app4, app5, app6 in the dependency
list, run the following command:
29. Create a parent_child dependency with applications app1 and app2 in the dependency list by
running the following command:
31. To modify the details of the dependency that has UUID 1531835289870752764, run the following
command:
32. Display the details of the dependency that has UUID 1531835289870752764 by running the
following command:
33. To delete the dependency that has UUID 1531835289870752764, run the following command:
34. To display help information about the process class, run one of the following commands:
ksysvmmgr -h process
ksysvmmgr help process
Troubleshooting VM Recovery Manager HA
To isolate and resolve problems in the VM Recovery Manager HA solution, you can use the following
troubleshooting information.
Consider the following approach when you encounter an error or an error notification:
1. When you receive errors in the configuration or recovery operations, run the following command:
2. Review the suggested action and check whether you can resolve the issue.
3. If you cannot identify the cause of the error from the command output, review the log files and the
trace files to diagnose the issue.
4. If you receive error notifications as an email or text message, review the /var/ksys/events.log file
and check for the resolution.
For example,
export VMRM_HGNAME=HA_HG1
snap -z "product_name=ksys_prod"
If the export command does not contain the host group name, the VM log files of all host groups will be
generated.
The events are categorized as critical errors, warnings, and informational events. To query all events of a
specific event type, use the following command:
The following table lists the common error events that are monitored by the KSYS subsystem:
Table 6. Common error events monitored by the KSYS subsystem
SSP_ATTRIBUTES_INIT_FAIL
SSP_REGISTRY_FAIL
SSP_RCP_DATA_FAIL
REPOSITORY_DISK_FAILURE
SWITCH_CREATE_FAILED
SWITCH_DELETE_FAILED
TRUNK_ADAPTER_CREATE_FAILED
TRUNK_ADAPTER_DELETE_FAILED
VM_RESTART_FAILED
APP_FAILURE_INITIATE_VM_RELOCATION
VM_VERIFY_FAILED
NETWORK_INTERFACE_ADDED
NETWORK_INTERFACE_ACTIVE
NETWORK_INTERFACE_DELETED
NETWORK_INTERFACE_FAILURE
The discovery operation for a host group, host, or VM failed
Problem
The discovery operation failed or a VM is not discovered by the KSYS subsystem during the discovery
operation.
Solution
1. Ensure that you have completed all the prerequisites that are specified in the Requirements
section and the configuration steps that are specified in the Configuring section.
2. Check whether the ha_monitor attribute is enabled or disabled for site, host group, host, and VM
by using the following commands:
3. If the HAMonitor field shows Disabled or not set, enable the ha_monitor attribute by using
one of the following commands:
If the ha_monitor attribute for a VM is not enabled at a VM-level, host-level, host group-level, or
system-level, the VM is not considered for the discovery operation.
4. Ensure that you have started the VM monitor daemon by running the ksysvmmgr start vmm
command. The VM agent can send heartbeats to the host monitor only when you start the VM
monitor daemon.
5. Ensure that you have set all the HMC options that are specified in the Requirements section.
6. Run the discovery and verification operations after each LPM operation to update the LPM
validation state.
Solution
The LPAR profile on the source host is deleted after the virtual machine is restarted successfully on
the target host. However, if the virtual machine does not move to the proper state, perform the
following steps:
1. Restart the virtual machines in the target host by running the following command:
2. If the restart operations fail, recover the virtual machine in the same host where it is located
currently by running the following command:
3. If the output of the restart command indicates cleanup errors, run the cleanup command manually
to clean up the VM details in the source host by running the following command:
ksysmgr query vm
ksysmgr query vios
If any of the host monitor or VM monitor major version does not match, you must upgrade the minor
version to any of the major versions.
Solution 1
When a repository disk fails, you can manually replace repository disk by running the modify hg
command with a new repository disk ID. Run the modify_Hg command from the KSYS subsystem.
Solution 2
When a repository disk fails, you can manually replace the repository disk by completing the following
steps in the HMC GUI:
1. Log in to the HMC GUI in a web browser as the hscroot user.
2. Go to Resources > All Shared Storage Pool Clusters.
3. Select your cluster and click on the cluster name.
4. Click Replace Disk.
5. In the Replace Repository Disk panel, select one of the available free shared physical volumes as
the new repository disk to replace the existing repository disk.
6. Click UUID value to validate the complete UUID and the local hdisk name on each VIOS.
7. Click OK to replace the repository disk. After the operation is complete, in the Shared Storage
Pool Cluster window, click the repository disk UUID value to check whether it matches the
selected new repository disk.
8. Run the discovery operation to update the KSYS configuration settings by running the following
command:
9. After the discovery operation is complete, run the following command to verify whether the
updated repository disk UUID in SSP remote copy matches the UUID in HMC:
For example,
Investigate these errors at the HMC level and resolve them as specified in the error message.
4. Access the HMC and run the chhwres command to create or delete switches and adapters. After
you verify the configuration on the HMC, run the discovery operation to update the KSYS
configuration settings.
b. Review the HAmonitor attribute settings by using the lsrsrc command in the KSYS node as
follows:
i) Check the persistent values that are specific to the HA monitoring and are saved in the
VMR_SITE class by running the following command:
The high-availability monitoring is enabled at a global level based on the HAmonitor value.
ii) Check the persistent values that are specific to the HA monitoring and are saved in the
VMR_HG (host group) class by running the following command:
If the failure detection time for host (hostFDT value) is 0, the value specified at the site
level is used.
iii) Check the persistent values that are specific to the HA monitoring and are saved in the
VMR_CEC (host) class by running the following command:
iv) Check the persistent values that are specific to the HA monitoring and are saved in the
VMR_VIOS class by running the following command:
If a VIOS is running in a LOCAL mode, the MonitorMode field is set to LOCAL. If the host
monitor is not operating correctly, the MonitorMode field is set to DOWN.
v) Check the persistent values that are specific to the HA monitoring and are saved in the
VMR_LPAR class by running the following command:
3. Identify the actions taken by the FDE module, if any, by performing the following steps:
a. Search for the string Task added to check whether the FDE module has passed the tasks to
other components. For example:
If the FDE module passed the task, the task is added to the KSYS queue. The trace.ksys.*
trace files might contain further details.
b. Check whether a move operation is initiated by searching for the RECOVERY TASK ADDED for
LPAR string. If you cannot find this string, the VM has not met the criteria for a move operation;
for example, the threshold for missed heartbeats has not been reached:
[15] 06/11/18 _VMR 12:34:08.266355 DEBUG VMR_LPAR.C[14541]:
ssetHBmissed 46 for romano001: 2C55D2BB-1C50-49F1-B1A3-5C952E7070C7
c. Check whether the FDE module enabled the local mode. For example:
In the global mode, the request is sent to the VIOS and the FDE module waits for a response.
The response is parsed, and the FDE module either takes action or moves the task to the KSYS
subsystem. The local mode provides information about when the heartbeat was missed.
6. Ensure that the IBM.VMR daemon is in the active state. If not, reinstall the daemon.
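For example, a minimal sketch that checks the daemon state with the standard AIX lssrc command, assuming that the daemon is registered under the subsystem name IBM.VMR:
# lssrc -s IBM.VMR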
The failed application does not move to a stable state after a restart operation
Problem
The VM agent subsystem cannot restart the failed application successfully.
Solution
1. Run the ksysvmmgr query app <NAME> command to check the state and UUID of an
application. The application is in one of the following states:
UNSET
State of an application when the application monitoring starts, but its status is not set.
SAPHANA
a. Ensure that you provide the correct instance name and the database number to SAP HANA
agent scripts. For example: S01 (instance name) and HDB01 (database number).
b. Ensure that you specify the application version, instance name, and database number while
adding the application. Otherwise, the application version field remains empty.
c. Analyze the log files in the /var/ksys/log/agents/saphana/ directory to diagnose the
agent script failures. These log files contain information about any missing attribute or
parameter.
d. Ensure that you have marked the application as critical by using the ksysvmmgr modify
app <NAME> critical=yes command. The KSYS subsystem restarts a failed application
only when you mark the application as critical. When a critical application in a VM reports a
permanent failure state, diagnose the issue in the VM by checking the ksys_vmm.log file.
When a non-critical application fails, the KSYS subsystem flags this application as failed and
notifies you to take further action.
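For example, to check the state of a hypothetical application app1 and mark it as critical, using the ksysvmmgr command forms that are shown above:
# ksysvmmgr query app app1
# ksysvmmgr modify app app1 critical=yes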
Verify whether the POSTGRES database is running on all the VIOS nodes. If the database is not running,
run the following command to restart the POSTGRES database.
3. Verify that you can run the smuiauth command successfully by running the command along with
the -h flag.
4. Verify that the pluggable authentication module (PAM) framework is configured correctly by
locating the following lines in the /etc/pam.conf file:
You cannot register a KSYS node in the VM Recovery Manager HA GUI server
Problem
You cannot register a KSYS node in the VM Recovery Manager HA GUI server.
If the path is not correct, you must enter the correct path in the /etc/ssh/sshd_config file,
and then restart the sshd subsystem.
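For example, a minimal sketch that restarts the sshd subsystem with the standard AIX SRC commands:
# stopsrc -s sshd
# startsrc -s sshd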
2. Check for issues in the /opt/IBM/ksys/ui/agent/logs/agent_deploy.log file on the target
cluster.
# lssrc -s vmruiserver
Subsystem Group PID Status
vmruiserver vmrui inoperative
If the status of the vmruiserver subsystem is displayed as inoperative, run the startsrc -s
vmruiserver command to restart the UI server node from the command line. You can then access
the GUI and register the agent nodes again.
You cannot stop or start the GUI server and GUI agent processes
Problem
You cannot stop or start the GUI server and agent processes.
Solution
• GUI server: Stop the GUI server by running the following command: stopsrc -s vmruiserver.
Restart the GUI server by running the following command: startsrc -s vmruiserver. If you are
starting the GUI server for the first time after installing the GUI server, run the vmruiinst.ksh
command. For information about running this command, see “Installing GUI server filesets” on page
21.
• GUI agent: Stop the GUI agent process by running the following command in the guest VM:
stopsrc -s vmruiagent. This command unregisters the KSYS node from the GUI server and the
KSYS node will no longer be accessible from the GUI server.
Restart the GUI agent by running the following command: startsrc -s vmruiagent. This
command registers the KSYS node again.
# snap vmsnap
This command stores all the important log files and trace files in a compressed file at the following
location: /tmp/ibmsupt/ksys.pax.Z.
2. Collect the data from the guest virtual machines by running the following command:
# ksysvmmgr snap
For license inquiries regarding double-byte character set (DBCS) information, contact the IBM Intellectual
Property Department in your country or send inquiries, in writing, to:
Such information may be available, subject to appropriate terms and conditions, including in some cases,
payment of a fee.
Portions of this code are derived from IBM Corp. Sample Programs.
© Copyright IBM Corp. _enter the year or years_.
For more information about the use of various technologies, including cookies, for these purposes, see
IBM’s Privacy Policy at http://www.ibm.com/privacy and IBM’s Online Privacy Statement at
http://www.ibm.com/privacy/details, in the section entitled “Cookies, Web Beacons and Other Technologies”, and
the “IBM Software Products and Software-as-a-Service Privacy Statement” at http://www.ibm.com/
software/info/product-privacy.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at
Copyright and trademark information at www.ibm.com/legal/copytrade.shtml.
The registered trademark Linux® is used pursuant to a sublicense from the Linux Foundation, the exclusive
licensee of Linus Torvalds, owner of the mark on a worldwide basis.
Red Hat®, JBoss®, OpenShift®, Fedora®, Hibernate®, Ansible®, CloudForms®, RHCA®, RHCE®, RHCSA®,
Ceph®, and Gluster® are trademarks or registered trademarks of Red Hat, Inc. or its subsidiaries in the
United States and other countries.