AMF User Guide

USER GUIDE

3/1553-APR 901 0444/4 Uen B


Copyright

© Ericsson AB 2016. All rights reserved. No part of this document may be
reproduced in any form without the written permission of the copyright owner.

Disclaimer

The contents of this document are subject to revision without notice due to
continued progress in methodology, design and manufacturing. Ericsson shall
have no liability for any error or damage of any kind resulting from the use of
this document.

Trademark List

All trademarks mentioned herein are the property of their respective owners.
These are shown in the document Trademark Information.

3/1553-APR 901 0444/4 Uen B | 2016-11-28


Contents

1 Introduction
1.1 Prerequisites

2 Basic Concepts
2.1 Availability Management Framework
2.2 Application
2.3 Cluster and Node
2.4 Component and Service Unit
2.5 Health Monitoring
2.6 Workload
2.7 Assignment
2.8 Failover and Switchover
2.9 Error Detection, Recovery, Repair, and Escalation
2.10 Information Model
2.11 Redundancy Model
2.12 Administrative Operations

3 System Description and Model
3.1 Entity Types
3.2 Software Entities
3.3 Other Entities
3.4 State Models
3.5 Dependencies
3.6 Ranking
3.7 Component Life Cycle Controlling Commands

4 Application Programming Interface

5 Integration of Legacy Software
5.1 Integration Approaches
5.2 Types of Applications
5.3 Recommendations

6 General Concerns
6.1 Daemonizing
6.2 Logging
6.3 Error Handling
6.4 Standards Compliance
6.5 File System Layout
6.6 User Management

7 Building and Packaging
7.1 Building
7.2 Packaging

8 Installing, Upgrading, and Removing an Application
8.1 Entity Types File

Reference List


1 Introduction

The Service Availability Forum (SA-Forum) is an industry consortium that has
defined a set of open Application Programming Interface (API) specifications to
enable building highly available carrier-grade systems providing service continuity.
Ericsson has been active in creating the specifications and is a key contributor to
OpenSAF, an open source implementation of these specifications.

The Availability Management Framework (AMF) specification is described as "the
software entity that enables service availability by coordinating other software
entities in a cluster". This document explains what that abstract description
means in practice.

Scope
This document is a simplified version of the AMF specification but also contains
information related to the AMF system environment and other concerns.

Target Groups
This document is intended for application designers and developers.

1.1 Prerequisites
It is assumed that the reader is familiar with the SA-Forum system architecture
and concepts. For more information, refer to www.saforum.org.

The reader is advised to have a copy of the AMF specification (Reference [1]) at
hand when reading this document, as many references are made to it. In particular,
several figures in the specification complement this document.

2 Basic Concepts

2.1 Availability Management Framework


The AMF manages applications and keeps the services they provide available at
all times. This includes the following major responsibilities:

— Control the life cycle: start and stop the application.

— Monitor the health of started applications.

— Manage the workload.

— Recover and repair failed services.

— Support administrative operations on modeled entities.

— Send alarms.

— Manage the system model for applications.

2.2 Application
In the AMF context, an application usually means the server part of a client-server
application. There are many types of servers, such as web servers, database
servers, and gaming servers.

Greenfield applications are written from scratch, possibly with AMF integration
in mind. If so, they can freely use the AMF concepts, depending on their ambition
level, to provide service availability and become Service Availability-aware
(SA-aware).

Third-Party Programs (3PPs) or legacy applications are applications that already
exist and are not integrated with the AMF. Such applications are referred to as
non-SA-aware. Certain features exist in the AMF to support integration of these
types of applications. Such integration is important to provide a complete highly
available system solution that includes databases and storage solutions.

The AMF environment is a clustered environment and the application can be
distributed in a cluster. The subparts of a distributed application do either or
both of the following:

— Share resources such as a file system or a database.

— Communicate with each other to provide the high-level service.

An AMF application can consist of only a single operating system process, but this
gives quite a bit of overhead because of the AMF modeling requirements. It is,
however, a good starting point when there are plans to make the application highly
available (HA) or distributed, or both.

2.3 Cluster and Node


Cluster and Node are logical entities of the AMF system model. An AMF node
corresponds to an operating system instance. The set of AMF nodes forms the AMF
cluster. Nodes in a cluster belong to the same communication subnet; no routing is
needed within a cluster.

2.4 Component and Service Unit


The component is one logical entity of the AMF system model. A component
represents a program in execution under control of the AMF. Usually a component
corresponds 1:1 to an operating system process.

The term SA-aware component describes a component that is integrated with the
AMF and uses its API.

Components are grouped into Service Units (SUs). An SU is a logical entity tied to
a single AMF node: all components in an SU execute on the same AMF node.

2.5 Health Monitoring


Health monitoring is important to achieve service availability and is used to detect
errors and anomalies in the system. Monitoring is always performed on a per
component basis and is also called component monitoring.

The AMF supports three different types of monitoring:

— Passive

— External active

— Internal active

With active monitoring, latent faults, such as a program that is stuck in a loop
and no longer responds, can be detected, which is not the case with passive
monitoring.

When active monitoring is used, it is also possible to validate the data received
from the monitored service. For example, if system uptime is requested from an
SNMP agent (as part of actively monitoring it), the result can be validated and
checked to see if it is reasonable. This kind of validation is outside the scope
of the AMF and of this document, and it is service-specific. If used, it gives even
higher service availability because another class of errors can be detected.

The recovery action taken by the AMF when a fault has been detected is
configurable and can, for example, be COMPONENT_RESTART. With that setting, if a
monitored process dies, the AMF restarts it. A recommended recovery action can
also be specified in the API used to report errors.

For more information, refer to Section 3.10 in Reference [1].

2.5.1 Passive Monitoring


In passive monitoring, the AMF uses operating system features to assess the
health of a component. Currently only monitoring the death of a process is defined
but one can envision monitoring other system resources like main memory use.

As operating system features are used, the component is not actively involved
in the monitoring and its code is not instrumented, hence the name passive
monitoring.

The AMF implicitly performs passive monitoring on SA-aware components. If
such a component dies, for example because of a segmentation fault, the AMF
automatically detects it.

To use passive monitoring for other types of components (or for a subprocess),
the monitoring must be started using function saAmfPmStart() and stopped using
function saAmfPmStop().

The time between fault and detection is implementation-specific and cannot be
configured, either through the API or through configuration objects.
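
As a rough illustration of the idea behind passive monitoring, the following
Python sketch notices the death of a process using only operating system
facilities (the process exit status). It is not the AMF API: watch_process and
on_death are hypothetical names chosen for this sketch, and the child command
stands in for a component.

```python
import subprocess
import sys


def watch_process(argv, on_death):
    """Start argv as a child process and block until it exits.

    The child is not instrumented in any way; its death is observed
    purely through the exit status reported by the operating system.
    """
    child = subprocess.Popen(argv)
    exit_code = child.wait()        # returns once the OS reports the exit
    on_death(child.pid, exit_code)  # hypothetical hook, e.g. trigger recovery
    return exit_code
```

A caller could pass a command that exits abnormally, for example
[sys.executable, "-c", "import sys; sys.exit(3)"], and receive the exit
status 3 in the on_death hook without the child cooperating in any way.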

2.5.2 External Active Monitoring


In external active monitoring, the component code is not instrumented. Instead, an
external entity called a monitor is used to assess the health of the component.

The monitor preferably sends real service requests to the monitored
component and supervises that a correct response is received in a timely manner.

An AMF component can be configured with optional Application Monitoring (AM)
commands. Command AM_START is called after the instantiate command and
AM_STOP is called before the terminate command.

AM_START starts a monitor process that periodically assesses the health of the
monitored application by making a simple service request to it. The AMF is not
involved in the actual monitoring; that is the responsibility of the monitor process.

When the monitor detects a health problem with its monitored service, it
must call function saAmfComponentErrorReport(). This implies that the
monitor itself is written in C/C++ or that a helper command exists that wraps
saAmfComponentErrorReport() so that it can be called by a monitor implemented
as a script.

In this case no one monitors the monitor, but as the monitor is simple and small
it can probably be considered fault free by review. If this is not appropriate, the
monitor can be implemented as an AMF SA-aware component to which the AM
commands send monitoring requests.

For more information about this feature, refer to Sections 4.8–4.10 in Reference
[1].
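
The monitor loop described in this section can be sketched as follows. This is
a minimal illustration under stated assumptions, not a real monitor:
send_request stands for whatever real service request suits the monitored
service, the reply "OK" is an assumed definition of a correct response, and
report_error is a hypothetical stand-in for a helper that wraps
saAmfComponentErrorReport().

```python
import time


def check_once(send_request, timeout_s, report_error):
    """Send one service request and judge the response.

    send_request: callable returning the service's reply (may raise).
    report_error: hypothetical stand-in for a wrapper around
                  saAmfComponentErrorReport(); not the real API.
    Returns True if the service looked healthy.
    """
    started = time.monotonic()
    try:
        reply = send_request()
    except Exception as exc:
        report_error(f"request failed: {exc}")
        return False
    elapsed = time.monotonic() - started
    if elapsed > timeout_s:
        report_error(f"response too slow: {elapsed:.3f}s")
        return False
    if reply != "OK":  # assumed meaning of "correct response" in this sketch
        report_error(f"unexpected reply: {reply!r}")
        return False
    return True


def monitor(send_request, period_s, timeout_s, report_error, rounds):
    """Probe the service `rounds` times, pausing period_s between probes."""
    healthy = 0
    for _ in range(rounds):
        if check_once(send_request, timeout_s, report_error):
            healthy += 1
        time.sleep(period_s)
    return healthy
```

Note that this sketch only measures the response time after the reply arrives.
A request that never returns would block the loop, so a real monitor would also
need a time-out on the request itself.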

2.5.3 Internal Active Monitoring


With internal active monitoring, the component itself contains code written
specifically for monitoring. The purpose of such code is to monitor the component
health and discover latent faults. In AMF terms, the execution of such code (often
called an audit) is a health check.

As the code is instrumented, this type of monitoring is normally only used for
SA-aware components.

A health check can be triggered by the component itself or by the AMF. When
triggered by the AMF, health check requests are sent periodically to the component
with a certain configurable period. The AMF expects a response within a certain
configurable time called the duration. The duration is always shorter than the
period.

A component can have several health checks active at the same time. Each health
check is identified by a key (a name). The reason for allowing several checks is
that the impact on the provided service varies with the check performed. A normal
service request has little impact and can be run with a shorter period. More
detailed component audits can have more service impact and are to be run with a
longer period.

Active monitoring means that the provided service is to be checked. Therefore,
health checks must not be answered by, for example, a separate decoupled thread in
the component, unless that thread actually makes a service request internally.

Configuration of period (and duration) must be done with high load in mind. It is a
trade-off between fast detection of true errors and avoidance of false error
detection. A longer period helps to avoid false error detection, but it takes
longer to detect latent faults. A health check period is normally in the range of
seconds or even tens of seconds; it is most likely not less than one second. The
health check duration most likely must be longer than the callback time-out,
typically twice as long. It depends on the AMF implementation whether two
supervision timers run at the same time or whether health checks are skipped when
some other supervision is active, for example, a callback time-out.

An unexpected death of the registered process for an AMF component is instantly
detected by the AMF and requires no active monitoring.

Errors are reported to the AMF in two ways. When AMF-invoked health checks
are used, a negative response is given using function saAmfResponse(). When
component-invoked health checks are used, the component gives a negative
response using function saAmfHealthcheckConfirm().

For more details, refer to Section 7.1.2 in Reference [1].
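
To make the relationship between period and duration concrete, the following
Python sketch simulates AMF-invoked health checks on a virtual clock. It is a
simplified model, not the behavior of any particular AMF implementation; the
function names and the worst-case estimate are assumptions of this sketch.

```python
def supervise(response_delays, period, duration):
    """Simulate AMF-invoked health checks on a virtual clock.

    response_delays[i] is the time the component needs to answer the
    i-th request (None means it never answers). A request is sent every
    `period` seconds and must be answered within `duration` seconds.
    Returns a list of (send_time, verdict) tuples, stopping at the
    first missed deadline.
    """
    if not 0 < duration < period:
        raise ValueError("duration must be positive and shorter than period")
    results = []
    for i, delay in enumerate(response_delays):
        send_time = i * period
        if delay is not None and delay <= duration:
            results.append((send_time, "ok"))
        else:
            results.append((send_time, "timeout"))
            break  # error detected; recovery would start here
    return results


def worst_case_detection(period, duration):
    """In this model, a fault occurring right after a successful check is
    noticed only when the next check times out."""
    return period + duration
```

With a period of 10 seconds and a duration of 2 seconds, a component that hangs
just after answering a check is detected roughly 12 seconds later in this
model, which shows why a long period delays the detection of latent faults.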

2.6 Workload
A normal non-AMF-aware program provides service directly when started. There
is no distinction between the program and the service it provides. However, if the
service or work the program performs can be categorized and quantified, it can
also be modeled and managed. This categorized and quantified work or service is
what the AMF means by workload. Workload is a core concept used by the AMF to
enable high availability and is important to understand. When an application uses
the workload concept, the AMF can apply sophisticated redundancy schemes to it.

An application designed with the workload separation in mind is called SA-aware
in AMF terms. That is, it can be started and be "idle", doing nothing until the AMF
tells it to be active or standby for a certain workload.

A simple example can be a web server that starts and initializes but does not bind
to port 80 until assigned the corresponding active workload. On another node,
the same program can be running as standby waiting to be activated if the other
instance goes down. This is an example of a simple 2N redundancy scheme.

In AMF terms, the workload is called a Service Instance (SI), and SIs are
assigned to SUs. An SI is further broken down into Component Service Instances
(CSIs), which are assigned to components (processes) and are visible in the API
to the program designer.
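
The web server example above can be sketched in Python as follows. The class
and its assign and remove methods are hypothetical stand-ins for the handling
of AMF assignments, not the AMF API. The sketch binds to an ephemeral port on
the loopback interface instead of port 80, because binding to port 80 normally
requires elevated privileges.

```python
import socket


class WebServerComponent:
    """Starts idle; holds a listening socket only while ACTIVE."""

    def __init__(self, host="127.0.0.1", port=0):
        # Port 0 lets the OS pick a free port (an assumption of this sketch).
        self.host, self.port = host, port
        self.ha_state = None   # no workload assigned yet
        self.listener = None

    def assign(self, ha_state):
        """Hypothetical handler for a workload assignment from the AMF."""
        if ha_state == "ACTIVE" and self.listener is None:
            self.listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            self.listener.bind((self.host, self.port))
            self.listener.listen()
        elif ha_state != "ACTIVE":
            self._stop_serving()
        self.ha_state = ha_state

    def remove(self):
        """Hypothetical handler for removal of the workload."""
        self._stop_serving()
        self.ha_state = None

    def _stop_serving(self):
        if self.listener is not None:
            self.listener.close()
            self.listener = None
```

Two instances of this class, one assigned ACTIVE and one assigned STANDBY,
correspond to the simple 2N scheme: the standby instance is running but does
not bind until it is told to become active.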

2.7 Assignment
The AMF assigns a workload in active or standby state to an application. Upon
receiving the assignment, the application is to start providing service
according to the state of the assignment, and the amount and type of
service as described by the workload.

For simplicity, the application is often designed so that, when assigned an
active workload, it already knows the amount and type of service the workload
represents. In the web server example, the active workload means bind to port
80. But if the bind port number can vary, the AMF concepts for categorizing the
workload can be used.

More complex schemes can be used to describe workloads. For example, a
workload can describe a range of subscribers and certain properties of them. Then
one can imagine some application "workers" collectively providing the high-level
service, each worker contributing its small piece so that all workers together
provide the complete service.
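
As one possible illustration of such a scheme, the following Python sketch
divides a subscriber range into one workload description per worker. The
function name and the "first" and "last" keys are hypothetical; how a real
application describes its workloads is a design decision of that application.

```python
def split_subscribers(first, last, workers):
    """Divide the inclusive subscriber range [first, last] into one
    contiguous slice per worker, as evenly as possible."""
    total = last - first + 1
    if workers < 1 or total < workers:
        raise ValueError("need at least one subscriber per worker")
    base, extra = divmod(total, workers)
    slices, start = [], first
    for i in range(workers):
        size = base + (1 if i < extra else 0)  # first `extra` slices get one more
        slices.append({"first": start, "last": start + size - 1})
        start += size
    return slices
```

Each resulting slice could then be modeled as a separate workload, so that the
workers together cover the whole range without overlap.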

2.8 Failover and Switchover


In this section, the "operator" can mean either a human or a management
application running within the system such as software management.

Failover means an unexpected reassignment (from an operator point of view) of a
workload to another instance of an application. In AMF terms, the SI is reassigned
to another SU. Failover is always a consequence of a fault in the system of which
the AMF is aware.

Switchover means an expected reassignment of a workload to another instance of
an application. It is expected because it is initiated either by an operator or by
the AMF itself. When recovering from a fault, the AMF can fail over some SIs and
switch over others, to minimize disturbance in the system. Whether this happens
depends on the application model and configuration, and it is always a
consequence of a fault.

A switchover is supposed to be less intrusive to the service provided by the
application than a failover. SA-aware components are to be designed with this
objective.

2.9 Error Detection, Recovery, Repair, and Escalation


Error detection is the responsibility of all entities in the system.

After an error has been detected and reported, the AMF tries to recover the
service provided by the application. Recovery is performed automatically
by the AMF to ensure that all assignments are reassigned to a non-erroneous
component. If the AMF cannot reassign the workload, it sends the alarm
"workload unassigned", which means that a service is not available at all.

A recovery action can be recommended when an error is reported. A default action
is also configured for the component. The executed action is never weaker than
the one recommended but can be stronger.

Normally the first level of recovery is a restart of the erroneous component.
The objective is to avoid reassigning the workload to another component. If
the component restart fails or another error occurs within the component probation
time, the recovery is escalated and the next action is a restart of the whole SU.

If the SU is restarted too many times during the SU probation time, the recovery
action is escalated to failover.

If restart is disabled by configuration or the restart failed, the next level of
recovery action, reached through escalation or recommendation, is failover. This
means assigning the workload to an SU other than the one to which the failed
component belongs. Through escalation, the failover scope can be extended from SU
to node (all SUs hosted by a node).

After recovery, repair is by default automatically performed on the erroneous
entity. By configuration, automatic repair can be disabled, which makes repair a
responsibility outside the AMF. Restart recovery actions are considered repair
actions and no further action is needed. However, if the recovery action was
failover, the AMF tries to reinstantiate the component and possibly assign
workload to it again.

For more details, refer to Section 3.11 in Reference [1].
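
The escalation described in this section can be sketched as a small Python
model. This is a simplification for illustration only: the class, its
parameter names, the restart limits, and the ordering in LADDER are
assumptions of this sketch, and Reference [1] remains authoritative for the
actual rules.

```python
# Recovery actions ordered from weakest to strongest (assumed ordering).
LADDER = ["COMPONENT_RESTART", "SU_RESTART", "SU_FAILOVER", "NODE_FAILOVER"]


def stronger(action_a, action_b):
    """The executed action is never weaker than the recommended one."""
    return max(action_a, action_b, key=LADDER.index)


class Escalation:
    """Tracks restarts on a caller-supplied clock and escalates."""

    def __init__(self, comp_probation, comp_restart_max,
                 su_probation, su_restart_max):
        self.comp_probation = comp_probation
        self.comp_restart_max = comp_restart_max
        self.su_probation = su_probation
        self.su_restart_max = su_restart_max
        self.comp_restarts = []  # times of recent component restarts
        self.su_restarts = []    # times of recent SU restarts

    def on_error(self, now):
        """Return the recovery action for an error reported at time `now`."""
        # Forget restarts that are older than the probation windows.
        self.comp_restarts = [t for t in self.comp_restarts
                              if now - t < self.comp_probation]
        self.su_restarts = [t for t in self.su_restarts
                            if now - t < self.su_probation]
        if len(self.comp_restarts) < self.comp_restart_max:
            self.comp_restarts.append(now)
            return "COMPONENT_RESTART"
        if len(self.su_restarts) < self.su_restart_max:
            self.su_restarts.append(now)
            self.comp_restarts = []  # an SU restart restarts its components
            return "SU_RESTART"
        return "SU_FAILOVER"
```

For example, with one allowed component restart per 10-second probation time
and one allowed SU restart per 60 seconds, errors at times 0, 5, 8, and 9 lead
to a component restart, an SU restart, another component restart, and finally
a failover.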

2.10 Information Model


An SA-Forum system is managed through an information model. The information
model consists of managed objects that represent various logical entities in the
system.

The information model is managed by the Information Model Management
service (IMM). It is out of scope of this document to describe the IMM; for more
information, refer to Reference [2].

Most services specified by the SA-Forum define an information model. This is
particularly true for the AMF, which defines a rich information model to support
application modeling; for more information, refer to Section 8 in Reference [1].

The IMM supports administrative operations, which can be seen as a Remote
Procedure Call on an object in the model. For example, an operator stops an
application because it is about to be upgraded.

An application can also use the IMM to store its specific configuration data, thus
making it possible to configure and manage the application in the way the SA-Forum
intends.

2.11 Redundancy Model


The AMF provides the concept of redundancy models. The redundancy model
helps the AMF to keep the application service available per its requirements.

Historically, telecommunications applications have been designed to have standby
entities. The AMF supports those types of applications by providing redundancy
models that include standby workload assignments. Other SA-Forum services,
such as the Checkpoint service (CKPT), provide means to make standby entities
"warmer", that is, more ready to take over an active assignment. The CKPT enables
an application to replicate its state data.

By leveraging the separation of program and workload, the AMF can
manage many instances of a program and transfer the active workload from a
non-operational program to an operational program.

The following redundancy models are defined:

— 2N

— N+M

— N-way

— N-way active

— NoRedundancy

For more information, refer to Section 3.6 in Reference [1].
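
As an illustration of the simplest model with a standby, 2N, the following
Python sketch shows how the active workload could move when an SU fails. It is
a toy model under stated assumptions: the function name, the dictionary layout,
and the rule that the first spare is picked are all hypothetical and do not
describe how any AMF implementation selects SUs.

```python
def on_su_failure(assignment, spares, failed):
    """Recompute a 2N assignment after service unit `failed` goes down.

    assignment: {"ACTIVE": su_or_None, "STANDBY": su_or_None}
    spares:     healthy service units that currently have no assignment
    Returns the new (assignment, spares) pair without modifying the inputs.
    """
    active, standby = assignment["ACTIVE"], assignment["STANDBY"]
    spares = [su for su in spares if su != failed]
    if failed == active:
        active, standby = standby, None  # the standby takes over
    elif failed == standby:
        standby = None
    if active is None and spares:
        active = spares.pop(0)
    if standby is None and spares:
        standby = spares.pop(0)          # restore redundancy if possible
    return {"ACTIVE": active, "STANDBY": standby}, spares
```

If SU1 is active, SU2 is standby, and SU3 is a spare, a failure of SU1 makes
SU2 active and SU3 the new standby in this model.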

2.12 Administrative Operations


The AMF model specifies quite a few administrative operations defined for certain
entities. The AMF implements each such operation, with help and cooperation from
the affected application component or components.

Examples of administrative operations are LOCK and UNLOCK for workload
management, but other operations also exist.

Administrative operations are needed so that an operator can communicate with and
control the AMF. For example, upgrading an AMF application without involving
the AMF causes the AMF to consider the application to have failed.

Administrative operations are used by an operator or, more likely, by a management
program acting on behalf of an operator at a network management system.

One example of the latter is software management. When a program is upgraded,
it is locked, updated, and finally unlocked again.

For more information, refer to Section 9 in Reference [1].

For a description, with sequence diagrams, of the AMF interaction with a
component using callbacks, refer to Section 10 in Reference [1].

3 System Description and Model

To represent resources under its control, the AMF uses an abstract system model
consisting of various logical entities. The model describes the system in a way
the AMF understands. The AMF cannot manage an application unless a
corresponding model has been configured.

Most of the AMF logical entities are software entities. This means that they are
used to describe the instances of software execution under the AMF control
and the management policies and relationships between them. For example,
components represent executing programs while the SU describes relationships
(containment and dependencies) between components and the recovery policy
used when an error has been detected.

For an overview of the logical entities, refer to Figure 1 and Section 3.1 in
Reference [1].

3.1 Entity Types


The AMF system model was at some point enhanced with the concept of a
software entity type, or simply type. This was mainly done to support software
management but also simplifies configuration of an application.

Similar software entities are generalized into a versioned entity type. Each
versioned entity type belongs to a base entity type. A base entity type can be
visualized as an empty base class, only needed to host versioned entity types. It
does not contain any configuration attributes.

The concepts and relationships, together with an example of a software entity, are
shown in Figure 1. Each class name is shown with an example Distinguished
Name (DN).

Concept                  Example class        Example DN
Base Entity Type         SaAmfCompBaseType    safCompType=X
Versioned Entity Type    SaAmfCompType        safVersion=1.0,safCompType=X
Software Entity          SaAmfComponent       safComp=X,safSU=SU1,...
Software Entity          SaAmfComponent       safComp=X,safSU=SU2,...

Relationships: a Versioned Entity Type "Is Of" a Base Entity Type; a Software
Entity "Realizes" a Versioned Entity Type.

Figure 1 Software Entities and Types

Types simplify configuration because common attributes can be gathered in the
versioned entity type. Imagine a system with many instances of the same
component: the less information that has to be duplicated, the better.
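
The benefit of gathering common attributes in a type can be sketched in Python
as follows. The attribute names (instantiate_cmd, timeout_s), the shortened
DNs, and the rule that a value set on the instance takes precedence over the
type are assumptions made for this illustration; they are not taken from the
AMF information model.

```python
# Common attributes live once, in the versioned entity type.
COMP_TYPES = {
    "safVersion=1.0,safCompType=X": {
        "instantiate_cmd": "bin/x-start",  # hypothetical attribute
        "timeout_s": 10,                   # hypothetical attribute
    },
}

# Each component instance only names its type and any deviations.
# The DNs are shortened for illustration.
COMPONENTS = {
    "safComp=X,safSU=SU1": {"type": "safVersion=1.0,safCompType=X"},
    "safComp=X,safSU=SU2": {"type": "safVersion=1.0,safCompType=X",
                            "timeout_s": 30},
}


def effective_config(comp_dn):
    """Merge the type's attributes with the instance's own values."""
    comp = COMPONENTS[comp_dn]
    merged = dict(COMP_TYPES[comp["type"]])
    merged.update({k: v for k, v in comp.items() if k != "type"})
    return merged
```

Adding a third instance of the same component then requires only one more
entry that names the type, rather than a full copy of all attributes.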

3.2 Software Entities


Software entities are used to model an AMF application.

All software entities are of a certain versioned entity type. This relationship
is defined by an attribute in the software entity. For example, an instance of
the SaAmfComponent class uses attribute saAmfCompType to identify its
versioned entity type.

The AMF B.04 system model can look overwhelming at first glance
with all its classes. But only 10 out of 33 classes are directly used when modeling
an application. The remaining 23 classes are entity types, runtime classes, and
non-software entities (such as nodes).

The following sections provide an introduction to the main software entities in
the AMF system model. The AMF class name is presented in parentheses
after the logical name.

For the AMF instance view with relationships, refer to Figure 29 in Reference [1].

3.2.1 Component (SaAmfComponent)


A component "represents a set of application resources managed as one entity".
The component encapsulates application-specific functionality.

The component can be seen as an AMF abstraction of a computer program on
which the AMF performs certain actions such as start and stop. A started AMF
component typically maps to an operating system process.

A component is the smallest entity on which error detection, recovery, and repair
are performed. Components have a state model in which the presence state
specifically reflects the life cycle.

Components can either be integrated with the AMF (SA-aware component) or not
(non-SA-aware component).

Components integrated with the AMF use the API and are designed for the
workload concept. For a code example of such a component, refer to Appendix X
in Reference [1].

3.2.2 Service Unit (SaAmfSU)


A Service Unit (SU) is a grouping of several components that together provide a
service, see Figure 2. An SU is configured to be hosted by a node. It can also be
configured to be hosted by a group of nodes. If so, the AMF takes care of allocating
the SU to a specific node. How the AMF does this is implementation-specific, but
one can imagine some load balancing scheme such as a round robin allocation
of SUs to nodes. The SU is the lowest level on which administrative operations
regarding workload management are performed.

Configuring an SU on a node group makes the application independent of the node
naming scheme.

(Figure: a Service Unit containing several Components.)

Figure 2 Service Unit

3.2.3 Service Group (SaAmfSG)

The AMF manages redundant SUs to ensure service availability in case of
failures. A Service Group (SG) is a logical entity that groups several SUs, see
Figure 3. The SG protects one or more SIs. An SG has a corresponding redundancy
model that defines how the SUs are used to provide service availability. SUs are
hosted on different nodes in the cluster.

(Figure: a Service Group containing Service Unit X and Service Unit Y. Each
Service Unit contains several Components, for example Component X1 and
Component Y1.)

Figure 3 Service Group

3.2.4 Application (SaAmfApplication)


The application object is the top level of an application. All other application
entities (except types) are children of the application object.

The application entity groups one or more SGs to provide a higher-level service,
see Figure 4.

(Figure: an Application containing Service Group X and Service Group Y. Each
Service Group contains Service Units, for example Service Unit X1 and Service
Unit Y1, which in turn contain Components.)

Figure 4 Application

3.2.5 Component Service Instance (SaAmfCSI)


A Component Service Instance (CSI) represents a quantified and categorized
workload that can be assigned to a component. The CSI assignment has an
associated HA state. It can be assigned ACTIVE to one component and STANDBY to
one or more other components.

A CSI is quantified and categorized by its name and by extra modeling objects of
class SaAmfCSIAttribute. Attributes are name=value pairs that describe the
workload in a way that a component can understand.

When a component is assigned a CSI through the assignment callback, the
configured attributes are passed to it.
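
How a component might interpret such name=value pairs can be sketched in
Python. The handler below is a hypothetical stand-in for the assignment
callback, and the attribute name "port" and its default of 80 are assumptions
borrowed from the web server example in Section 2.7; none of this is the AMF
API.

```python
def on_csi_assigned(csi_name, ha_state, attributes):
    """Hypothetical handler invoked when a CSI is assigned.

    attributes: list of (name, value) string pairs configured for the CSI.
    Returns the settings the component would apply for this workload.
    """
    attrs = dict(attributes)
    port = int(attrs.get("port", "80"))  # fall back to the fixed port
    return {"csi": csi_name, "ha_state": ha_state, "bind_port": port}
```

With this approach the bind port is part of the workload description rather
than being hard-coded in the program, so the same program can serve
differently configured CSIs.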

3.2.6 Service Instance (SaAmfSI)

One or more CSIs are grouped into a Service Instance (SI), see Figure 5.

The SI represents a workload that can be assigned to an SU. The SI is said to
be protected by an SG, as the AMF dynamically decides which SU in the SG is
assigned the SI. The assignment can change during the life cycle of the system.
For example, when a node goes down, the AMF reassigns the SIs that were assigned
to SUs on that node to other SUs on other nodes.

(Figure: a Service Instance containing several Component Service Instances.)

Figure 5 Service Instance

3.3 Other Entities

3.3.1 Cluster (SaAmfCluster) and Node (SaAmfNode)

An AMF node corresponds to an operating system instance, that is, an execution
environment. With virtualization, a node is not necessarily the same
as a Central Processing Unit (CPU) blade. Some AMF-controlled entities are
allocated to or related to a node.

The set of nodes defines the AMF cluster. During the life span of a system, the
cluster membership changes as nodes join and leave the cluster. Reasons for a
changing membership can be as follows:

— A node is restarted (as a repair action).

— A node is administratively shut down (by an operator).

— A hardware fault exists.

— The cluster is extended with one more node.

The AMF operates on a single cluster. The number of nodes can vary from at least
one to many. The middleware is responsible for managing these objects.

The SA-Forum also specifies a Cluster Membership (CLM) service with its own
cluster and nodes. The AMF nodes are mapped to CLM nodes. It is out of scope of
this document to describe this any further. For more information, refer to
Section 3.1.1.1 in Reference [1].

For information about the relationships for node and cluster, refer to Figure 27
in Reference [1].

3.3.2 Node Group (SaAmfNodeGroup)


An AMF node group is an unordered set of nodes of the same type (hardware and
operating system). A node group can be used for (at least) two purposes:

— Map application software to nodes.

— Enable software upgrade campaigns to "roll over" nodes in the group.

The middleware probably comes with a few node groups predefined. Applications
can also create their own node groups to simplify tasks such as a complex upgrade
scenario.

3.3.3 Global Component Attributes (SaAmfCompGlobalAttributes)


SaAmfCompGlobalAttributes is a singleton object used to configure some
common attributes used by all components.

3.3.4 Node Software Bundle (SaAmfNodeSwBundle)


The SaAmfNodeSwBundle class defines the root installation directory of a
particular installed software bundle. It is used to construct the complete path of a
Component Life Cycle - Command-Line Interface (CLC-CLI) command. Instances
of this class are created when the Software Management Framework (SMF)
installs a software bundle on a particular node.
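
The path construction can be illustrated with a short Python sketch. The
function name and the example paths are hypothetical, and rejecting absolute
command paths is a simplification of this sketch rather than a statement about
how the AMF treats them.

```python
import posixpath


def clc_cli_path(bundle_root, relative_cmd):
    """Join a bundle's root installation directory with the relative
    path of a CLC-CLI command to obtain the complete command path."""
    if posixpath.isabs(relative_cmd):
        # Simplification of this sketch: only relative commands are handled.
        raise ValueError("expected a path relative to the bundle root")
    return posixpath.normpath(posixpath.join(bundle_root, relative_cmd))
```

Because the root directory is recorded per node and per bundle, the same
relative command can resolve to different locations on different nodes.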

3.4 State Models


Most software entities in the AMF information model have one or more states.
This section provides an introduction to the main states: administrative,
operational, presence, and HA.

More states are defined; refer to Section 3.2 in Reference [1].

3.4.1 Administrative State


The administrative state reflects the operator's permission or prohibition for an
entity to provide service. In the simplest case, it expresses whether the entity is
allowed to be started. The state is defined for the following entities: cluster,
application, SG, SU, and SI.

The following values are defined:

UNLOCKED The entity is allowed to provide service.

LOCKED The entity is not allowed to provide service.

LOCKED-INSTANTIATION
The entity is not allowed to be started (instantiated).

SHUTTING-DOWN
A transitional state in which the service is gracefully shut
down. When done, the state becomes LOCKED.

Transitions between administrative states are done with administrative


operations on the corresponding logical entity. The following are the logical
names for the administrative operations:

UNLOCK An order to transition to the UNLOCKED administrative


state. SA-aware components are assigned component
SIs. Non-SA-aware components are instantiated.

LOCK An order to transition to the LOCKED administrative


state. CSIs are removed from SA-aware components.
Non-SA-aware components are terminated.

LOCK_INSTANTIATION
An order to terminate the affected components and
transition to the LOCKED-INSTANTIATION administrative
state.

UNLOCK_INSTANTIATION
An order to instantiate the affected components and
transition to the LOCKED administrative state. Has no
effect on non-SA-aware components.
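
The operations above can be read as a small state machine. The following C sketch is illustrative only and is not part of the AMF API: the enum and function names are hypothetical, the target states are taken from the list above, and the starting states from which each operation is accepted are an assumption that should be checked against Section 9.4 in Reference [1].

```c
#include <stdbool.h>

/* Hypothetical names for the administrative states listed above. */
enum admin_state {
    ADMIN_UNLOCKED,
    ADMIN_LOCKED,
    ADMIN_LOCKED_INSTANTIATION,
    ADMIN_SHUTTING_DOWN
};

/* Hypothetical names for the administrative operations listed above. */
enum admin_op {
    OP_UNLOCK,
    OP_LOCK,
    OP_LOCK_INSTANTIATION,
    OP_UNLOCK_INSTANTIATION
};

/*
 * Apply op to the state current. On success, store the resulting state in
 * *next and return true. If the operation is not accepted in the current
 * state, leave *next untouched and return false.
 * ASSUMPTION: the accepted starting states below are not stated in this
 * guide; verify them against the AMF specification.
 */
bool admin_apply(enum admin_state current, enum admin_op op,
                 enum admin_state *next)
{
    switch (op) {
    case OP_UNLOCK:
        if (current != ADMIN_LOCKED)
            return false;
        *next = ADMIN_UNLOCKED;
        return true;
    case OP_LOCK:
        if (current != ADMIN_UNLOCKED && current != ADMIN_SHUTTING_DOWN)
            return false;
        *next = ADMIN_LOCKED;
        return true;
    case OP_LOCK_INSTANTIATION:
        if (current != ADMIN_LOCKED)
            return false;
        *next = ADMIN_LOCKED_INSTANTIATION;
        return true;
    case OP_UNLOCK_INSTANTIATION:
        if (current != ADMIN_LOCKED_INSTANTIATION)
            return false;
        *next = ADMIN_LOCKED;
        return true;
    }
    return false;
}
```

Under these assumptions, an entity in LOCKED-INSTANTIATION needs two operations before it can provide service: UNLOCK_INSTANTIATION followed by UNLOCK.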

For more information, refer to Section 3.2 and Section 9.4 in Reference [1].


3.4.2 Operational State

The operational state reflects the ability of a logical entity to provide service. The
state can be seen as the entity’s error status. If no error exists that prevents the
entity from providing service, its operational state is ENABLED.

This state is defined for node, SU, and component.

If any error exists that makes it impossible for the entity to provide service, the
operational state is DISABLED. For example, if a node is rebooted, all SUs mapped
to the node are DISABLED while the node is down.

The operational state is independent of the administrative state. The operational
state can be DISABLED while the administrative state is UNLOCKED. This is the case
if a node goes down because of a hardware error.

For more information, refer to Section 3.2 in Reference [1].

3.4.3 Presence State


The presence state reflects the life cycle of components. The state is also defined
for SUs, where it reflects the aggregated presence states of the contained components.

The state can have the following values:

UNINSTANTIATED
The component is not started.

INSTANTIATING The component is starting.

INSTANTIATED The component has started and registered.

TERMINATING The component is ordered to terminate.

RESTARTING The component is restarting as a recovery action or
administrative operation.

INSTANTIATION-FAILED
Failed state when instantiation has failed.

TERMINATION-FAILED
Failed state when termination has failed.

When a component enters one of the failed states (INSTANTIATION-FAILED or
TERMINATION-FAILED), the AMF produces an alarm.

3.4.4 HA State

An SI assigned to an SU has an associated HA state. The HA state can take the
following values:


ACTIVE The SU is actively providing service.

STANDBY The SU is not providing service but acting as a standby.

QUIESCING The SU is gracefully transitioning from ACTIVE to
QUIESCED.

QUIESCED The SU has stopped providing service and the SI can be
safely assigned to another SU.

The HA state is reflected by instances of class SaAmfSIAssignment in the
information model. For components, it is reflected by instances of class
SaAmfCSIAssignment.

3.5 Dependencies

3.5.1 Workload

A workload or SI can be configured to depend on another SI. For example, if SI1
depends on SI2, SI1 is not assigned active until SI2 has been assigned active. This
type of dependency is cluster-wide but typically used within an application. The
SI-SI dependency is configured with an SaAmfSIDependency object.
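
The SI-SI rule can be illustrated with a short C sketch. This is not AMF code: the struct and function names are hypothetical, and the sketch only models the rule stated above, that an SI is not assigned active until every SI it depends on has been assigned active.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory model of an SI and its SI-SI dependencies. */
struct si {
    const char *name;
    bool active;                     /* true once assigned active */
    const struct si *const *deps;    /* SIs this SI depends on */
    size_t n_deps;
};

/* Return true if all SIs that si depends on are already assigned active. */
bool may_assign_active(const struct si *si)
{
    for (size_t i = 0; i < si->n_deps; i++) {
        if (!si->deps[i]->active)
            return false;
    }
    return true;
}
```

With SI1 depending on SI2, may_assign_active() returns false for SI1 until the active flag of SI2 is set.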

A CSI can depend on other CSIs in the same SI. The dependencies that one
particular CSI has on other CSIs are configured with the multi-value attribute
saAmfCSIDependencies in the CSI configuration object. Although it is specified as
a list (which implies order), this attribute is an unordered set.

For more information, refer to Section 3.8.1 in Reference [1].

3.5.2 Components
The AMF allows configuring an instantiation level for components to model
dependencies between components in the same SU. The AMF instantiates and
terminates components according to this level.

For more information, refer to Section 3.8.2 in Reference [1].

3.6 Ranking
SUs and SIs can be ranked. A rank is a positive integer (>0); the lower the value,
the higher the rank. For example, an SU with rank=1 is ranked higher than
another SU with rank=2.

A higher rank (lower integer value) for an SU means that it is assigned before
other SUs. A higher rank for an SI means that it is selected for assignment before
other SIs.
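
A minimal C sketch of the ordering rule, assuming a hypothetical struct su that carries only a name and a rank. It sorts SUs so that the highest-ranked SU (lowest integer value) comes first, which is the order in which the text above says they are considered for assignment. It is not the algorithm used by the AMF itself.

```c
#include <stdlib.h>

/* Hypothetical model of an SU: only the fields needed for ranking. */
struct su {
    const char *name;
    unsigned rank;    /* positive integer; lower value means higher rank */
};

/* qsort() comparator: ascending rank value, that is, highest rank first. */
static int by_rank(const void *a, const void *b)
{
    const struct su *x = a;
    const struct su *y = b;
    return (x->rank > y->rank) - (x->rank < y->rank);
}

/* Order the n entries of sus from highest to lowest rank. */
void sort_by_rank(struct su *sus, size_t n)
{
    qsort(sus, n, sizeof *sus, by_rank);
}
```

The comparator avoids subtracting the two ranks directly, because an unsigned difference converted to int can wrap around for large values.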


Auto-adjust is an AMF feature that causes the AMF to return the assignments of an
SG to the most preferred assignments. The ranks are used to determine which
assignments are preferred. This feature is disabled by default and must be
explicitly enabled in the SG and its type.

An SI can also be ranked specifically for a particular SU with the configuration
object SaAmfSIRankedSU. This can be used in the redundancy models N-way and
N-way-active, in which case it takes precedence over the SU rank when SIs are
assigned.

3.7 Component Life Cycle Controlling Commands


The CLC-CLI behind this concept is nothing more than a start/stop script. The AMF
controls the life cycle of a component by interfacing with a corresponding script.

The following are the normal commands used for components:

INSTANTIATE Start the component.

CLEANUP Used when recovering from errors and to clean up after
a graceful termination. This command kills the
component hard with a KILL signal.

This command must succeed to guarantee the consistency
of the AMF system.

TERMINATE Terminate a non-SA-aware component (not used for
SA-aware components). This means a graceful shutdown,
perhaps sending a TERM signal that can be caught and
handled.

SA-aware components are gracefully terminated by the AMF using the terminate
callback. The cleanup script is run afterwards to remove temporary files, such as
Process ID (PID) files created when the component was started. The cleanup script
is also run when an error has been detected, for example, a failed termination.

As non-SA-aware components by definition do not use the AMF API, they are
terminated gracefully by command TERMINATE.

The script and its arguments are specified in the component instance or in the
component type (as they are common between instances).

The script must be able to control a process, for example, stop it. It is recommended
to use a PID file for that purpose. The component process is to create the PID file
when it has started successfully. If the AMF wants to clean up a component, it
calls the script, the PID file is read, and a KILL signal is sent to the process.

It is recommended to use the Linux® Standard Base (LSB) helper functions to
write these scripts. If many instances of the same component must run
on the same node, the PID files must have names associated with the component
and not only the de facto standard <service>.pid filename. For example, the
component name can be used. It is available to the script as an environment
variable.
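
A minimal C sketch of a per-component PID file name. The function name and the directory are hypothetical. This guide does not name the environment variable that carries the component name; SA_AMF_COMPONENT_NAME, mentioned in the usage note, is an assumption that should be verified against Reference [1].

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "<dir>/<component_name>.pid" in out.
 * Any '/' in the component name is replaced by '_' so that the result
 * stays a single file name inside dir.
 * Returns 0 on success, -1 if the path does not fit in out.
 */
int pid_file_path(char *out, size_t out_size,
                  const char *dir, const char *component_name)
{
    int n = snprintf(out, out_size, "%s/%s.pid", dir, component_name);

    if (n < 0 || (size_t)n >= out_size)
        return -1;

    /* Skip "<dir>/" and sanitize only the file name part. */
    for (char *p = out + strlen(dir) + 1; *p != '\0'; p++) {
        if (*p == '/')
            *p = '_';
    }
    return 0;
}
```

A component could, for example, call pid_file_path(path, sizeof path, "/var/run", name), where name is the value returned by getenv("SA_AMF_COMPONENT_NAME") after checking that it is not NULL.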


For more information, refer to Section 4 in Reference [1].

For a sample CLC-CLI script, refer to Reference [4].


4 Application Programming Interface

The AMF API is simple but it requires knowledge of the model and concepts. The
API is mainly relevant for SA-aware components integrated with the AMF,
but parts of the API are also useful for small commands and tools.

The AMF issues requests to a component by invoking component-specific
callbacks. These callbacks are provided by the component when it registers with
the AMF. The callbacks are executed as a consequence of the component calling a
dispatch function in the API.

Basically, a component does some up-front initialization and after that waits
for events on an AMF-provided file descriptor. When such an event occurs, it is
dispatched and the requests are serviced as callbacks.

The main use of the API is best described with some high-level pseudo code (no
error handling):


main()
{
    // Initialize my service
    myservice_initialize()

    // Initialize with AMF
    callbacks.healthcheck = my_healthcheck_cb
    callbacks.csiset = my_csiset_cb
    callbacks.csiremove = my_csiremove_cb
    callbacks.terminate = my_terminate_cb
    handle = saAmfInitialize(callbacks)

    // Get my name (DN)
    name = saAmfComponentNameGet(handle)

    // Register with AMF
    saAmfComponentRegister(handle, name)

    // Start AMF invoked Internal Active Monitoring
    saAmfHealthcheckStart(handle, key, AMF_INVOKED)

    // Get file descriptors to wait on
    amf_fd = saAmfSelectionObjectGet(handle)
    service_fd = myservice_fd_get()

    // Loop forever and dispatch incoming events
    while (1)
    {
        poll(amf_fd, service_fd)
        if (events on amf_fd)
            saAmfDispatch()
        if (events on service_fd)
            myservice_dispatch()
    }
}

// Health check requested by the AMF
my_healthcheck_cb()
{
    if (myservice_request() is OK)
        saAmfResponse(OK)
    else
        saAmfResponse(NOK)
}

// A CSI is assigned or its role changes
my_csiset_cb(csi, role)
{
    if (role is ACTIVE)
        myservice_activate()
    if (role is STANDBY)
        myservice_deactivate()

    saAmfResponse(OK)
}

// A CSI is removed
my_csiremove_cb()
{
    saAmfResponse(OK)
}

// Graceful termination requested by the AMF
my_terminate_cb()
{
    myservice_shutdown()
    exit(SUCCESS)
}

Comments on the example:

— The AMF-invoked health checks are executed as callbacks when
saAmfDispatch() is called, in context of the main thread. The main thread is
serving both the AMF and real incoming service requests. This means that a
health check request is also testing that the main logic of the program works.

— Assignments and changes in them are received from the AMF as callbacks as
a consequence of calling saAmfDispatch().


— Graceful termination requested by the AMF is executed in a callback as a
consequence of calling saAmfDispatch().

— Other SA-Forum-defined services, such as the IMM, fit nicely into
this program structure because the callback mechanism is the same across
most SAF services.

— The process loops forever in an event loop, listening for events on file
descriptors. This is a common design pattern for a server program.
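
The event loop pattern from the pseudo code can be sketched in self-contained C with poll(). This sketch deliberately does not use the AMF API: the handler functions and counters are hypothetical stand-ins for saAmfDispatch() and the service dispatch function, and any readable file descriptor (for example a pipe) can play the role of the AMF selection object.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <unistd.h>

typedef void (*dispatch_fn)(int fd);

/* Hypothetical counters so that the effect of the handlers can be observed. */
int amf_events;
int service_events;

/* Stand-in for saAmfDispatch(): consume one byte and count the event. */
void on_amf_event(int fd)
{
    char c;
    if (read(fd, &c, 1) == 1)
        amf_events++;
}

/* Stand-in for the service's own dispatch function. */
void on_service_event(int fd)
{
    char c;
    if (read(fd, &c, 1) == 1)
        service_events++;
}

/*
 * One iteration of the event loop: wait up to timeout_ms for input on
 * either descriptor and call the matching handler.
 * Returns the number of handlers called, 0 on timeout, or -1 if poll()
 * failed (the caller would normally retry when errno is EINTR).
 */
int poll_once(int amf_fd, dispatch_fn amf_dispatch,
              int service_fd, dispatch_fn service_dispatch, int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = amf_fd, .events = POLLIN },
        { .fd = service_fd, .events = POLLIN },
    };
    int handled = 0;
    int ready = poll(fds, 2, timeout_ms);

    if (ready <= 0)
        return ready;
    if (fds[0].revents & POLLIN) {
        amf_dispatch(amf_fd);
        handled++;
    }
    if (fds[1].revents & POLLIN) {
        service_dispatch(service_fd);
        handled++;
    }
    return handled;
}
```

A server would call poll_once() with a timeout of -1 inside a while (1) loop. Because both handlers run in the same thread, a slow service handler delays the AMF callbacks, which is what allows the health check to test the main logic of the program.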

For a complete C code example, refer to Reference [3].

For its accompanying CLC-CLI script, refer to Reference [4].


5 Integration of Legacy Software

5.1 Integration Approaches


Legacy software entities can be integrated with the AMF using the following
different approaches:

— If the legacy software is owned in-house, its code can be modified. If this is
done carefully, the same application can be used both in an AMF
system environment and in its original system environment.

— Use an SA-aware wrapper component. The wrapper component and
the legacy software together form one AMF component. The wrapper is
responsible for life cycle management and for performing health checks on the
“wrapped” service.

— Use a proxy component to manage the legacy software, which in this case is a
separate “proxied” AMF component. The proxy solution is appropriate when
the redundancy model of the legacy software differs from that of the proxy
entity.

— Model as a non-proxied non-SA-aware component. The AMF is limited to life
cycle management, but external active monitoring and passive monitoring
can be used to achieve HA characteristics.

For a complete wrapper example, refer to Reference [5].

5.2 Types of Applications

5.2.1 Simple Applications

This category contains simple programs not integrated with any middleware. Such
a program provides service directly when started. Either one program instance
provides the complete service or many instances provide the same service with
more capacity. An example can be a web server. Instances can run on many nodes
as long as they all have access to the same file system. Adding an instance only
means that more service requests per second can be serviced (a bit simplified).

5.2.2 Complex Applications Including Availability Management


This category includes databases and application servers that include their
own availability management or clustering, or both. Different functions of the
application can be controlled independently.


5.3 Recommendations
The wrapper component integration approach is recommended for the “simple”
type of application. The reasons are that the wrapper logic is much simpler than
in the proxy/proxied variant, and that the AMF model is simpler, with only one
component that models both the wrapper and the “wrapped” component.

The non-proxied, non-SA-aware component approach is recommended for
simple script-based one-shot tasks. An example is moving an IP address
in a controlled fashion, synchronized with other components.

The proxy/proxied integration approach is recommended for the “complex” type
of application.


6 General Concerns

6.1 Daemonizing
The AMF components are usually long-lived processes, at least the SA-aware
ones. They are started when the system is bootstrapped and must behave like
other daemons in the Unix® world.

Daemonizing is widely documented elsewhere. At a minimum, consider the following:

— Detach from the controlling terminal by forking.

— Drop privileges.

— Close all open files including standard streams (stdout, and so on).

— Change working directory to root (/).

— Use a log service for error logging.

— Create a PID file (also known as lock file) for use by the controlling script.

Dropping privileges is mentioned here because it is an important security
measure. A program is to run with as few privileges as possible.

6.2 Logging
A daemon process is not to use printf-style output to a file. One reason is that
such files normally contain no time stamp or a non-standard one. Log rotation is
also required in a long-running system.

It is recommended to use the syslog service in the operating system or the
SA-Forum Log service, or both, perhaps using an application-specific stream.

High-level application logging can, for example, go to the SAF Log service and
more detailed processor-local logging to syslog.

6.3 Error Handling


Errors detected in the context of component health monitoring have already
been discussed. What remains are errors detected by a component outside the
component monitoring context. Depending on the perceived severity of such
errors, the component can choose one of the following actions:

— Log and continue.

— Report the error to the AMF and choose either of the following:


• Wait for the AMF to take action.

• Continue business as usual until the AMF takes action.

— Exit the process.

Whatever action is taken, detailed error logging for further troubleshooting is
important. For example, if a system call fails unexpectedly, make sure to log the
calling context and the errno value or its logical representation.
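
A minimal C sketch of such a log message, assuming a hypothetical helper that formats the calling context, the failed call, the numeric errno value, and its textual representation from strerror() into a buffer. The message layout is an illustration only.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format an error message for a failed system call.
 * context is the calling context (for example, a function name), call is
 * the name of the failed call, and err is the saved errno value.
 * Returns the snprintf() result: the length the full message needs, which
 * is >= out_size if the message was truncated.
 */
int format_syscall_error(char *out, size_t out_size,
                         const char *context, const char *call, int err)
{
    return snprintf(out, out_size, "%s: %s failed: errno=%d (%s)",
                    context, call, err, strerror(err));
}
```

The errno value is to be saved immediately after the failing call and passed in as err, because later calls can overwrite errno. The resulting string can then be handed to the chosen log service, for example with syslog(LOG_ERR, "%s", msg).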

6.4 Standards Compliance


It is recommended to comply with existing standards for increased portability
between different Linux distributions or even Unix dialects.

Such standards are POSIX® and LSB.

6.5 File System Layout


It is recommended to follow the Filesystem Hierarchy Standard (FHS)
specification when placing files in the file system, for increased portability
between Linux distributions.

If multiple component instances each of a different version must coexist, a version
number is required in the top directory for the application. The package name
(RPM name) is also to include the version number.

6.6 User Management


An application is typically to create its own Unix group and user when the
corresponding package is installed.

When the application process is started, it is to change group and user accordingly,
that is, to drop its privileges.
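
A minimal C sketch of dropping privileges, assuming the numeric user and group IDs of the application account are already known (they can be looked up with getpwnam()). The function name is hypothetical. Supplementary groups are not handled here; a process started as root would normally also reset them before changing the group ID.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: switch the process to the given group and user.
 * Returns 0 on success, -1 on failure.
 */
int drop_privileges(uid_t uid, gid_t gid)
{
    /* Change the group first: after setuid() the process may no longer
     * have the privilege to change its group. */
    if (setgid(gid) != 0)
        return -1;
    if (setuid(uid) != 0)
        return -1;

    /* Verify that both the real and the effective IDs were changed. */
    if (getuid() != uid || geteuid() != uid ||
        getgid() != gid || getegid() != gid)
        return -1;
    return 0;
}
```

If the function fails, the process is to log the error and exit instead of continuing with more privileges than intended.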

Unix groups and users are normally not deleted when a package is removed.
Instead, an operator deletes them manually. In the case of BaseMW, that
corresponds to a remove campaign.


7 Building and Packaging

7.1 Building
Building an AMF application is simple. In a C/C++ source file, include the AMF
header file saAmf.h and, when linking the program, link against the SaAmf library.

7.2 Packaging
An AMF application must be packaged in the native packaging format as
supported by the underlying Linux distribution, for example, RPMs. It is important
to remember that the AMF controls the life cycle of the program, not the Linux
init process.


8 Installing, Upgrading, and Removing an Application

An AMF application is normally installed by executing an SMF installation
campaign. It is out of scope of this document to describe how such a campaign is
created. However, most likely tools exist that can create such a campaign based
on the information in the application-provided Entity Types File, see Section 8.1.

Upgrade and remove are also normally done using SMF campaigns and it is
again out of scope of this document to describe how this is done. For upgrade
campaigns, tools exist that can create such a campaign based on the current
configuration and the wanted configuration.

It is clearly possible to bypass the SMF and use the IMM directly for configuring
the application model in the IMM. However, it is not the official way of doing it
and is only mentioned here for completeness.

8.1 Entity Types File


The entity types file ETF.xml is specified by the SMF. However, it is related to the
AMF, as it contains AMF entity prototypes. These prototypes contain constraints
regarding the AMF configuration values and attributes, and provide information
necessary to configure and deploy an application on a system.

The file is specified in XML and is included in the software bundle (package).

The ETF.xml file contains information needed by offline tools to generate upgrade
campaigns.


Reference List

[1] AIS AMF Specification, http://www.saforum.org/Download-the-SA-Forum-Specifications~217409~16627.htm

[2] AIS IMM Specification, http://www.saforum.org/Download-the-SA-Forum-Specifications~217409~16627.htm

[3] amf_demo.c, http://devel.opensaf.org/hg/opensaf-staging/file/fbaa2728581a/samples/amf/sa_aware/amf_demo.c

[4] amf_demo_script, http://devel.opensaf.org/hg/opensaf-staging/file/af6ad0bcf66c/samples/amf/sa_aware/amf_demo_script

[5] wrapper example, http://devel.opensaf.org/hg/opensaf-staging/file/fbaa2728581a/samples/amf/wrapper
