Sai Ais Amf B.03.01
The Service Availability™ solution is high availability and more; it is the delivery of ultra-dependable communication services on demand and without interruption. This Service Availability™ Forum Application Interface Specification document might contain design defects or errors known as errata, which might cause the product to deviate from published specifications. Current characterized errata are available on request.
1. LICENSE GRANT. Subject to the terms and conditions of this Agreement, Licensor hereby grants you a nonexclusive, worldwide, non-transferable, revocable, but only for breach of a material term of the license granted in this section 1, fully paid-up, and royalty free license to: a. reproduce copies of the Specification to the extent necessary to study and understand the Specification and to use the Specification to create products that are intended to be compatible with the Specification; b. distribute copies of the Specification to your fellow employees who are working on a project or product development for which this Specification is useful; and c. distribute portions of the Specification as part of your own documentation for a product you have built, which is intended to comply with the Specification.
2. DISTRIBUTION. If you are distributing any portion of the Specification in accordance with Section 1(c), your documentation must clearly and conspicuously include the following statements: a. Title to and ownership of the Specification (and any portion thereof) remain with Service Availability Forum ("SA Forum"). b. The Specification is provided "As Is." SA Forum makes no warranties, including any implied warranties, regarding the Specification (and any portion thereof) by Licensor. c. SA Forum shall not be liable for any direct, consequential, special, or indirect damages (including, without limitation, lost profits) arising from or relating to the Specification (or any portion thereof). d. The terms and conditions for use of the Specification are provided on the SA Forum website.
3. RESTRICTION. Except as expressly permitted under Section 1, you may not (a) modify, adapt, alter, translate, or create derivative works of the Specification, (b) combine the Specification (or any portion thereof) with another document, (c) sublicense, lease, rent, loan, distribute, or otherwise transfer the Specification to any third party, or (d) copy the Specification for any purpose.
4. NO OTHER LICENSE. Except as expressly set forth in this Agreement, no license or right is granted to you, by implication, estoppel, or otherwise, under any patents, copyrights, trade secrets, or other intellectual property by virtue of your entering into this Agreement, downloading the Specification, using the Specification, or building products complying with the Specification.
5. OWNERSHIP OF SPECIFICATION AND COPYRIGHTS. The Specification and all worldwide copyrights therein are the exclusive property of Licensor. You may not remove, obscure, or alter any copyright or other proprietary rights notices that are in or on the copy of the Specification you download. You must reproduce all such notices on all copies of the Specification you make. Licensor may make changes to the Specification, or to items referenced
therein, at any time without notice. Licensor is not obligated to support or update the Specification.
6. WARRANTY DISCLAIMER. THE SPECIFICATION IS PROVIDED "AS IS." LICENSOR DISCLAIMS ALL WARRANTIES, WHETHER EXPRESS, IMPLIED, OR STATUTORY, INCLUDING ANY WARRANTY OF MERCHANTABILITY, NONINFRINGEMENT OF THIRD-PARTY RIGHTS, FITNESS FOR ANY PARTICULAR PURPOSE, OR TITLE. Without limiting the generality of the foregoing, nothing in this Agreement will be construed as giving rise to a warranty or representation by Licensor that implementation of the Specification will not infringe the intellectual property rights of others.
7. PATENTS. Members of the Service Availability Forum and other third parties [may] have patents relating to the Specification or a particular implementation of the Specification. You may need to obtain a license to some or all of these patents in order to implement the Specification. You are responsible for determining whether any such license is necessary for your implementation of the Specification and for obtaining such license, if necessary. [Licensor does not have the authority to grant any such license.] No such license is granted under this Agreement.
8. LIMITATION OF LIABILITY. To the maximum extent allowed under applicable law, LICENSOR DISCLAIMS ALL LIABILITY AND DAMAGES, INCLUDING DIRECT, INDIRECT, CONSEQUENTIAL, SPECIAL, AND INCIDENTAL DAMAGES, ARISING FROM OR RELATING TO THIS AGREEMENT, THE USE OF THE SPECIFICATION OR ANY PRODUCT MANUFACTURED IN ACCORDANCE WITH THE SPECIFICATION, WHETHER BASED ON CONTRACT, ESTOPPEL, TORT, NEGLIGENCE, STRICT LIABILITY, OR OTHER THEORY. NOTWITHSTANDING ANYTHING TO THE CONTRARY, LICENSOR'S TOTAL LIABILITY TO YOU ARISING FROM OR RELATING TO THIS AGREEMENT OR THE USE OF THE SPECIFICATION OR ANY PRODUCT MANUFACTURED IN ACCORDANCE WITH THE SPECIFICATION WILL NOT EXCEED ONE HUNDRED DOLLARS ($100). YOU UNDERSTAND AND AGREE THAT LICENSOR IS PROVIDING THE SPECIFICATION TO YOU AT NO CHARGE AND, ACCORDINGLY, THIS LIMITATION OF LICENSOR'S LIABILITY IS FAIR, REASONABLE, AND AN ESSENTIAL TERM OF THIS AGREEMENT.
9. TERMINATION OF THIS AGREEMENT. Licensor may terminate this Agreement, effective immediately upon written notice to you, if you commit a material breach of this Agreement and do not cure the breach within thirty (30) days after receiving written notice thereof from Licensor. Upon termination, you will immediately cease all use of the Specification and, at Licensor's option, destroy or return to Licensor all copies of the Specification and certify in writing that all copies of the Specification have been returned or destroyed. Parts of the Specification that are included in your product documentation pursuant to Section 1 prior to the termination date will be exempt from this return or destruction requirement.
10. ASSIGNMENT. You may not assign, delegate, or otherwise transfer any right or obligation under this Agreement to any third party without the prior written consent of Licensor. Any purported assignment, delegation, or transfer without such consent will be null and void.
11. GENERAL. This Agreement will be construed in accordance with, and governed in all respects by, the laws of the State of Delaware (without giving effect to principles of conflicts of law that would require the application of the laws of any other state). You acknowledge that the Specification comprises proprietary information of Licensor and that any actual or threatened breach of Section 1 or 3 will constitute immediate, irreparable harm to Licensor for which monetary damages would be an inadequate remedy, and that injunctive relief is an appropriate remedy for such breach. All waivers must be in writing and signed by an authorized representative of the party to be charged. Any waiver or failure to enforce any provision of this Agreement on one occasion will not be deemed a waiver of any other provision or of such provision on any other occasion. This Agreement may be amended only by binding written instrument signed by both parties. This Agreement sets forth the entire understanding of the parties relating to the subject matter hereof and thereof and supersedes all prior and contemporaneous agreements, communications, and understandings between the parties relating to such subject matter.
Table of Contents
1 Document Introduction
1.1 Document Purpose
1.2 AIS Documents Organization
1.3 History
1.3.1 New Topics
1.3.2 Clarifications
1.3.3 Superseded and Superseding Functions
1.3.4 Changes in Return Values of API Functions
1.3.5 Other Changes
1.4 References
1.5 How to Provide Feedback on the Specification
1.6 How to Join the Service Availability Forum
1.7 Additional Information
1.7.1 Member Companies
1.7.2 Press Materials
2 Overview
2.1 Overview of the Availability Management Framework
3 System Description and System Model
3.1 Physical Entities
3.2 Logical Entities
3.2.1 Cluster and Nodes
3.2.1.1 AMF Nodes
3.2.1.2 AMF Cluster
3.2.1.3 Usage of the Terms Node and Cluster in this Document
3.2.2 Components
3.2.2.1 SA-Aware Components
3.2.2.1.1 Container and Contained Components
3.2.2.2 Non-SA-Aware Components
3.2.2.2.1 External Components
3.2.2.2.2 Non-Proxied, Non-SA-Aware Components
3.2.2.2.3 Integration and Usage of Non-SA-Aware Local Components
3.2.2.3 Proxy and Proxied Components
3.2.2.4 Component Life Cycle
3.2.2.5 Component Type
3.2.7 Application
3.2.7.1 Application Type
3.2.8 Protection Groups
3.2.9 Service Unit Instantiation
3.2.10 Illustration of Logical Entities
3.3.4 Component Service Instance States
3.3.5 Service Group States
3.3.6 Node States
3.3.6.1 Administrative State
3.3.6.2 Operational State
3.3.7 Application States
3.3.8 Cluster States
3.3.9 Summary of States Supported for the Logical Entities
3.4 Fail-Over and Switch-Over
3.5 Possible Combinations of States for Service Units
3.5.1 Combined States for Pre-Instantiable Service Units
3.5.2 Combined States for Non-Pre-Instantiable Service Units
3.7.2.3.4 Cluster Startup
3.7.2.3.5 Role of the List of Ordered Service Units in Assignments and Instantiations
3.7.2.4 Examples
3.7.2.5 UML Diagram of the 2N Redundancy Model
3.8 Component Capability Model and Service Group Redundancy Model
3.9 Dependencies Among SIs, Component Service Instances, and Components
3.9.1 Dependencies Among Service Instances and Component Service Instances
3.9.1.1 Dependencies Among SIs when Assigning a Service Unit Active for a Service Instance
3.9.1.2 Impact of Disabling a Service Instance on the Dependent Service Instances
3.9.1.3 Dependencies Among Component Service Instances of the same Service Instance
3.10 Approaches for Integrating Legacy Software or Hardware Entities
3.11 Component Monitoring
3.12 Error Detection, Recovery, Repair, and Escalation Policy
3.12.1 Basic Notions
3.12.1.1 Error Detection
3.12.1.2 Restart
3.12.1.3 Recovery
3.12.1.3.1 Restart Recovery Action
3.12.1.3.2 Fail-Over Recovery Action
3.12.1.3.3 Application Restart Recovery Action
3.12.1.3.4 Cluster Reset Recovery Action
3.12.1.4 Repair
3.12.1.4.1 Recovery and Associated Repair Policies
3.12.1.4.2 Restrictions to Auto-Repair
3.12.1.5 Recovery Escalation
4 Local Component Life Cycle Management Interfaces
4.1 Common Characteristics
4.2 Configuring the Pathname of CLC-CLI Commands
4.3 CLC-CLI Environment Variables
4.4 Configuring CLC-CLI Arguments
4.5 Exit Status
4.6 INSTANTIATE Command
4.7 TERMINATE Command
4.8 CLEANUP Command
4.9 AM_START Command
4.10 AM_STOP Command
4.11 Usage of CLC-CLI Commands Based on the Component Category
5 Proxied Components Management
5.1 Properties of Proxy and Proxied Components
5.2 Life-Cycle Management of Proxied Components
5.3 Proxy Component Failure Handling
6 Contained Components Management
6.1 Overview of Container and Contained Components
6.1.1 Definitions
6.1.2 Component Category
6.1.3 Multiple Components per Process
6.1.4 Life Cycle Management of Contained Components
6.1.5 Container and Contained Components in Service Units and Service Groups
6.1.6 Redundancy Models
6.1.7 Administrative Operations and Container and Contained Components
6.1.8 Failure Handling
6.3 Failure Handling for Container and Contained Components
6.4 Proxied and Contained Components: Similarities and Differences
7 Availability Management Framework API
7.1 Availability Management Framework Model for the APIs
7.1.1 Callback Semantics and Component Registration and Unregistration
7.1.2 Component Healthcheck Monitoring
7.1.2.1 Overview
7.1.2.2 Variants of Healthchecks
7.1.2.3 Starting and Stopping Healthchecks
7.1.2.4 Healthcheck Configuration Issues
7.1.2.4.1 Role of Period and Maximum-Duration in Framework-Invoked Healthchecks
7.1.2.4.2 Role of Period in Component-Invoked Healthchecks
7.1.2.4.3 Modification of Healthcheck Parameters
7.1.3 Component Service Instance Management
7.1.4 Component Life Cycle Management
7.1.5 Protection Group Management
7.1.6 Error Reporting
7.1.7 Component Response to Framework Requests
7.1.8 API Usage Illustrations
7.3 Include File and Library Names
7.4 Type Definitions
7.4.1 SaAmfHandleT
7.4.2 Component Process Monitoring
7.4.2.1 SaAmfPmErrorsT Type
7.4.2.2 SaAmfPmStopQualifierT Type
7.4.4.2 Readiness State
7.4.4.3 Presence State
7.4.4.4 Operational State
7.4.4.5 Administrative State
7.4.4.6 Assignment State
7.4.4.7 Proxy Status
7.4.4.8 All Defined States
7.4.5.1 SaAmfCSIFlagsT
7.4.5.2 SaAmfCSITransitionDescriptorT
7.4.5.3 SaAmfCSIStateDescriptorT
7.4.5.4 SaAmfCSIAttributeListT
7.4.5.5 SaAmfCSIDescriptorT
7.4.6.1 SaAmfProtectionGroupMemberT
7.4.6.2 SaAmfProtectionGroupChangesT
7.4.6.3 SaAmfProtectionGroupNotificationT
7.4.6.4 SaAmfProtectionGroupNotificationBufferT
7.4.7 SaAmfRecommendedRecoveryT
7.4.8 saAmfCompCategoryT
7.4.9 saAmfRedundancyModelT
7.4.10 saAmfCompCapabilityModelT
7.4.11 Notification Related Types
7.4.12 SaAmfCallbacksT_3
8 AMF UML Information Model
8.1 Use of Entity Types in the AMF UML Information Model
8.2 Notes on the Used Conventions in UML Diagrams
8.3 DN Formats for Availability Management Framework UML Classes
8.4 AMF Cluster
8.5 Availability Management Framework Instances and Types View
8.6 Availability Management Framework Instances View
8.7 AMF Cluster, Node, and Node-Related Classes
8.8 Application Classes Diagram
8.9 Service Group Class Diagram
8.10 Service Unit Class Diagram
8.11 Service Instance Class Diagram
8.12 Component Service Instance Diagram
8.13 Component and Component Types Class Diagrams
8.13.1 Component Type Class Diagram
8.13.2 Component Classes Diagram
8.14 AMF Global Component Attributes and Healthcheck Classes
9 Administration API
9.1 Availability Management Framework Administration API Model
9.2 Include File and Library Name
9.3 Type Definitions
9.3.1 saAmfAdminOperationIdT
9.4.3 SA_AMF_ADMIN_LOCK
9.4.4 SA_AMF_ADMIN_LOCK_INSTANTIATION
9.4.5 SA_AMF_ADMIN_UNLOCK_INSTANTIATION
9.4.6 SA_AMF_ADMIN_SHUTDOWN
9.4.7 SA_AMF_ADMIN_RESTART
9.4.8 SA_AMF_ADMIN_SI_SWAP
9.4.9 SA_AMF_ADMIN_SG_ADJUST
9.4.10 SA_AMF_ADMIN_REPAIRED
9.4.11 SA_AMF_ADMIN_EAM_START
9.4.12 SA_AMF_ADMIN_EAM_STOP
9.5 Summary of Administrative Operation Support
10 Basic Operational Scenarios
10.1 Administrative Shutdown of a Service Instance in a 2N case
10.2 Administrative Shutdown of a Service Unit in a 2N Case
10.3 Administrative Shutdown of a Service Unit for the N-Way Model
10.4 Administrative Lock of a Service Instance
10.5 Administrative Lock of a Service Unit
10.6 A Simple Fail-Over
10.7 Administrative Shutdown of an SI Having a Container CSI
10.8 Administrative Lock of an SI Having a Container CSI
10.9 Administrative Lock of a Service Unit with a Container Component
10.10 Restart of a Container Component
11 Alarms and Notifications
11.1 Setting Common Attributes
11.2 Availability Management Framework Notifications
11.2.1 Availability Management Framework Alarms
11.2.1.1 Availability Management Framework Service Impaired
11.2.1.2 Component Instantiation Failed
11.2.1.3 Component Cleanup Failed
11.2.1.4 Cluster Reset Triggered by a Component Failure
11.2.1.5 Service Instance Unassigned
11.2.1.6 Proxy Status of a Component Changed to Unproxied
11.2.2.1 Administrative State Change Notify
11.2.2.2 Operational State Change Notify
11.2.2.3 Presence State Change Notify
11.2.2.4 HA State Change Notify
11.2.2.5 SI Assignment State Change Notify
11.2.2.6 Proxy Status of a Component Changed to Proxied
Appendix A Implementation of CLC Interfaces
Appendix B API Functions in Unregistered Processes
1 Document Introduction
1.1 Document Purpose
This document defines the Availability Management Framework of the Application Interface Specification (AIS) of the Service Availability™ Forum (SA Forum). It is intended for use by implementers of the Application Interface Specification and by application developers who would use the Application Interface Specification to develop applications that must be highly available. The AIS is defined in the C programming language, and requires substantial knowledge of the C programming language. Typically, the Service Availability™ Forum Application Interface Specification will be used in conjunction with the Service Availability™ Forum Hardware Interface Specification (HPI).
1.3 History
Previous releases of the Availability Management Framework specification:
(1) SAI-AIS-AMF-A.01.01
(2) SAI-AIS-AMF-B.01.01
(3) SAI-AIS-AMF-B.02.01
This section presents the changes of the current release, SAI-AIS-AMF-B.03.01, with respect to the SAI-AIS-AMF-B.02.01 release. Editorial changes that do not change semantics or syntax of the described interfaces are not mentioned.
1.3.1 New Topics
For the purposes of software management (see [6]) and to facilitate the Availability Management Framework configuration, types have been defined for the various entities of the Availability Management Framework, namely component (Section 3.2.2.5), service unit (Section 3.2.4.1), service instance (Section 3.2.5.1), service group (Section 3.2.6.1), and application (Section 3.2.7.1). Additionally the healthcheck type has been defined in Section 7.1.2.4.
This release contains the support of container and contained components. This support induced a series of changes and additions. The main changes affected the following sections, chapters, and tables: Section 3.2, Section 3.2.2, Section 3.2.2.1, Section 3.2.2.1.1, Section 3.2.2.3, Table 3, Section 3.2.3, Section 3.2.4, Section 3.2.9, Section 3.3.1.4, Section 3.3.2.1, Section 3.5.1, Section 3.7.1.2, Section 3.12.1.3.1, Section 3.12.2.1, Section 3.12.2.2, Section 4.6, Section 4.8, Section 7.4.7, Section 7.4.8, Section 7.6.1, Section 7.6.3, Section 9.4.3, Section 9.4.6, Section 9.4.7, Appendix A, and Appendix B. Note that this list is not exhaustive.
A new Chapter 6 summarizes the properties of container and contained components.

Two new callback functions, SaAmfContainedComponentInstantiateCallbackT (in Section 7.10.4) and SaAmfContainedComponentCleanupCallbackT (in Section 7.10.5), have also been introduced. As a consequence, the SaAmfCallbacksT_3 structure (see Section 7.4.12) superseded the SaAmfCallbacksT structure, and the saAmfInitialize_3() function (see Section 7.5.1) superseded the saAmfInitialize() function.

As an illustration, operational scenarios for container and contained components are presented in Chapter 10, starting with Section 10.7. Furthermore, the Availability Management Framework Information Model was also extended (new saAmfCompContainerCsi attribute in the SaAmfComp object class, see Section 8.13.2).
The node group optional configuration attribute has been introduced for service units and service groups. For details, refer to Section 3.7.1.2.

The new Section 3.12.1.4.2 introduces a mechanism to disable the auto-repair behavior of the Availability Management Framework during an upgrade campaign (see [6]). This topic led to modifications in Section 7.4.11 (new value in the SaAmfAdditionalInfoIdT enum), Section 8.10 (definition of the new saAmfSUMaintenanceCampaign attribute in the SaAmfSU object class), and in Section 11.2.2.2 (values for the additional information field).

Section 7.1.2.4.3 clarifies when changes of healthcheck configuration attributes become effective.

Section 7.2 describes the behavior of the Availability Management Framework API on a cluster node that is not in the cluster membership (see [3]). As a consequence, the SA_AIS_ERR_UNAVAILABLE return value has been added to various API functions (see Section 1.3.4).
Section 7.7.1 presents the saAmfPmStart_3() function as a replacement for saAmfPmStart(). This replacement was necessary because the type of the processId parameter has changed.

Chapter 8 presents the Availability Management Framework UML Information model. UML diagrams and DNs for the Availability Management Framework, which were contained in the Overview document version B.03.01, have been removed from the Overview document version B.04.01 ([1]). These UML diagrams and DNs have been updated and are now presented in Chapter 8.

In the various places of this specification where the Availability Management Framework configuration is explained, the description provides the name of the pertinent configuration attributes, the UML diagrams that contain these attributes, and the sections in Chapter 8 containing these diagrams.
1.3.2 Clarifications
Chapter 3 clarifies, wherever a configuration attribute is used as a rank, that the rank is represented by a positive integer, and the lower the integer value, the higher the rank. Such a clarification was already present in the Availability Management Framework B.02.01 specification for the multiple (ranked) standby assignments definition in Section 3.7.1.1. This clarification has been added to the definitions ordered list of service units for a service group and ordered list of SIs in Section 3.7.1.1, to the configuration attribute ranked service unit list per SI in Section 3.7.4.3, and to the configuration attribute ranked service unit list per SI in Section 3.7.5.3.

Section 3.2.1 and subsections clarify the definitions of Availability Management Framework node and of Availability Management Framework cluster and explain their relationships to the Cluster Membership notions of node and cluster.

Section 3.2.1.2 clarifies actions taken by the Availability Management Framework when a node leaves the cluster membership.

Section 3.2.6 clarifies that any service unit of a service group must be able to take an assignment for any of the service instances protected by the service group. This clarification led to a change on page 121 in Section 3.7.4.1 on the N-way redundancy model and to the note on page 146 in Section 3.7.6.1 on the no-redundancy redundancy model.

Section 3.3.1.4 clarifies how the readiness state of a non-pre-instantiable service unit is affected by whether or not the Cluster Membership node containing the service unit is a member node.

Section 3.3.2.1 defines the instantiating, instantiated, and terminating presence state of a component.
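The rank convention described above (a positive integer, where a lower value means a higher rank) can be illustrated with a minimal C sketch. The RankedSu structure and the rank_compare() and highest_ranked() helpers are hypothetical names invented for this illustration; they are not part of the Availability Management Framework API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a ranked service unit; not an AMF type. */
typedef struct {
    const char *name;   /* illustrative service unit name */
    unsigned int rank;  /* positive integer; lower value = higher rank */
} RankedSu;

/* qsort() comparator: orders entries from the highest rank (lowest
 * integer value) to the lowest rank (highest integer value). */
static int rank_compare(const void *a, const void *b)
{
    const RankedSu *x = a;
    const RankedSu *y = b;
    return (x->rank > y->rank) - (x->rank < y->rank);
}

/* Sorts the list in place by rank and returns the highest-ranked
 * entry, or NULL if the list is empty. */
static const RankedSu *highest_ranked(RankedSu *list, size_t count)
{
    if (count == 0) {
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        assert(list[i].rank > 0); /* ranks are positive integers */
    }
    qsort(list, count, sizeof list[0], rank_compare);
    return &list[0];
}
```

With three entries ranked 3, 1, and 2, highest_ranked() returns the entry whose rank is 1, because the lowest integer denotes the highest rank.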
Section 3.12.1.4 clarifies that the Availability Management Framework sets the operational state of the service unit to disabled if the instantiation of the erroneous component as a repair action fails.

The first paragraph of Section 4.3 clarifies how components can access the name/value pairs for each component service instance.

Section 7.4.8 shows in Table 19 all possible combinations of the values of the saAmfCompCategoryT typedef structure.

Section 7.5.3 on the saAmfDispatch() function clarifies the meaning of the SA_AIS_OK return value.

The description of the saAmfFinalize() function (see Section 7.5.4) clarifies that this function frees all resources allocated by the Availability Management Framework for the process in the particular association between the process and the Availability Management Framework.

The description of the SaAmfCSISetCallbackT function in Section 7.9.2 clarifies how the Availability Management Framework behaves if a process that has been requested to assume the quiescing HA state in an invocation of this callback does not invoke the saAmfCSIQuiescingComplete() function with the error parameter set to SA_AIS_OK within a configured time interval.

The descriptions of the administrative operations SA_AMF_ADMIN_UNLOCK, SA_AMF_ADMIN_LOCK, SA_AMF_ADMIN_UNLOCK_INSTANTIATION, SA_AMF_ADMIN_LOCK_INSTANTIATION, and SA_AMF_ADMIN_REPAIRED in subsections of Section 9.4 clarify under which conditions these operations are applicable.

Section 11.2.1.2 on the Component Instantiation Failed notification clarifies that this notification is not only issued for failures related to the INSTANTIATE command, but also for failures related to the callbacks to instantiate a component.
1.3.3 Superseded and Superseding Functions
The Availability Management Framework defines for the version B.03.01 new functions and new type definitions to replace functions and type definitions of the version B.02.01. The list of replaced functions and type definitions in alphabetic order is presented in Table 1. The superseded functions and type definitions are no longer supported in version B.03.01, and no description is provided for them in this document. The names of the superseding functions and type definitions are obtained by adding _3 to the respective names of the previous version. Regarding the support of backward compatibility in SA Forum AIS, refer to the Overview document ([1]).
Table 1 Superseded Functions and Type Definitions in Version B.03.01

Functions and Type Definitions of B.02.01 no Longer Supported in B.03.01
SaAmfCallbacksT
saAmfInitialize()
saAmfPmStart()

1.3.4 Changes in Return Values of API Functions
The first row in the following table applies to all functions of this release. The other rows apply only to functions that have not been superseded.

Return Value              Change Type
SA_AIS_ERR_UNAVAILABLE    new
SA_AIS_OK                 clarified
1. The SaAmfProtectionGroupTrackCallbackT callback function has the SA_AIS_ERR_UNAVAILABLE return value in the error parameter.
In Section 3.5, the two figures showing state transitions and the two tables showing combined states for pre-instantiable and non-pre-instantiable service units have changed. Additional clarifications have been added to this section and to its subsections.

The definition of component capability has changed. It applies now to a component on behalf of a component service type (see Section 3.6).

In the example of a service group configuration for the N-way redundancy model in Section 3.7.4.4 (on page 125), the maximum number of active SIs per service unit was corrected from 4 to 3.

The last bullet of Section 3.7.6.4 (on page 149) on the no-redundancy model was corrected to state that the assignments of SIs to service units are based on the ranking of the SIs in the ordered list of SIs.
Section 4.2 on the configuration of the path name for CLC-CLI commands has changed, as now the pathname prefix is AMF node-specific. As part of this topic, the SaAmfNodeSwBundle association class was introduced (see Section 8.7 and FIGURE 27), and the saAmfCompCmdPathPrefix attribute was removed from the SaAmfComp object class in FIGURE 36.

Section 4.7 was corrected to state that a proxied, pre-instantiable (respectively proxied, non-pre-instantiable) component is terminated by invoking the saAmfComponentTerminateCallback() (respectively saAmfCSIRemoveCallback()) callback function on its proxy component.

In Section 7.4.9, two typos have been corrected, one in the name of the enum saAmfRedundancyModelT and the other in the name of its member SA_AMF_N_WAY_ACTIVE_REDUNDANCY_MODEL.

If an application responds to a callback by invoking the saAmfResponse() function (see Section 7.13.1) with the error parameter set to a value different from the error codes defined in the description of these callbacks, the Availability Management Framework handles these error codes as if the application had returned SA_AIS_ERR_FAILED_OPERATION. The affected callback functions are SaAmfHealthcheckCallbackT (Section 7.8.2), SaAmfCSISetCallbackT (Section 7.9.2), SaAmfCSIRemoveCallbackT (Section 7.9.3), SaAmfComponentTerminateCallbackT (Section 7.10.1), SaAmfProxiedComponentInstantiateCallbackT (Section 7.10.2), SaAmfProxiedComponentCleanupCallbackT (Section 7.10.3), SaAmfContainedComponentInstantiateCallbackT (Section 7.10.4), and SaAmfContainedComponentCleanupCallbackT (Section 7.10.5). This special treatment of the error parameter applies also to the saAmfCSIQuiescingComplete() function (see Section 7.9.4).

The sentence "If the implementation supports the required releaseCode, and a major version >= the required majorVersion, SA_AIS_OK is returned." in the description of the superseded saAmfInitialize() function has been replaced by the sentence "If the implementation supports the specified releaseCode and majorVersion, SA_AIS_OK is returned." in the description of the saAmfInitialize_3() function (see Section 7.5.1).

The description of the csiDescriptor parameter of the SaAmfCSISetCallbackT function in Section 7.9.2 was corrected to indicate that the csiDescriptor is a structure and not a pointer to a structure.

The names and descriptions of some configuration attributes shown in Chapter 8 have changed.
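Two of the changes listed above, the handling of unexpected error codes in callback responses and the changed wording of the version check, can be sketched in C as follows. This is only an illustrative model: the ErrorCode enum, the Version structure, and all function names are hypothetical stand-ins (they are not SaAisErrorT, SaVersionT, or any AIS function), the numeric enum values are arbitrary, and the sketch assumes an implementation that supports exactly one release code and major version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for error codes; the values are arbitrary and
 * are not taken from the AIS header files. */
typedef enum {
    ERR_OK,
    ERR_FAILED_OPERATION,
    ERR_NO_MEMORY   /* example of a code not defined for a callback */
} ErrorCode;

/* Returns the code the framework would act on: a reported code that is
 * outside the set defined for the callback is handled as
 * ERR_FAILED_OPERATION. */
static ErrorCode effective_response_error(ErrorCode reported,
                                          const ErrorCode *defined,
                                          size_t defined_count)
{
    assert(defined != NULL || defined_count == 0);
    for (size_t i = 0; i < defined_count; i++) {
        if (defined[i] == reported) {
            return reported;
        }
    }
    return ERR_FAILED_OPERATION;
}

/* Hypothetical version record modeled on releaseCode and majorVersion. */
typedef struct {
    char release_code;
    unsigned char major_version;
} Version;

/* Superseded wording: the release code matches and the supported major
 * version is greater than or equal to the required one. */
static bool version_ok_superseded(Version supported, Version required)
{
    return supported.release_code == required.release_code &&
           supported.major_version >= required.major_version;
}

/* Replacement wording: the implementation supports the specified
 * release code and major version (modeled here as an exact match). */
static bool version_ok_current(Version supported, Version specified)
{
    return supported.release_code == specified.release_code &&
           supported.major_version == specified.major_version;
}
```

Under these assumptions, an implementation at release code 'B' and major version 3 would have accepted a request for major version 1 under the superseded wording, but not under the replacement wording.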
The component-invoked or the Framework-invoked healthchecks are no longer referred to in this document as being types of healthchecks. Instead, the term variant is used to refer to either component-invoked or Framework-invoked healthchecks. This change was necessary to distinguish these variants of healthchecks from the healthcheck type, which is used to configure healthchecks for a component type (see Section 8.14).

In Section 10.4, the transition to set Comp1 to the quiesced HA state for CSI1 was removed, as CSI1 will subsequently also be removed from Comp2.
1.4 References
The following documents contain information that is relevant to this specification:
[1] Service Availability™ Forum, Service Availability Interface, Overview, SAI-Overview-B.04.01
[2] Service Availability™ Forum, Application Interface Specification, Notification Service, SAI-AIS-NTF-A.02.01
[3] Service Availability™ Forum, Application Interface Specification, Cluster Membership Service, SAI-AIS-CLM-B.03.01
[4] Service Availability™ Forum, Application Interface Specification, Information Model Management Service, SAI-AIS-IMM-A.02.01
[5] Service Availability™ Forum, Information Model in XML Metadata Interchange (XMI) v2.1 format, SAI-XMI-A.03.01
[6] Service Availability™ Forum, Application Interface Specification, Software Management Service, SAI-AIS-SMF-A.01.01
[7] CCITT Recommendation X.731 | ISO/IEC 10164-2, State Management Function
[8] CCITT Recommendation X.733 | ISO/IEC 10164-4, Alarm Reporting Function
[9] IETF RFC 2253 (http://www.ietf.org/rfc/rfc2253.txt)
[10] IETF RFC 2045 (http://www.ietf.org/rfc/rfc2045.txt)
References to these documents are made by putting the number of the document in brackets.
2 Overview
This specification defines the Availability Management Framework within the Application Interface Specification (AIS).
Service Availability™ Application Interface Specification System Description and System Model
• Physical entities that form the basis of the model and the relationships among them (Section 3.1)
• Logical entities managed by the Availability Management Framework (Section 3.2)
• States and state models applicable to the relevant logical entities (Section 3.3 and Section 3.5)
• Fail-over and switch-over of service instances (Section 3.4)
• Component capability model (Section 3.6)
• Redundancy models supported by the Availability Management Framework (Section 3.7)
• Interactions between the component capability model and the redundancy models (Section 3.8)
• Dependencies among different entities (Section 3.9)
• Approaches for integrating legacy software and hardware entities in the framework (Section 3.10)
• Component monitoring (Section 3.11)
• Error detection, recovery, repair, and escalation policy (Section 3.12)

Note: The description of the Availability Management Framework configuration provides the pertinent attribute names, the names of object classes containing these attributes, and the sections containing the respective UML diagrams. Additional details on type, multiplicity, and values of these attributes are given in Chapter 8 and [5]. It is recommended to first read the entire Chapter 3 to fully understand the Availability Management Framework configuration described in these references.
Most entities of the Availability Management Framework have types, which are used to facilitate the configuration and for software management purposes. Each type is briefly described in a subsection of the section describing the corresponding entity. Additional details are provided in Chapter 8 and [5].
FIGURE 1 shows a UML diagram that depicts the physical entities of the system.
FIGURE 1 Physical Entities
[FIGURE 1 is a UML diagram; the only labels recoverable from it are Resource and Physical Node.]
[UML diagram of the logical entities. Labels recoverable from the diagram: CLM Cluster, AMF Cluster, CLM Node, AMF Node, Component Service Instance, Local Component, External Component, SA-aware Component, Contained Component, Container Component, and Proxy Component, with the associations Maps on, Hosted on, Contains, Proxies, and ProxiesExternal.]
Each logical entity of the system model is identified by a unique name. All logical entities, their attributes, relationships, and mapping to the resources they represent are typically preconfigured and stored in a configuration repository. Dynamic modification of the system model is not precluded. The modeling and organization of this configuration information is described in Chapter 8. The access and modification of this configuration repository is provided by the Object Management interface of the IMM Service ([4]) and accompanying SNMP MIBs. It is assumed that the Availability Management Framework obtains the cluster configuration from the configuration repository and is notified of any changes.
3.2.1 Cluster and Nodes
Note: In the remainder of this document, the terms CLM node, CLM cluster, AMF node, and AMF cluster are used as synonyms for Cluster Membership node, Cluster Membership cluster, Availability Management Framework node, and Availability Management Framework cluster, respectively.
The AMF node is a logical entity that represents a complete inventory of all Availability Management Framework entities on a CLM node (which is defined in [3]). The AMF node is administratively configured to be hosted on a specified CLM node (representing the physical node within the cluster). However, the Availability Management Framework can maintain the availability of software entities on a set of AMF nodes only if each of these nodes maps to a CLM node which is currently a member node. The configuration of an AMF node is valid even if (a) no CLM node is mapped to the AMF node, or (b) a CLM node is mapped to the AMF node, but the mapped CLM node is not in the cluster membership. In both the aforementioned cases, the AMF node cannot be used to provide service, and none of the Availability Management Framework objects configured to be hosted by the AMF node can be instantiated. An AMF node is also a logical entity whose various states are managed by the Availability Management Framework. Availability Management Framework administrative operations are defined for such nodes. For a complete list of the attributes that are configured for an AMF node, refer to the description of the SaAmfNode UML class in Section 8.7 on page 296.
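The condition under which an AMF node can be used to provide service can be sketched as a small C model. The ClmNodeModel and AmfNodeModel structures and the amf_node_can_provide_service() function are hypothetical names made up for this illustration; in the actual configuration, the mapping is expressed by the saAmfNodeClmNode attribute of the SaAmfNode object class.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a CLM node; not a Cluster Membership type. */
typedef struct {
    const char *name;
    bool is_member;  /* true while the node is in the cluster membership */
} ClmNodeModel;

/* Hypothetical model of an AMF node and its configured mapping. */
typedef struct {
    const char *name;
    const ClmNodeModel *clm_node;  /* NULL when no CLM node is mapped */
} AmfNodeModel;

/* An AMF node configuration is valid in all three cases below, but the
 * node can provide service only when it maps to a CLM node that is
 * currently a member node. */
static bool amf_node_can_provide_service(const AmfNodeModel *node)
{
    assert(node != NULL);
    return node->clm_node != NULL && node->clm_node->is_member;
}
```

The two cases called (a) and (b) in the text correspond to a NULL clm_node pointer and to a mapped CLM node whose is_member flag is false; in both cases the function returns false.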
The complete set of AMF nodes in the Availability Management Framework configuration defines the AMF cluster. For the relationship between the AMF cluster, the SA Forum cluster, and the Cluster Membership objects SaClmCluster and SaClmNode, refer to the corresponding UML overview diagram in [1]. Note that though the AMF cluster and the CLM cluster (defined in [3]) have a close relationship, they are not the same:
• During cluster startup, it is possible that some AMF nodes may be mapped to some CLM nodes by configuration (see the saAmfNodeClmNode attribute of the SaAmfNode object class, shown in Section 8.7), whereas other AMF nodes are not mapped to configured CLM nodes, and thus do not provide service. Later, during the life-span of the cluster, modifications may be made to the mapping of the AMF node to the CLM node.
• There may be nodes in the CLM cluster that are not meant to run software controlled by the Availability Management Framework. Thus, in a fully configured system, the CLM cluster may contain more nodes than the AMF cluster; in this case, the AMF cluster will be a subset of the CLM cluster.
The administrator is responsible for specifying the configuration for mapping AMF nodes to CLM nodes.

The AMF cluster is one of the entities that are under the Availability Management Framework's control, and its administrative state is managed by the Availability Management Framework (see Section 3.3.8). The Availability Management Framework defines certain administrative operations for the AMF cluster.

The Availability Management Framework knows the association of its nodes to the CLM nodes and shall use this association to initiate operations such as rebooting a CLM node during recovery operations.

If an AMF node leaves the cluster membership, it is cleaned in the sense that no process belonging to components is left over (see also Section 7.2.1). Only persistent Availability Management Framework information will be available again when the AMF node rejoins the cluster membership.

The Availability Management Framework can force a cluster node to reboot while engaging certain recovery and repair mechanisms. During the reboot, the CLM node leaves the cluster membership and rejoins it after successful initialization. It is required that the underlying CLM node of each AMF node is equipped with an operating system and a low-level reboot interface.
In contrast, the restart of an AMF node (see also Section 9.4.7) will only stop and start entities under the Availability Management Framework's control, without any impact on the cluster membership. The restart of the AMF cluster (see also Section 9.4.7) will restart all AMF nodes and will also not affect the cluster membership. On the other hand, a cluster reset (see Section 7.4.7) reboots all CLM nodes of the cluster, whereby all CLM nodes are first halted before any of them boots again.

In the remainder of this specification, cluster start or startup is synonymous with the start of the Availability Management Framework, which initially creates and instantiates the Availability Management Framework objects based on the Availability Management Framework configuration.

Applications to be made highly available are supposed to be configured in the Availability Management Framework configuration. Each application is configured to be hosted in one or more AMF nodes within the AMF cluster.
3.2.1.3 Usage of the Terms Node and Cluster in this Document
To make the specification more readable and precise, the following notations are used:
• Throughout the specification, when the word "node" is used without an explicit qualification, it means "AMF node". If "node" is used in the context of "reboot", "joining the cluster", and "leaving the cluster", it actually means "the associated CLM node". For example, the sentence "The node will be rebooted." should be read as "The CLM node associated with the node will be rebooted.". Similarly, the sentence fragment "when a node joins the cluster" should be interpreted as "when the CLM node associated with the node joins the cluster".
• Whenever "cluster" is used without an explicit qualification, it assumes either of the following two meanings based on the context:
  ◦ Cluster as defined in the Cluster Membership Service; for example, the sentence "The node leaves the cluster." implies "The node leaves the cluster as defined in the Cluster Membership Service.".
  ◦ A generic term that describes a set of nodes on which a set of highly available applications are deployed.
3.2.2 Components
A component is the logical entity that represents a set of resources to the Availability Management Framework. The resources represented by the component encapsulate specific application functionality. This set can include hardware resources, software resources, or a combination of the two. A component is the smallest logical entity on which the Availability Management Framework performs error detection and isolation, recovery, and repair. When deciding what is to be included in a component, the following two rules should be taken into account:
• The scope of a component must be small enough, so that a failure of the component has as little impact as possible on the services provided by the cluster.
• The component should include all functions that cannot be clearly separated for error containment or isolation purposes.
The Availability Management Framework associates the following states to a component: presence, operational, readiness, and HA. For more information on component states, refer to Section 3.3.2. The Availability Management Framework was primarily designed to manage local resources contained in nodes. This framework can also manage resources external to the cluster. Unlike the case of local resources, the Availability Management Framework has little direct control over external resources. This difference justifies the distinction between two broad categories of components:
• Local component: a local component represents a subset of the local resources contained within a single node.
• External component: an external component represents a set of resources that are external to the cluster.
Section 3.2.2.1 up to Section 3.2.2.4 describe how the Availability Management Framework manages local and external components. The information provided includes:
• the notion of component category to distinguish components with different properties and different behavior. Two main categories of components are defined: Service Availability (SA)-aware (see Section 3.2.2.1) and non-SA-aware components (see Section 3.2.2.2);
• the notion of container and contained SA-aware components (see Section 3.2.2.1.1);
• the concepts of proxy and proxied components (see Section 3.2.2.3);
High levels of service availability can only be attained if errors are detected and isolated, a recovery is performed, and failed entities repaired efficiently. Faster error recovery is possible if components have been chosen or are written, so that they can register and interact with the Availability Management Framework to implement specific workload assignments and recovery policies. Such components must be designed, so that the Availability Management Framework can dynamically assign them workloads and choose the role in which the component will operate for each specific workload. Only local components that are under the direct control of the Availability Management Framework can have such a high level of integration with this framework. Such components are termed SA-aware components. Each SA-aware component includes at least one process that is linked to the Availability Management Framework library. One of these processes registers the component with the Availability Management Framework by invoking the saAmfComponentRegister() API function. This process, called the registered process, provides to the Availability Management Framework references to the availability control functions it implements. These control functions are implemented as callbacks. Throughout the life of the component, the Availability Management Framework uses these control functions to direct the component execution by, for example:
• assigning workloads to the component,
• removing workloads from the component, and
• assigning the HA state to the component for each workload.
The registered process executes the availability management requests it receives from these control functions and conveys such requests to other processes and the hardware equipment of the local component, where necessary. Most control functions of the component can only be provided by the registered process; however, some control functions, such as healthcheck control functions, can be provided by any process of the component. The descriptions of each API function given in Chapter 7 explicitly mention when the function is restricted to a registered process. Additionally, Appendix B contains a table showing which API or callback is restricted to registered processes only.
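The callback-based control described above can be sketched with a self-contained C model. It does not use the AIS header files: ControlCallbacks, FrameworkModel, HaStateModel, and the model_*() functions are hypothetical stand-ins, loosely modeled on the role that the callbacks of the SaAmfCallbacksT_3 structure (such as SaAmfCSISetCallbackT and SaAmfCSIRemoveCallbackT) and the saAmfComponentRegister() function play. The sketch tracks a single workload for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical HA state model; only two states are shown. */
typedef enum { HA_NONE, HA_ACTIVE, HA_STANDBY } HaStateModel;

/* Callback table that the registered process hands to the framework. */
typedef struct {
    void (*csi_set)(const char *csi_name, HaStateModel ha_state);
    void (*csi_remove)(const char *csi_name);
} ControlCallbacks;

/* Minimal framework model that remembers one registered component. */
typedef struct {
    const ControlCallbacks *callbacks;
} FrameworkModel;

/* Component-side state for the single workload tracked here. */
static char current_csi[64];
static HaStateModel current_state = HA_NONE;

/* Called when a workload is assigned or its HA state changes. */
static void on_csi_set(const char *csi_name, HaStateModel ha_state)
{
    strncpy(current_csi, csi_name, sizeof current_csi - 1);
    current_csi[sizeof current_csi - 1] = '\0';
    current_state = ha_state;
}

/* Called when a workload is removed from the component. */
static void on_csi_remove(const char *csi_name)
{
    if (strcmp(current_csi, csi_name) == 0) {
        current_csi[0] = '\0';
        current_state = HA_NONE;
    }
}

static const ControlCallbacks component_callbacks = {
    on_csi_set, on_csi_remove
};

/* Registration: the framework keeps the references to the callbacks. */
static void model_register(FrameworkModel *fw, const ControlCallbacks *cb)
{
    assert(fw != NULL && cb != NULL);
    fw->callbacks = cb;
}

/* The framework directs the component through the stored callbacks. */
static void model_assign(const FrameworkModel *fw, const char *csi,
                         HaStateModel ha_state)
{
    assert(fw != NULL && fw->callbacks != NULL);
    fw->callbacks->csi_set(csi, ha_state);
}

static void model_remove(const FrameworkModel *fw, const char *csi)
{
    assert(fw != NULL && fw->callbacks != NULL);
    fw->callbacks->csi_remove(csi);
}
```

In this model, a caller would first invoke model_register() with component_callbacks, and the framework side would then drive the component with model_assign() and model_remove(), mirroring the three kinds of requests listed above.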
• its life cycle is directly controlled by the Availability Management Framework;
• each of its processes must exclusively belong to the component.
Note that container and contained components (which are discussed in Section 3.2.2.1.1) do not share all of the preceding properties. Occasionally in the remainder of this specification, the term "regular SA-aware component" is used to refer to an SA-aware component that is neither a container nor a contained component, when the context does not make the distinction apparent.

Note that legacy software running on a node that was not initially designed as an SA-aware component can be converted to be SA-aware by adding a new process. This process acts as the registered process for the component, receives all management requests from the Availability Management Framework, and converts them into specific actions on the legacy software using existing administration interfaces specific to the legacy software.
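The conversion of legacy software described above can be pictured as a thin adapter. In the following minimal C sketch, every name is hypothetical: MgmtRequest stands in for the management requests received by the added registered process, the "legacyctl" command strings stand in for whatever administration interface the legacy software actually offers, and record_command() merely records the command instead of executing it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical management requests received by the registered process. */
typedef enum { REQ_ACTIVATE, REQ_DEACTIVATE, REQ_TERMINATE } MgmtRequest;

/* Maps a request to a command of the (invented) legacy administration
 * interface. Returns NULL for a request that has no mapping. */
static const char *legacy_command_for(MgmtRequest req)
{
    switch (req) {
    case REQ_ACTIVATE:   return "legacyctl start-service";
    case REQ_DEACTIVATE: return "legacyctl stop-service";
    case REQ_TERMINATE:  return "legacyctl shutdown";
    }
    return NULL;
}

/* Function type used to carry out a legacy command; returns 0 on success. */
typedef int (*LegacyRunner)(const char *command);

/* Converts a management request into an action on the legacy software.
 * Returns -1 if the request cannot be mapped, else the runner's result. */
static int handle_request(MgmtRequest req, LegacyRunner run)
{
    const char *command = legacy_command_for(req);
    assert(run != NULL);
    if (command == NULL) {
        return -1;
    }
    return run(command);
}

/* Stand-in runner for this sketch: records the command it was given. */
static const char *last_command = NULL;

static int record_command(const char *command)
{
    last_command = command;
    return 0;
}
```

A real adapter would replace record_command() with a function that invokes the legacy administration interface and would be driven by the callbacks of the registered process.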
3.2.2.1.1 Container and Contained Components
This section describes the particular properties of container and contained components. As other features of container and contained components are described in other sections of this and other chapters, Chapter 6 summarizes the corresponding information and also provides additional information.

Purpose

The concept of container and contained components allows the Availability Management Framework to integrate components that are not executed directly by the operating system, but rather in a controlled environment running on top of the operating system. Widespread environments are runtime environments, virtual machines, or component frameworks.

Properties of Container Components
The main task of the container component is to cooperate with the Availability Management Framework to handle the life cycle of contained components. The following definitions are used in this explanation and throughout this document:
- The container component that handles the life cycle of a contained component in cooperation with the Availability Management Framework is termed the associated container component to the contained component.
- A contained component whose life cycle is handled by a container component in cooperation with the Availability Management Framework is termed the associated contained component to the container component.
- For ease of expression when referring to a contained component, the term collocated contained component is used to refer to a contained component that has the same associated container component.

The actions performed by the associated container component and the Availability Management Framework to handle the life cycle of a contained component are explained in detail in Section 6.2. The interactions between contained components and the associated container to implement the life cycle of the contained components are not defined by the Availability Management Framework specification.
- A single container component can be the associated container component of various contained components.
- A container component and all its associated contained components must reside on the same AMF node.
- The life cycle of a container component is directly controlled by the Availability Management Framework.
- The termination of a container component (for instance, in case of a failure of the component) implies the termination of all associated contained components (see also Section 6.3). In this sense, a container component contains the associated contained component.
- A process belonging to a container component can also belong to its associated contained components.
The life cycle of contained components is handled by the Availability Management Framework in cooperation with the associated container component (see also Section 6.2). The termination of a contained component does not imply the termination of either the associated container component or the collocated contained components (see also Section 6.3). A process belonging to a contained component can also belong to its collocated contained components and to the associated container component.
Components that do not register directly with the Availability Management Framework are called non-SA-aware components. However, such components may have processes linked with the Availability Management Framework Library. Typically, non-SA-aware components are registered with the Availability Management Framework by dedicated SA-aware components that act as proxies between the Availability Management Framework and the non-SA-aware components. These dedicated SA-aware components are called proxy components. The components for which a proxy component mediates are called proxied components. Proxy and proxied components are explained in more detail in Section 3.2.2.3.
3.2.2.2.1 External Components
To keep maximum flexibility in the way external resources interact with nodes, which is often device-dependent or proprietary, the Availability Management Framework does not interact directly with external components and always manages external components as proxied components.
3.2.2.2.2 Non-Proxied, Non-SA-Aware Components
The Availability Management Framework supports both proxied and non-proxied, non-SA-aware local components. For non-proxied, non-SA-aware local components, the role of the Availability Management Framework is limited to the management of the component life cycle. The Availability Management Framework instantiates a non-proxied, non-SA-aware component when the component needs to provide a service and terminates this component when it must stop providing the service. Processes of a local non-SA-aware component must exclusively belong to that component.
3.2.2.2.3 Integration and Usage of Non-SA-Aware Local Components
Application developers are encouraged to design applications that will run on nodes as a set of SA-aware components registered directly with the Availability Management Framework; however, non-SA-aware local components may be used instead for the following reasons:
- Some system resources such as networking resources or storage resources are implemented by the operating environment, and their activation or deactivation is usually performed by running administrative command line interfaces. No actual process is needed to implement these resources, and requiring the implementation of a registering process for such resources adds unnecessary complexity.
- For components representing only local hardware resources, making these components SA-aware components with a registering process adds unnecessary complexity.
- Existing clustering products support looser execution models than the execution model of SA-aware components. For these products, the integration between the applications and the clustering middleware is minimal: the clustering middleware is only responsible for starting, stopping, and monitoring applications, but does not expose APIs for finer-grained control of the application in terms of workload and availability management. It is important to facilitate the migration of third party products from these existing clustering products to products providing the Availability Management Framework interfaces without requiring the transformation of these third party products into SA-aware components.
- Some complex applications such as databases or application servers already provide their own availability management for their various building blocks. When moving these applications under the Availability Management Framework's control, different functions can be modeled as separate components; however, some controlling entity within the application might still be interposed between the Availability Management Framework and the individual components. The concept of the proxy component can be used in this case as an interposition layer between the Availability Management Framework and all other components of the application.
The Availability Management Framework uses the availability control functions registered by a proxy component to control the proxy component and the proxied components for which the proxy component mediates. The proxy component is an SA-aware component that is responsible for conveying requests made by the Availability Management Framework to its proxied components. A contained component must not be a proxy component. The interactions between proxied components and their proxy component are private and not defined by this specification.

The Availability Management Framework determines the proxied components for which a proxy component is responsible when the proxy component registers with the framework, based on configuration and other factors like availability of components in the cluster. The Availability Management Framework conveys this decision to the proxy component by assigning it a workload in the form of a component service instance (for the definition of component service instance, see Section 3.2.3).
The proxy component registers proxied components with the Availability Management Framework; however, the proxied components are independent components as far as the Availability Management Framework is concerned. As such, if a proxy component fails, or an entity containing it is prevented by the administrator from providing service, another component (usually the component acting as standby to the failed proxy component) can register the proxied component again. This new proxy component then assumes the mediation for the failed component without affecting the service provided by the proxied component. If no proxy component is available to take over the mediation service, the Availability Management Framework loses control of these proxied components and becomes unaware of whether the proxied components are providing service.

As various other features of proxy and proxied components are described in various sections of this and other chapters, Chapter 5 summarizes the information on all these features and also provides additional information. However, for convenience of the reader, the following notes list some key features:
- A single proxy component can mediate between the Availability Management Framework and multiple proxied components.
- The redundancy model (for a discussion of this notion, refer to Section 3.7) of the proxy component can be different from that of its proxied components.
- The Availability Management Framework does not consider the failure of the proxied component to be the failure of the proxy component. Similarly, the failure of the proxy component does not indicate a failure of the proxied components (see Section 5.3).
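The independence of proxied components from their proxy can be sketched as a small bookkeeping model in C. The structure and function names here (proxied_t, on_proxy_failure, NO_PROXY) are hypothetical and chosen only for this illustration; they do not correspond to any AMF API. The sketch assumes each proxied component records which proxy currently mediates for it, and that a proxy failure either hands mediation to a standby proxy or leaves the proxied component unmanaged, while the service it provides is not touched.

```c
#include <stddef.h>

#define NO_PROXY (-1)

/* Hypothetical bookkeeping record for one proxied component. */
typedef struct {
    const char *name;
    int         proxy;              /* index of mediating proxy, or NO_PROXY */
    int         providing_service;  /* not changed by a proxy failure */
} proxied_t;

/* Handle the failure of proxy `failed`. `standby` is the index of a proxy
 * that re-registers the affected proxied components, or NO_PROXY if none is
 * available. Returns how many proxied components are left without a proxy,
 * that is, outside the framework's control. */
size_t on_proxy_failure(proxied_t *p, size_t n, int failed, int standby)
{
    size_t unmanaged = 0;
    for (size_t i = 0; i < n; i++) {
        if (p[i].proxy != failed)
            continue;
        p[i].proxy = standby;       /* mediation moves; service is untouched */
        if (standby == NO_PROXY)
            unmanaged++;
    }
    return unmanaged;
}
```

A failure of the proxy therefore changes only the proxy field in this model; the providing_service flag is deliberately left alone to reflect that a proxy failure is not treated as a failure of the proxied component.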
The Availability Management Framework directly controls the life cycle of non-proxied, local components through a set of command line interfaces provided by each component. The Availability Management Framework indirectly controls the life cycle of proxied components through their proxies. However, command line interfaces may also be used by the Availability Management Framework to control some aspects of the life cycle of local proxied components. For information about command line interfaces for the local component life cycle management, refer to Chapter 4.
The Availability Management Framework distinguishes between two categories of components in its life cycle management:
- pre-instantiable components: such components have the ability to stay idle when they get instantiated by the Availability Management Framework. They start to provide a particular service only when instructed to do so (directly or indirectly) by the Availability Management Framework. The Availability Management Framework can speed up recovery and repair actions by keeping a certain number of pre-instantiated components, which can then take over faster the work of failed components. All SA-aware components are pre-instantiable components.
- non-pre-instantiable components: such components provide service as soon as they are instantiated. Hence, the Availability Management Framework cannot instantiate them in advance as spare entities. All non-proxied, non-SA-aware components are non-pre-instantiable components.

The following table shows the various component categories and subcategories.

Table 3 Component Categories

Locality  HA Awareness                              Proxy Property      Life Cycle Management
--------  ----------------------------------------  ------------------  ----------------------------------------
local     SA-aware, excluding contained components  proxy or non-proxy  pre-instantiable
local     contained components                      non-proxied         pre-instantiable
local     non-SA-aware                              non-proxied         non-pre-instantiable
local     non-SA-aware                              proxied             pre-instantiable or non-pre-instantiable
external  non-SA-aware                              proxied             pre-instantiable or non-pre-instantiable
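The categories of Table 3 can be expressed as a small lookup function. The following C sketch uses hypothetical enum and function names that are not part of the AMF API. It assumes that combinations not listed in the table (for example, an external component that is not proxied, or an SA-aware component that is itself proxied) are simply reported as invalid.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not part of the AMF API. */
typedef enum { LOC_LOCAL, LOC_EXTERNAL } locality_t;
typedef enum { AW_SA_AWARE, AW_CONTAINED, AW_NON_SA_AWARE } awareness_t;
typedef enum {
    LC_PRE_INSTANTIABLE,
    LC_NON_PRE_INSTANTIABLE,
    LC_EITHER,      /* pre-instantiable or non-pre-instantiable */
    LC_INVALID      /* combination does not appear in Table 3 */
} life_cycle_t;

/* Returns the life cycle management category for one row of Table 3.
 * AW_SA_AWARE means "SA-aware, excluding contained components". */
life_cycle_t life_cycle_category(locality_t loc, awareness_t aw, bool proxied)
{
    if (aw == AW_NON_SA_AWARE) {
        if (proxied)
            return LC_EITHER;                     /* local or external */
        /* External components are always managed as proxied components. */
        return loc == LOC_LOCAL ? LC_NON_PRE_INSTANTIABLE : LC_INVALID;
    }
    /* SA-aware and contained components are local and not proxied. */
    if (loc != LOC_LOCAL || proxied)
        return LC_INVALID;
    return LC_PRE_INSTANTIABLE;
}
```

For example, life_cycle_category(LOC_LOCAL, AW_NON_SA_AWARE, false) yields LC_NON_PRE_INSTANTIABLE, matching the third row of the table.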
The Availability Management Framework supports the notion of a component type. A component type represents a particular version of the software or hardware implementation which is used to construct components. All components of the same type share the attribute values defined in the component type configuration. Some of the attribute values may be overridden and some of them may be extended in the component configuration. Details on the configuration of a component type and of a component are provided in Section 8.13 on page 310 and in [5].
3.2.3 Component Service Instance

A component service instance (CSI) represents the workload that the Availability Management Framework can dynamically assign to a component. High availability (HA) states are assigned to a component on behalf of its component service instances. The Availability Management Framework chooses the HA state of a component for each particular component service instance, as described in Section 3.3.2.4.

Each component service instance has a set of attributes (name/value pairs), which characterize the workload assigned to the component. Several attributes with the same name may appear in the set of attributes of a component service instance, thus providing support for multivalued attributes. These attributes are not used by the Availability Management Framework and are just passed to the components.

The Availability Management Framework supports the notion of proxy CSI. A proxy CSI represents the special workload of proxying a proxied component. A proxied component must be configured with the proxy CSI that provides "proxying". The Availability Management Framework configuration specifies to which proxy components this proxy CSI is assigned. Note that a proxy component can be configured to have multiple CSI assignments, one or more for handling proxied components and others for providing other services. In terms of functionality, there is no difference between a proxy CSI corresponding to the workload of proxying proxied components and CSI assignments corresponding to the workload of other services.

The Availability Management Framework supports the notion of container CSI. A container CSI represents the special workload of managing the life cycle of contained components. A contained component must be configured with the container CSI. The Availability Management Framework determines, based on its configuration, the container components to which this container CSI is assigned. Which of these container components will become the associated container is explained in detail in Section 6.2. The container CSI can contain information to be passed by the associated container component to the corresponding contained component. How this information is passed is a private interface between container and contained components. Note that a container component can be configured to have multiple CSI assignments, one or more for handling contained components, and others for providing other services. In terms of functionality and syntax, there is no difference between a container CSI used to determine the associated container component and other CSIs corresponding to the workload of other services.
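The attribute set of a component service instance, including multivalued attributes, can be illustrated with a minimal C sketch. The types csi_attr_t and model_csi_t and the function csi_attr_values() are hypothetical and are not the data types of the AMF API; the sketch only assumes that a component receives the attributes as an ordered list of name/value string pairs and wants all values recorded under one name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; not the AMF API data types. */
typedef struct {
    const char *name;
    const char *value;
} csi_attr_t;

typedef struct {
    const char       *name;        /* name of the component service instance */
    const csi_attr_t *attrs;       /* name/value pairs, names may repeat */
    size_t            attr_count;
} model_csi_t;

/* Collects every value stored under `name`, in list order. At most `out_cap`
 * values are written to `out`; the return value is the total number of
 * matching attributes, which may exceed `out_cap`. */
size_t csi_attr_values(const model_csi_t *csi, const char *name,
                       const char **out, size_t out_cap)
{
    size_t found = 0;
    for (size_t i = 0; i < csi->attr_count; i++) {
        if (strcmp(csi->attrs[i].name, name) == 0) {
            if (found < out_cap)
                out[found] = csi->attrs[i].value;
            found++;
        }
    }
    return found;
}
```

Because the framework does not interpret these attributes, a lookup like this belongs entirely to the component; a name that occurs twice simply yields two values.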
The Availability Management Framework supports the notion of component service type. The component service type is the generalization of similar component service instances (that is, similar workloads) that are seen by the Availability Management Framework as equivalent and handled in the same manner. The component service type defines the list of the attribute names for all component service instances belonging to the type. Details on the configuration of a component service type and of a component service instance are provided in Section 8.12 on page 308 and in [5].

3.2.4 Service Unit

A service unit (SU) is a logical entity that aggregates a set of components combining their individual functionalities to provide a higher level service. Aggregating components into a logical entity managed by the Availability Management Framework as a single unit provides system administrators with a simplified, coarser-grained view. Most administrative operations apply to service units as opposed to individual components.

A service unit can contain any number of components, but a particular component can be configured in only one service unit. The components that constitute a service unit can be developed in isolation, and a component developer might be unaware of which components constitute a service unit. The service units are defined at deployment time.

As a component is always enclosed in a service unit, from the Availability Management Framework's perspective, the service unit is the unit of redundancy in the sense that it is the smallest logical entity that can be instantiated in a redundant manner (that is, more than once).

The Availability Management Framework associates presence, administrative, operational, readiness, and HA states to service units (the latter on behalf of service instances). Each of these states, with the exception of the administrative state, represents an aggregated view of the corresponding state of each component within the service unit. The rules applied to obtain these aggregated states are specific to each state and are described in Section 3.3.

Local components and external components cannot be mixed within a service unit. The Availability Management Framework distinguishes between local service units and external service units. Local service units can contain only local components (they are collocated on the same node). External service units can contain only external components. The external components represent resources that are external to the cluster.

A proxy component and its non-pre-instantiable proxied component can reside in the same or in different service units; however, a proxy component and its pre-instantiable proxied component must not reside in the same service unit in order to prevent cyclic dependencies during the instantiation of the service unit. If the proxy and proxied local components are hosted in different service units, these service units may reside on different nodes.

In a service unit, contained components must not be mixed with components of other categories. The rationale for this decision is explained in Section 6.1.5. All contained components in a service unit must have the same associated container component, and this association is achieved by the usage of a single container CSI (see also Section 6.2).

A service unit that contains at least one pre-instantiable component is called a pre-instantiable service unit; otherwise, it is called a non-pre-instantiable service unit.
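Several of the composition rules stated above lend themselves to a simple consistency check. The following C sketch is illustrative only: comp_cfg_t, su_check(), and su_is_pre_instantiable() are hypothetical names, and the check covers just three of the rules (no mixing of local and external components, no mixing of contained components with other categories, and a single container CSI for all contained components). It assumes that container_csi is a non-NULL string for every contained component.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical configuration record for one component of a service unit. */
typedef struct {
    bool        external;          /* false: local component */
    bool        contained;         /* true: contained component */
    bool        pre_instantiable;
    const char *container_csi;     /* assumed non-NULL when contained */
} comp_cfg_t;

typedef enum {
    SU_OK,
    SU_ERR_MIXED_LOCALITY,   /* local and external components mixed */
    SU_ERR_MIXED_CONTAINED,  /* contained mixed with other categories */
    SU_ERR_CONTAINER_CSI     /* contained components use different CSIs */
} su_check_t;

/* Compares every component against the first one in the service unit. */
su_check_t su_check(const comp_cfg_t *c, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (c[i].external != c[0].external)
            return SU_ERR_MIXED_LOCALITY;
        if (c[i].contained != c[0].contained)
            return SU_ERR_MIXED_CONTAINED;
        if (c[0].contained &&
            strcmp(c[i].container_csi, c[0].container_csi) != 0)
            return SU_ERR_CONTAINER_CSI;
    }
    return SU_OK;
}

/* A service unit is pre-instantiable if at least one component is. */
bool su_is_pre_instantiable(const comp_cfg_t *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (c[i].pre_instantiable)
            return true;
    return false;
}
```

The rule that forbids a proxy component and its pre-instantiable proxied component in the same service unit is not modeled here, since it needs the proxy relationship in addition to the per-component flags.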
3.2.4.1 Service Unit Type
The Availability Management Framework supports the notion of a service unit type. The service unit type defines a list of component types and, for each component type, the number of components that a service unit of this type may accommodate. A service unit of a given type may only consist of components of the component types from that list, and the number of these components must be within the range specified for the component type. All service units of the same type share the attribute values defined in the service unit type configuration. Some of the attribute values may be overridden in the service unit configuration. All service units of the same type can be assigned service instances derived from the same set of service types. Details on the configuration of a service unit type and of a service unit are provided in Section 8.10 on page 302 and in [5].

3.2.5 Service Instances

In the same way as components are aggregated into service units, the Availability Management Framework supports the aggregation of component service instances into a logical entity called a service instance (SI). A service instance aggregates all component service instances to be assigned to the individual components of the service unit in order for the service unit to provide a particular service.
A service instance can contain multiple component service instances, but a particular component service instance can be configured in only one service instance. A service instance represents a single workload assigned to the entire service unit.
When a service unit is available to provide service (in-service readiness state, see Section 3.3.1.4), the Availability Management Framework can assign HA states to the service unit for one or more service instances. When a service unit becomes unavailable to provide service (out-of-service readiness state), the Availability Management Framework removes all service instances from the service unit. A service unit might be available to provide service but not have any assigned service instance.

The Availability Management Framework assigns a service instance to a service unit programmatically by assigning each individual component service instance of the service instance to a specific component within the service unit. The assignment of the component service instances of a service instance to the components of a service unit takes into account the type of component service instance supported by each component. A component service instance can be assigned to a given component only if the component configuration indicates that the component supports this particular type of component service instance, and the component configuration permits assignment of at least one more component service instance of this type.

When a service instance contains several component service instances of the same type, this specification does not dictate how, within the service unit, the Availability Management Framework assigns them to the components that support this particular type. This choice is implementation-defined.

The number of component service instances aggregated in a service instance may differ from the number of components aggregated in the service unit to which the service instance is assigned. In such cases, some components may be left without any component service instance assignment whereas other components may have several component service instances assigned to them.
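The two assignment conditions above (the component supports the component service type, and its configured limit for that type is not yet reached) can be sketched in C as follows. Since the specification leaves the choice among eligible components implementation-defined, this sketch arbitrarily uses a first-fit policy; that policy, like all type and function names below, is an assumption made for illustration and is not prescribed by the specification.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CS_TYPES 4

/* Hypothetical model types; not part of the AMF API. */
typedef struct {
    const char *cs_type;    /* supported component service type */
    unsigned    max_csis;   /* configured limit for this type */
    unsigned    assigned;   /* CSIs of this type currently assigned */
} cs_capacity_t;

typedef struct {
    const char   *name;
    cs_capacity_t caps[MAX_CS_TYPES];
    size_t        cap_count;
} su_component_t;

typedef struct {
    const char *name;
    const char *cs_type;
} su_csi_t;

static cs_capacity_t *find_cap(su_component_t *c, const char *cs_type)
{
    for (size_t i = 0; i < c->cap_count; i++)
        if (strcmp(c->caps[i].cs_type, cs_type) == 0)
            return &c->caps[i];
    return NULL;
}

/* First-fit (an arbitrary choice): assign the CSI to the first component
 * that supports its type and can take one more. Returns the component
 * index, or -1 if no component qualifies. */
int assign_csi(su_component_t *comps, size_t n, const su_csi_t *csi)
{
    for (size_t i = 0; i < n; i++) {
        cs_capacity_t *cap = find_cap(&comps[i], csi->cs_type);
        if (cap != NULL && cap->assigned < cap->max_csis) {
            cap->assigned++;
            return (int)i;
        }
    }
    return -1;
}

/* Assigns every CSI of a service instance; placement[i] receives the index
 * of the component chosen for csis[i]. If any CSI cannot be placed, the
 * assignments already made are undone and -1 is returned. */
int assign_si(su_component_t *comps, size_t n_comps,
              const su_csi_t *csis, size_t n_csis, int *placement)
{
    for (size_t i = 0; i < n_csis; i++) {
        placement[i] = assign_csi(comps, n_comps, &csis[i]);
        if (placement[i] < 0) {
            for (size_t j = 0; j < i; j++)
                find_cap(&comps[placement[j]], csis[j].cs_type)->assigned--;
            return -1;
        }
    }
    return 0;
}
```

With one component limited to a single CSI of a type and a second component that accepts more, this policy naturally produces the situation described above in which one component ends up with several component service instances.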
3.2.5.1 Service Type
The Availability Management Framework supports the notion of a service type. The service type defines a list of component service types of which a service instance may be composed. The service type also defines for each component service type the number of component service instances that a service instance of the given type may aggregate. All service instances of the same type share the attribute values defined in the service type configuration. Details on the configuration of a service type and of a service instance are provided in Section 8.11 on page 305 and in [5].
3.2.6 Service Groups

To ensure service availability in case of component failures, the Availability Management Framework manages redundant components that are contained in service units. A service group (SG) is a logical entity that groups one or more service units in order to provide service availability for a particular set of service instances. Any service unit of the service group must be able to take an assignment for any service instance of this set. Furthermore, to participate in a service group, all components in the service unit must support the capabilities required for the redundancy model defined for the service group. The redundancy model defines how the service units in the service group are used to provide service availability. For details about service group redundancy models, refer to Section 3.7.

Note: For readability purposes and if the context permits, this document uses expressions like "the components of a service group" to mean the components of service units participating in the service group.
The Availability Management Framework supports the notion of a service group type. The service group type is a generalization of similar service groups that follow the same redundancy model, provide similar availability, and are composed of units of the same service unit types. All service unit types defined in the service group type must be capable of supporting a common set of service types. All service groups of the same type share the attribute values defined in the service group type configuration. Some of the attribute values may be overridden in the service group configuration. Details on the configuration of a service group type and of a service group are provided in Section 8.9 on page 300 and in [5].

3.2.7 Application

An application is a logical entity that contains one or more service groups. An application combines the individual functionalities of the constituent service groups to provide a higher level service. This aggregation provides the Availability Management Framework with a further scope for fault isolation and fault recovery.

From a software administration point of view, this grouping into an application reflects the set of service units and their components which are delivered as a consistent set of software packages, which results in tighter dependency with respect to their upgrade.

An application can contain any number of service groups, but a given service group can be configured in only one application. Dependencies amongst service instances (described in Section 3.9.1 on page 155) are more common amongst service instances belonging to the same application than amongst service instances of different applications.
3.2.7.1 Application Type
The Availability Management Framework supports the notion of an application type. An application type defines a list of service group types, which implies that an application of the given type must be composed of service groups of types from that list. All applications of the same type share the attribute values defined in the application type configuration. Some of the attribute values may be overridden in the application configuration. Details on the configuration of an application type and of an application are provided in Section 8.8 on page 298 and in [5].

3.2.8 Protection Groups

A protection group for a specific component service instance is the group of components to which the component service instance has been assigned. The name of a protection group is the name of the component service instance that it protects. A protection group is a dynamic entity, which changes when component service instances are assigned to components or removed from components.

3.2.9 Service Unit Instantiation

When the Availability Management Framework instantiates a pre-instantiable service unit, it:

- runs the INSTANTIATE command (see Section 4.6) for SA-aware components (excluding contained components),
- invokes the saAmfContainedComponentInstantiateCallback() callback (see Section 7.10.4) of the associated container component for each contained component of the service unit,
- invokes the saAmfProxiedComponentInstantiateCallback() callback (see Section 7.10.2) of the proxies of all pre-instantiable proxied components of the service unit, and
- performs no action for non-pre-instantiable components. Such components are instantiated during the assignment of service instances to the service unit (see Section 3.3.2.4 on page 61).
When the Availability Management Framework instantiates a non-pre-instantiable service unit, it:
- invokes the saAmfCSISetCallback() callback (see Section 7.9.2) of the proxies of all proxied components of the service unit, and
- runs the INSTANTIATE command (see Section 4.6) for all non-proxied components.
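The per-component actions listed in this section for both kinds of service unit can be condensed into one decision function. The C sketch below is a simplified model: the enums and the function instantiation_action() are hypothetical names, and the function only reports which action applies; it does not call any AMF function or run any command.

```c
#include <stdbool.h>

/* Hypothetical classification used only for this illustration. */
typedef enum {
    KIND_SA_AWARE,                 /* SA-aware, excluding contained */
    KIND_CONTAINED,
    KIND_PROXIED,
    KIND_NON_PROXIED_NON_SA_AWARE
} comp_kind_t;

typedef enum {
    ACT_RUN_INSTANTIATE_COMMAND,   /* INSTANTIATE command */
    ACT_CONTAINED_INSTANTIATE_CB,  /* saAmfContainedComponentInstantiateCallback() */
    ACT_PROXIED_INSTANTIATE_CB,    /* saAmfProxiedComponentInstantiateCallback() */
    ACT_CSI_SET_CB,                /* saAmfCSISetCallback() of the proxy */
    ACT_NONE                       /* deferred to service instance assignment */
} inst_action_t;

/* Selects the action taken for one component when its service unit is
 * instantiated. */
inst_action_t instantiation_action(bool su_pre_instantiable,
                                   comp_kind_t kind,
                                   bool comp_pre_instantiable)
{
    if (su_pre_instantiable) {
        if (!comp_pre_instantiable)
            return ACT_NONE;
        switch (kind) {
        case KIND_SA_AWARE:  return ACT_RUN_INSTANTIATE_COMMAND;
        case KIND_CONTAINED: return ACT_CONTAINED_INSTANTIATE_CB;
        case KIND_PROXIED:   return ACT_PROXIED_INSTANTIATE_CB;
        default:             return ACT_NONE;
        }
    }
    /* Non-pre-instantiable service unit: proxied components are reached
     * through their proxy, all others through the INSTANTIATE command. */
    return kind == KIND_PROXIED ? ACT_CSI_SET_CB
                                : ACT_RUN_INSTANTIATE_COMMAND;
}
```

In both branches, a proxied component is only ever reached through a callback of its proxy, which is why the proxy must already be in place when the proxied component is instantiated.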
Note that this processing creates an implicit inter-service unit dependency, as the Availability Management Framework needs to instantiate the service units containing proxy components (and sometimes even assign them an active HA state for a service instance) before the instantiation of service units containing proxied components can be successfully completed.

3.2.10 Illustration of Logical Entities

The example in FIGURE 3 shows two service groups, SG1 and SG2. SG1 supports a single service instance (A) and SG2 supports two service instances (B and C).

On behalf of service instance A, service unit S1 is assigned the active HA state and service unit S2 the standby HA state. Each of the service units S1 and S2 contains two components. The component service instance A1 is assigned to the components C1 and C3, and the component service instance A2 is assigned to the components C2 and C4. Two protection groups A1 and A2 are created, with protection group A1 containing components C1 and C3 and protection group A2 containing components C2 and C4. Note that the name of the protection group is the same as the name of the component service instance. Thus, protection group A1 contains the components that support component service instance A1.

On behalf of service instance B, service unit S3 is assigned the active HA state and service unit S5 the standby HA state. Similarly, on behalf of service instance C, service unit S4 is assigned the active HA state and service unit S5 the standby HA state. Each of these service units contains a single component (C5, C6, C7). Thus, while components C5 and C6 are assigned the active HA state for only single component service instances (B1 and C1 respectively), component C7 is assigned the standby HA state for two component service instances (B1 and C1). Two protection groups (B1 and C1) are created, with protection group B1 containing components C5 and C7 and protection group C1 containing components C6 and C7.
FIGURE 3 Elements of the System Model

[Figure not reproduced. Labels recoverable from the original: Node U, Node W, Node X; Service Unit S1 (C1, C2), Service Unit S2 (C3, C4); Service Group SG2 with Service Unit S3 (C5), Service Unit S4 (C6), and Service Unit S5 (C7); protection groups PG A2, PG B1, and PG C1.]
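The protection groups of the example in Section 3.2.10 can be derived mechanically from the list of component service instance assignments. The following C sketch encodes the assignments described for FIGURE 3 and computes the members of a protection group by name. The types and functions (pg_assignment_t, protection_group(), groups_of_component()) are hypothetical and serve only this illustration. Note that in the example "C1" names both a component and a component service instance; the two are kept in separate fields here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one CSI assigned to one component. */
typedef struct {
    const char *csi;        /* component service instance name */
    const char *component;  /* component name */
    int         active;     /* 1: active HA state, 0: standby HA state */
} pg_assignment_t;

/* Assignments described for the example of FIGURE 3. */
const pg_assignment_t FIGURE3[] = {
    { "A1", "C1", 1 }, { "A1", "C3", 0 },
    { "A2", "C2", 1 }, { "A2", "C4", 0 },
    { "B1", "C5", 1 }, { "B1", "C7", 0 },
    { "C1", "C6", 1 }, { "C1", "C7", 0 },
};
const size_t FIGURE3_COUNT = sizeof FIGURE3 / sizeof FIGURE3[0];

/* Writes up to `cap` member names of the protection group named `csi` to
 * `members` and returns the total number of members. */
size_t protection_group(const pg_assignment_t *a, size_t n, const char *csi,
                        const char **members, size_t cap)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(a[i].csi, csi) == 0) {
            if (count < cap)
                members[count] = a[i].component;
            count++;
        }
    }
    return count;
}

/* Number of protection groups a given component currently belongs to. */
size_t groups_of_component(const pg_assignment_t *a, size_t n,
                           const char *component)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (strcmp(a[i].component, component) == 0)
            count++;
    return count;
}
```

Because the membership is computed from the current assignments, removing an entry from the list changes the result, which reflects the dynamic nature of a protection group.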
3.3.1 Service Unit States

In some cases when describing the properties and states of service units, references are made to properties and states of a node or cluster containing it. For readability reasons, it is not always mentioned that these references, obviously, only apply to local service units and are to be ignored for external service units.
3.3.1.1 Presence State
The presence state is supported at the service unit and component levels and reflects the component life cycle. It takes one of the following values:
- uninstantiated
- instantiating
- instantiated
- terminating
- restarting
- instantiation-failed
- termination-failed
First, the presence state of a non-pre-instantiable service unit is considered: Note that the presence state of a service unit is described in this section in terms of the presence state of its constituent components, which is explained in detail in Section 3.3.2.1. When all components are uninstantiated, the service unit is uninstantiated. When the first component moves to instantiating, the service unit also becomes instantiating. A non-pre-instantiable service unit is instantiated if it has successfully been assigned the active HA state on behalf of a service instance (see Section 3.3.1.5). Note that a non-pre-instantiable service unit may be assigned one and only one service instance. If, after all possible retries, a component cannot be instantiated, the presence state of the component is set to instantiation-failed, and the presence state of the service unit is also set to instantiation-failed. If some components are already instantiated when the service unit enters the instantiation-failed state, the Availability Management Framework terminates them. These components will enter either the uninstantiated state if they are successfully terminated or the termination-failed state if the Availability Management Framework was unable to terminate them correctly (refer also to Section 4.7 and Section 4.8).
When the first component of an already instantiated service unit becomes terminating, the service unit becomes terminating. If the Availability Management Framework fails to terminate a component, the presence state of the component is set to termination-failed and the presence state of the service unit is also set to termination-failed.
When all components enter the restarting state, the service unit becomes restarting. However, if only some components are restarting, the service unit is still instantiated.
The management of the presence state of a pre-instantiable service unit is very similar to what was previously described for a non-pre-instantiable service unit, except that a pre-instantiable service unit becomes instantiated or terminating based only on the presence state of its pre-instantiable components; when all pre-instantiable components within a pre-instantiable service unit are instantiated, the service unit becomes instantiated. If any errors occur when instantiating any of the constituent components of the service unit, the presence state of the service unit becomes instantiation-failed. Similarly, if errors occur when terminating any of the constituent components of the service unit, its presence state becomes termination-failed.
3.3.1.2 Administrative State
The administrative state of a service unit is an extension of the administrative state proposed by the ITU X.731 state management model ([7]). The administrative state of a service unit can be set by the system administrator. The administrative state of a service unit as well as the administrative states of the service group (see Section 3.3.5), the node (see Section 3.3.6.1), the application containing it (see Section 3.3.7), and the cluster (see Section 3.3.8) enable the Availability Management Framework to determine whether the service unit is administratively allowed to provide service. Valid values for the administrative state of a service unit are:
• unlocked: the service unit has not been directly prohibited from taking service instance assignments by the administrator.
• locked: the administrator has prevented the service unit from taking service instance assignments.
• locked-instantiation: the administrator has prevented the service unit from being instantiated by the Availability Management Framework; the service unit is then not instantiable.
• shutting-down: the administrator has prevented the service unit from taking new service instance assignments and requested that existing service instance assignments be gracefully removed. When all service instances assigned to the service unit have finally been removed, its administrative state becomes locked.
The administrative state of a service unit is one of the states that determine the readiness state (see Section 3.3.1.4) of that service unit. The administrative state of a service unit is persistent even when all nodes within the cluster are rebooted. The administrative state of a service unit is not directly exposed to components by the Availability Management Framework, but rather only indirectly, since the readiness state has an impact on component service instance assignments.
3.3.1.3 Operational State
The operational state of the service unit refers to the ITU X.731 state management model ([7]). It is used by the Availability Management Framework to determine whether a service unit is capable of taking service instance assignments. The operational state of the service unit indicates whether the components within the service unit are operable or not. Valid values for the operational state of a service unit are:
• enabled: the operational state of a service unit transitions from disabled to enabled when a successful repair action has been performed on the service unit (see Section 3.12.1.4).
• disabled: the operational state of a service unit transitions to disabled if a component of the service unit has transitioned to the disabled state and the Availability Management Framework has taken a recovery action at the level of the entire service unit.
It is the Availability Management Framework that determines the value for the operational state of a service unit. A service unit is enabled when the node containing this service unit joins the cluster for the first time. It is set to disabled when a fail-over recovery is executed within its scope, or if its presence state is set to instantiation-failed or termination-failed. After a successful repair, it is set again to enabled by the entity performing the repair (Availability Management Framework or other entity). An administrative operation is provided to clear the disabled state of a service unit, so that an entity other than the Availability Management Framework can perform the repair and declare the service unit repaired.
When a restart recovery is executed in the scope of a service unit, the restart is considered as an instantaneous combined recovery and repair action; therefore, the operational state of the service unit remains enabled in such cases.
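The transitions described above can be sketched as a small state function. This is only an illustration: the type and enumerator names below are hypothetical and are not identifiers of the Availability Management Framework API.

```c
/* Hypothetical names for illustration only; not part of the AMF API. */
typedef enum { OP_ENABLED, OP_DISABLED } OpState;

typedef enum {
    EV_FAILOVER_RECOVERY,     /* fail-over recovery executed within the service unit's scope */
    EV_INSTANTIATION_FAILED,  /* presence state set to instantiation-failed */
    EV_TERMINATION_FAILED,    /* presence state set to termination-failed */
    EV_RESTART_RECOVERY,      /* restart recovery executed in the service unit's scope */
    EV_REPAIR_SUCCEEDED       /* successful repair, or administrative clearing of disabled */
} SuEvent;

/* Returns the operational state of a service unit after the given event. */
OpState su_next_op_state(OpState current, SuEvent ev)
{
    switch (ev) {
    case EV_FAILOVER_RECOVERY:
    case EV_INSTANTIATION_FAILED:
    case EV_TERMINATION_FAILED:
        return OP_DISABLED;
    case EV_REPAIR_SUCCEEDED:
        return OP_ENABLED;
    case EV_RESTART_RECOVERY:
        /* Treated as an instantaneous recovery and repair: state is unchanged. */
        return current;
    }
    return current;
}
```

An enabled service unit therefore stays enabled across a restart recovery, and only a repair brings a disabled service unit back to enabled.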
3.3.1.4 Readiness State
The operational, administrative, and presence states of a service unit, the operational state of its containing node, and the administrative states of its containing node, service group, application, and the cluster are combined into another state, called the readiness state of a service unit. This state indicates if a service unit is eligible to take service instance assignments from an administrative and health status viewpoint. This state is the only state used by the Availability Management Framework to decide whether a service unit is eligible to receive service instance assignments.
The readiness state of a service unit is not directly exposed to components by the Availability Management Framework, but rather only indirectly, since the readiness state has an impact on component service instance assignments. Valid values for the readiness state of a service unit are:
• out-of-service
The readiness state of a non-pre-instantiable service unit is out-of-service if one or more of the following conditions are met:
  • its operational state or the operational state of its containing node is disabled;
  • its administrative state or the administrative state of its containing service group, AMF node, application, or the cluster is either locked or locked-instantiation;
  • the CLM node to which the containing AMF node is mapped is not a member.
The readiness state of a pre-instantiable service unit is out-of-service if one or more of the following conditions are met:
  • any of the preceding conditions that cause a non-pre-instantiable service unit to become out-of-service is true,
  • or its presence state is neither instantiated nor restarting,
  • or the service unit contains contained components, and their configured container CSI is not assigned active or quiescing to any container component on the node that contains the service unit.
When the readiness state of a service unit is out-of-service, no new service instance can be assigned to it. If service instances are already assigned to the service unit at the time when the service unit enters the out-of-service state, they are transferred to other service units (if possible) and removed.
• in-service
The readiness state of a non-pre-instantiable service unit is in-service if all of the following conditions are met:
  • its operational state and the operational state of its containing node are enabled;
  • its administrative state and the administrative states of its containing service group, AMF node, application, and the cluster are unlocked;
  • the CLM node to which the containing AMF node is mapped is a member node.
The readiness state of a pre-instantiable service unit is in-service if all of the following conditions are met:
  • all of the preceding conditions that cause a non-pre-instantiable service unit to become in-service are true,
  • and its presence state is either instantiated or restarting,
  • and the configured container CSI of all contained components of the service unit is assigned active to at least one container component on the node that contains the service unit.
When a service unit is in the in-service readiness state, it is eligible for service instance assignments; however, it is possible that it has not yet been assigned any service instance.
• stopping
The readiness state of a service unit is stopping if all of the following conditions are met:
  • its operational state and the operational state of its containing node are enabled,
  • none of the administrative states of itself, the containing service group, AMF node, application, CLM node, or the cluster is locked or locked-instantiation,
  • at least one of the administrative states of itself, the containing service group, AMF node, application, CLM node, or the cluster is shutting-down, or the container component which is handling the life cycle of contained components of the service unit has the quiescing HA state for the container CSI of the contained components, and
  • the CLM node to which the containing AMF node is mapped is a member node.
When a service unit is in the stopping state, no service instance can be assigned to it, but already assigned service instances are not removed until the service unit's components indicate to do so.
Table 4 shows how a pre-instantiable service unit's readiness state is derived from its operational state, its presence state, its own administrative state, and the administrative states of its enclosing AMF node, service group, application, and AMF cluster. The same table applies to non-pre-instantiable service units by ignoring the Service Unit's Presence State column and by assuming that the containing CLM node is in the cluster membership for the first two rows; the third row applies regardless of whether the CLM node is or is not in the cluster membership.
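[Table 4: derivation of a service unit's readiness state. Surviving cells: administrative states unlocked and operational states enabled: in-service; one or more columns contain the shutting-down state, and none is locked or locked-instantiation, with operational states enabled: stopping; otherwise: out-of-service.]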
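The readiness rules listed above can be sketched as a single function. This is a simplified illustration under stated assumptions: all type, field, and function names are hypothetical, the conditions on container CSIs of contained components are not modelled, and the administrative state of the CLM node is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration only; not part of the AMF API. */
typedef enum { ADM_UNLOCKED, ADM_LOCKED, ADM_LOCKED_INSTANTIATION, ADM_SHUTTING_DOWN } AdminState;
typedef enum { READY_OUT_OF_SERVICE, READY_IN_SERVICE, READY_STOPPING } Readiness;
typedef enum {
    PRES_UNINSTANTIATED, PRES_INSTANTIATING, PRES_INSTANTIATED, PRES_TERMINATING,
    PRES_RESTARTING, PRES_INSTANTIATION_FAILED, PRES_TERMINATION_FAILED
} Presence;

typedef struct {
    AdminState admin[5];   /* service unit, service group, AMF node, application, cluster */
    bool su_enabled;       /* operational state of the service unit */
    bool node_enabled;     /* operational state of the containing node */
    bool clm_member;       /* CLM node of the containing AMF node is a member */
    bool pre_instantiable; /* presence state is consulted only when true */
    Presence presence;
} SuInputs;

Readiness su_readiness(const SuInputs *in)
{
    bool any_shutting_down = false;

    if (!in->su_enabled || !in->node_enabled || !in->clm_member)
        return READY_OUT_OF_SERVICE;

    for (size_t i = 0; i < 5; i++) {
        if (in->admin[i] == ADM_LOCKED || in->admin[i] == ADM_LOCKED_INSTANTIATION)
            return READY_OUT_OF_SERVICE;
        if (in->admin[i] == ADM_SHUTTING_DOWN)
            any_shutting_down = true;
    }

    /* A pre-instantiable service unit must also be instantiated or restarting. */
    if (in->pre_instantiable &&
        in->presence != PRES_INSTANTIATED && in->presence != PRES_RESTARTING)
        return READY_OUT_OF_SERVICE;

    return any_shutting_down ? READY_STOPPING : READY_IN_SERVICE;
}
```

The out-of-service conditions are checked first because any one of them is sufficient; stopping and in-service are only distinguished once none of them holds.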
3.3.1.5 Service Unit's HA State per Service Instance
When a service instance is assigned to a service unit, the Availability Management Framework assigns an HA state to the service unit for that service instance. The HA state takes one of the following values:
• active: the service unit is currently responsible for providing the service characterized by this service instance.
• standby: the service unit acts as a standby for the service characterized by this service instance.
• quiescing: the service unit that previously had an active HA state for this service instance is in the process of quiescing its activity related to this service instance. In accordance with the semantics of the shutdown administrative operations, the quiescing is performed by rejecting new users of the service characterized by this service instance while still providing the service to existing users until they all terminate using it. When no user is left for that service, the components of the service unit indicate that fact to the Availability Management Framework, which transitions the HA state to quiesced. The quiescing HA state is assigned as a consequence of a shutdown administrative operation.
• quiesced: the service unit that previously had an active or quiescing HA state for this service instance has now quiesced its activity related to this service instance, and the Availability Management Framework can safely assign the active HA state for this service instance to another service unit. The quiesced state is assigned in the context of switch-over situations (for a description of switch-over, refer to Section 3.4).
At any point of time, a service unit may have multiple service instance assignments.
Note: The service units do not have an HA state of their own. They are assigned HA states on behalf of service instances.
Note: In the remainder of the document, the usage of the terminology active or standby service units, without mentioning for which service instance(s) the service unit has been assigned a particular HA state, will be deemed legal when the context makes it obvious. This terminology is mostly applicable in scenarios in which all service instances assigned to a particular service unit share the same HA state and the service unit is incapable of sustaining a mix of HA states for the assigned service instances.
For simplicity of expression, the term active assignment of/for a service instance (or simply active assignment if the context makes it clear which service instance is meant) is used to mean the assignment of the active HA state to a service unit for this service instance. Similar terms are also used for the other HA states, such as standby assignment.
Taking into consideration the configuration of each service group (list of service instances, list of service units, redundancy model attributes, and so on) and the current value of the administrative and operational states of their service units and service instances, the Availability Management Framework dynamically assigns the HA state to the service units for the various service instances. Section 3.7 describes how these assignments are performed for the various redundancy models.
Though some aspects differ from one redundancy model to another, some rules apply to all redundancy models:
• The overall goal of the Availability Management Framework is to keep as many active assignments as requested by the configuration for all service instances (which are administratively unlocked).
• If a service unit that is active for a service instance goes out-of-service, the Availability Management Framework automatically assigns the active HA state to a service unit that is already standby for the service instance if there is one.
• In the absence of administrative operations or error recovery actions being performed, only active and (possibly) standby HA states are assigned to the service units for particular service instances.
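The second rule above (promoting an existing standby when the active service unit goes out-of-service) can be sketched as follows. The sketch considers a single service instance, picks the first eligible standby, and uses hypothetical names throughout; how a real implementation chooses among several standbys depends on the redundancy model and is not shown.

```c
#include <stddef.h>

/* Hypothetical names for illustration only; not part of the AMF API. */
typedef enum { HA_NONE, HA_ACTIVE, HA_STANDBY } HaState;  /* HA_NONE: not assigned */

typedef struct {
    int in_service;  /* nonzero if the readiness state is in-service */
    HaState ha;      /* HA state for the one service instance considered here */
} SuAssignment;

/*
 * Handles service unit `failed` going out-of-service: its assignment is
 * removed and, if it was active, the first in-service standby is made active.
 * Returns the index of the promoted service unit, or -1 if none was promoted.
 */
int fail_over(SuAssignment *sus, size_t n, size_t failed)
{
    int was_active = (sus[failed].ha == HA_ACTIVE);

    sus[failed].in_service = 0;
    sus[failed].ha = HA_NONE;
    if (!was_active)
        return -1;

    for (size_t i = 0; i < n; i++) {
        if (sus[i].in_service && sus[i].ha == HA_STANDBY) {
            sus[i].ha = HA_ACTIVE;
            return (int)i;
        }
    }
    return -1;  /* no standby available for this service instance */
}
```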
3.3.2 Component States
The overall state of a component is a combination of a number of underlying states. A description of these underlying states is given in the next sections.
Note: No restriction exists in the applicability of various states of a component and their values described in the following subsections to proxied components. However, if the status of a proxied component changes to unproxied (typically, when its proxy component fails, and no proxy can be engaged to proxy the proxied component), the values for various states of this proxied component reflect the last known value of the corresponding states before its status became unproxied.
3.3.2.1 Presence State
The presence state of a component reflects the component life cycle. It takes one of the following values:
• uninstantiated
• instantiating
• instantiated
• terminating
• restarting
• instantiation-failed
• termination-failed
The presence state of a component is set to instantiating when the Availability Management Framework invokes
• the saAmfProxiedComponentInstantiateCallback() function (see Section 7.10.2),
• or the saAmfContainedComponentInstantiateCallback() function (see Section 7.10.4),
• or the saAmfCSISetCallback() function (see Section 7.9.2),
• or when it executes the INSTANTIATE CLC-CLI command (see Section 4.6),
as applicable according to Table 34 on page 383, to instantiate the component.
The presence state of a component is set to instantiated when the INSTANTIATE CLC-CLI command returns successfully (only for non-proxied, non-SA-aware components) or the component is registered successfully with the Availability Management Framework (for SA-aware or proxied components).
If, after all possible retries, a component cannot be instantiated, the presence state of the component is set to instantiation-failed. If some components are already instantiated or instantiating when the service unit enters the instantiation-failed state, the Availability Management Framework terminates them. These components will enter either the uninstantiated state if they are successfully terminated or the termination-failed state if the Availability Management Framework was unable to terminate them correctly.
The following actions set the presence state of a component to terminating:
• The Availability Management Framework invokes
  • the SaAmfComponentTerminateCallbackT function (see Section 7.10.1),
  • or the saAmfCSIRemoveCallback() function (see Section 7.9.3),
  • or it executes the TERMINATE CLC-CLI command (see Section 4.7),
• The Availability Management Framework abruptly terminates the component by using one of the following interfaces, as applicable according to Table 34 on page 383:
  • by executing the CLEANUP CLC-CLI command (see Section 4.8),
If an instantiated component fails, the Availability Management Framework will make an attempt to restart the component, provided that restart is allowed for the component. A component is restarted by the Availability Management Framework in the context of error recovery and repair actions (for details, see Section 3.12) or in the context of a restart administrative operation (for details, see Section 9.4.7). Restarting a component means first terminating it and then instantiating it again (see Section 3.12.1.2).
One of two different actions shall be undertaken by the Availability Management Framework regarding the component service instances assigned to a component when the component restart is needed:
• Keep the component service instances assigned to the component while the component is restarted. This action is typically performed when it is faster to restart the component than to reassign the component service instances to another component. In this case, the presence state of the component is set to restarting while the component is being terminated and until it is instantiated again (or a failure occurs). Internally, in this particular scenario, the Availability Management Framework withdraws and reassigns exactly the same HA state on behalf of all component service instances to the component as was assigned to the component for various component service instances before the restart procedure, without evaluating the various criteria that the Availability Management Framework would normally assess before making such an assignment.
• Reassign the component service instances currently assigned to the component to another component before terminating/instantiating the component. In this case, the presence state of the component is not set to restarting but transitions through the other presence state values (typically in the absence of failures: terminating, uninstantiated, instantiating, and then instantiated) as the component is terminated and instantiated again.
The choice between these two policies is based on the saAmfCompDisableRestart configuration attribute of each component (see the SaAmfComp object class in Section 8.13.2).
When a node leaves the cluster, the Availability Management Framework sets the presence state of all components included on that node to uninstantiated, except for components that are in the instantiation-failed or termination-failed state.
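The two restart policies described above can be illustrated by the presence states a component passes through in the absence of failures. The sketch assumes, based only on the attribute's name, that a true value of saAmfCompDisableRestart selects the policy that reassigns the component service instances first; the function and type names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration only; not part of the AMF API. */
typedef enum {
    P_UNINSTANTIATED, P_INSTANTIATING, P_INSTANTIATED, P_TERMINATING, P_RESTARTING
} Presence;

/*
 * Writes the failure-free sequence of presence states for a component
 * restart into `out` and returns the number of entries written.
 * Assumption: disable_restart mirrors saAmfCompDisableRestart.
 */
size_t restart_presence_sequence(bool disable_restart, Presence out[4])
{
    size_t n = 0;

    if (!disable_restart) {
        /* Assignments are kept: the component is simply restarting. */
        out[n++] = P_RESTARTING;
        out[n++] = P_INSTANTIATED;
    } else {
        /* Assignments are moved away first: full terminate/instantiate cycle. */
        out[n++] = P_TERMINATING;
        out[n++] = P_UNINSTANTIATED;
        out[n++] = P_INSTANTIATING;
        out[n++] = P_INSTANTIATED;
    }
    return n;
}
```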
Table 5 shows the possible presence states of the components of a service unit for each valid presence state of the service unit:
Table 5 Presence State of Components of a Service Unit
Service Unit            Included Components
uninstantiated          uninstantiated
instantiating           uninstantiated, instantiating, instantiated, restarting
instantiated            instantiated, restarting
terminating             terminating, instantiated, restarting, uninstantiated
restarting              restarting
instantiation-failed    instantiation-failed, uninstantiated, instantiated, terminating, termination-failed
termination-failed      instantiated, terminating, termination-failed, uninstantiated
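The combinations of Table 5 can be encoded as one bitmask per service unit presence state, which makes it easy to check whether a given pairing is allowed. The names are hypothetical, and the row contents are an assumption read from the table entries as they survive in this text.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; not part of the AMF API. */
typedef enum {
    PS_UNINSTANTIATED, PS_INSTANTIATING, PS_INSTANTIATED, PS_TERMINATING,
    PS_RESTARTING, PS_INSTANTIATION_FAILED, PS_TERMINATION_FAILED, PS_COUNT
} Presence;

#define BIT(s) (1u << (s))

/* allowed[su] is the set of component presence states listed for `su`. */
static const unsigned allowed[PS_COUNT] = {
    [PS_UNINSTANTIATED]       = BIT(PS_UNINSTANTIATED),
    [PS_INSTANTIATING]        = BIT(PS_UNINSTANTIATED) | BIT(PS_INSTANTIATING) |
                                BIT(PS_INSTANTIATED) | BIT(PS_RESTARTING),
    [PS_INSTANTIATED]         = BIT(PS_INSTANTIATED) | BIT(PS_RESTARTING),
    [PS_TERMINATING]          = BIT(PS_TERMINATING) | BIT(PS_INSTANTIATED) |
                                BIT(PS_RESTARTING) | BIT(PS_UNINSTANTIATED),
    [PS_RESTARTING]           = BIT(PS_RESTARTING),
    [PS_INSTANTIATION_FAILED] = BIT(PS_INSTANTIATION_FAILED) | BIT(PS_UNINSTANTIATED) |
                                BIT(PS_INSTANTIATED) | BIT(PS_TERMINATING) |
                                BIT(PS_TERMINATION_FAILED),
    [PS_TERMINATION_FAILED]   = BIT(PS_INSTANTIATED) | BIT(PS_TERMINATING) |
                                BIT(PS_TERMINATION_FAILED) | BIT(PS_UNINSTANTIATED),
};

/* True if a component in state `comp` may belong to a service unit in state `su`. */
bool combination_allowed(Presence su, Presence comp)
{
    return (allowed[su] & BIT(comp)) != 0;
}
```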
3.3.2.2 Operational State
The operational state of a component refers to the ITU X.731 state management model (see [7]). It is used by the Availability Management Framework to determine whether a component is capable of taking component service instance assignments. The operational state indicates whether the component is operable or not. Valid values for the operational state of a component are:
• enabled: the Availability Management Framework is not aware of any error for this component, or a restart recovery action is in progress to recover from this error.
• disabled: the Availability Management Framework is aware of at least one error for this component that could not be recovered from by restarting the component or its service unit.
The described approach for operational state definition was chosen to reflect properly the capability of a component to be restarted within the time limits critical for the service it provides, regardless of the reason for the restart.
The Availability Management Framework becomes aware of an error for a component in the following circumstances:
• An error for the component is reported to the Availability Management Framework when the API function saAmfComponentErrorReport() is invoked. Such an error can be reported by the component itself, by another component, or by a monitoring facility (see saAmfPmStart_3()).
• The component fails to respond to the Availability Management Framework's healthcheck request or responds with an error.
• The component fails to initiate a component-invoked healthcheck in a timely manner.
• A command used by the Availability Management Framework to control the component life cycle returned an error or did not return in time.
• The component fails to respond in time to an Availability Management Framework callback.
• The component responds to an Availability Management Framework state change callback (SaAmfCSISetCallbackT) with an error.
• If the component is SA-aware, and it does not register with the Availability Management Framework within the preconfigured time-period after its instantiation (see Section 4.6).
• If the component is SA-aware, and it unexpectedly unregisters with the Availability Management Framework (see Section 7.1.1).
• The component terminates unexpectedly.
• When a fail-over recovery operation performed at the level of the service unit or the node containing the service unit triggers an abrupt termination of the component. For more details about recovery operations, refer to Section 3.12.1.3.
A component is enabled when the node containing it joins the cluster for the first time. It is set to disabled when the Availability Management Framework performs a fail-over recovery action on the component as a consequence of the component becoming faulty, or if its presence state is set to instantiation-failed or termination-failed. It is again enabled after a successful repair. When a restart recovery action is performed on a component, it is considered as an instantaneous combined recovery and repair action; therefore, the operational state of the component remains enabled in that case.
It is the Availability Management Framework that determines the value for the operational state. The operational state of a component is not directly exposed to components by the Availability Management Framework API.
3.3.2.3 Readiness State
The operational state of a component is combined with the readiness state of its service unit to obtain the readiness state of the component. This state indicates whether a component is available to take component service instance assignments, and it is the only state used by the Availability Management Framework to decide whether a component is eligible to receive component service instance assignments. The readiness state of a component is defined as follows:
• out-of-service: the readiness state of a component is out-of-service if its operational state is disabled, or the readiness state of the service unit containing it is out-of-service. When the readiness state of a component is out-of-service, no component service instance can be assigned to it.
• in-service: the readiness state of a component is in-service if its operational state is enabled, and the readiness state of the service unit containing it is in-service. When a component is in the in-service readiness state, it is eligible for component service instance assignments; however, it is possible that it has not yet been assigned any component service instance.
• stopping: the readiness state of a component is stopping if its operational state is enabled, and the readiness state of the service unit containing it is stopping. When the readiness state of a component is stopping, no component service instance can be assigned to it. The standby component service instance assignments are removed immediately, but active component service instances are not removed before the component indicates to the Availability Management Framework to do so.
The following table summarizes how the readiness state of a component is derived from the component's operational state and the enclosing service unit's readiness state.
Table 6 Component's Readiness State
Service Unit's Readiness State    Component's Operational State    Component's Readiness State
in-service                        enabled                          in-service
stopping                          enabled                          stopping
out-of-service                    enabled                          out-of-service
in-service                        disabled                         out-of-service
stopping                          disabled                         out-of-service
out-of-service                    disabled                         out-of-service
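The derivation in Table 6 reduces to a two-line rule, sketched below with hypothetical type and function names: a disabled component is always out-of-service, and an enabled component inherits the readiness state of its service unit.

```c
/* Hypothetical names for illustration only; not part of the AMF API. */
typedef enum { R_OUT_OF_SERVICE, R_IN_SERVICE, R_STOPPING } Readiness;
typedef enum { OP_ENABLED, OP_DISABLED } OpState;

/* Readiness state of a component, per the six rows of Table 6. */
Readiness component_readiness(OpState comp_op, Readiness su_readiness)
{
    if (comp_op == OP_DISABLED)
        return R_OUT_OF_SERVICE;
    return su_readiness;
}
```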
3.3.2.4 Component's HA State per Component Service Instance
For each component service instance assigned to a component within a service unit, the Availability Management Framework assigns an HA state to the component on behalf of the component service instance.
When the Availability Management Framework assigns an HA state to a service unit for a particular service instance, the action is actually translated into a set of sub-actions on the components contained in the service unit. These sub-actions consist of assigning an HA state to these components for the individual component service instances contained in the service instance.
The HA state of a component for a particular component service instance takes one of the following values (identical to the HA state of a service unit for a particular service instance):
• active: the component is currently responsible for providing the service characterized by this component service instance.
• standby: the component acts as a standby for the service characterized by this component service instance.
• quiescing: the component that previously had an active HA state for this component service instance is in the process of quiescing its activity related to this component service instance. In accordance with the semantics of the shutdown administrative operations, this quiescing is performed by rejecting new users of the service characterized by this component service instance while still providing the service to existing users until they all terminate using it. When no user is left for that service, the component indicates that fact to the Availability Management Framework, which transitions the HA state to quiesced. The quiescing HA state is assigned as a consequence of a shutdown administrative operation.
• quiesced: the component that previously had the active or quiescing HA state for this component service instance has now quiesced its activity related to this component service instance, and the Availability Management Framework can safely assign the active HA state for this component service instance to another component. The quiesced state is assigned in the context of switch-over situations (for a description of switch-over, refer to Section 3.4).
As the sub-actions involved in changing the HA state of individual components of the service unit will not complete at the same time, the HA state of a service unit for a service instance and the HA state of individual components for the component service instances contained in that service instance may differ. The following table describes the possible combinations.
Note that the occurrence of the states active, standby, quiescing, and quiesced, in this order, in a row at the component or component service instance level (second column), determines the state in the same row at the service unit or service instance level (first column). So, if the state active appears in a row at the component or component service instance level, the state in the same row at the service unit or service instance level is active. If a row at the component or component service instance level shows no active but rather a standby state, the state of the same row at the service unit or service instance level is standby. The same applies similarly for the quiescing and quiesced states.
Table 7 HA State of Component/Component Service Instance
HA State of Service Unit/Service Instance    HA State of Component/Component Service Instance
active                                       active, quiescing, quiesced, (not assigned)
active                                       active, standby, (not assigned)
quiescing                                    quiescing, quiesced, (not assigned)
quiesced                                     quiesced, (not assigned)
standby                                      standby, quiesced, (not assigned)
(not assigned)                               (not assigned)
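The precedence rule stated before Table 7 (active, then standby, then quiescing, then quiesced) can be sketched by ordering the enumerators so that the highest-ranked state found among the components is the state at the service unit level. The names are hypothetical, and the sketch does not check that the combination of component states is one of the rows the table permits.

```c
#include <stddef.h>

/* Hypothetical names; enumerators are ordered by increasing precedence. */
typedef enum {
    HA_NOT_ASSIGNED = 0,
    HA_QUIESCED,
    HA_QUIESCING,
    HA_STANDBY,
    HA_ACTIVE
} HaState;

/*
 * HA state of a service unit for a service instance, derived from the HA
 * states of its components for the contained component service instances.
 */
HaState su_ha_state(const HaState *comp, size_t n)
{
    HaState result = HA_NOT_ASSIGNED;

    for (size_t i = 0; i < n; i++) {
        if (comp[i] > result)
            result = comp[i];
    }
    return result;
}
```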
The first two rows of Table 7 identify the two possible but mutually exclusive combinations of HA state of components when the HA state of the service unit is active. The second row is specific for a transition of the HA state of the service unit from standby to active.
For simplicity of expression, the term active assignment of/for a component service instance (or simply active assignment if the context makes it clear which component service instance is meant) is used to mean the assignment of the active HA state to a component for this component service instance. Similar terms are also used for the other HA states, such as standby assignment.
When the Availability Management Framework assigns the active HA state to a component on behalf of a component service instance, the component must start to provide the service that is characterized by that component service instance.
When the Availability Management Framework assigns the standby HA state to a component on behalf of a component service instance, the component must prepare itself for a quick and smooth transition into the active HA state for that component service instance if requested by the Availability Management Framework. How the standby component prepares itself for this transition is very dependent on its implementation and may involve, for example, actions such as sharing access to checkpointed data with the active component.

In switch-over situations (see Section 3.4), when the Availability Management Framework assigns the quiesced HA state to a component on behalf of a component service instance, the component must, as quickly as possible, get the work related to that component service instance into such a state that the work can be transferred to another component with as little service disruption as possible. This may mean different things depending on the nature of the work and the implementation of the component. Typically, the component should not take in new work related to the component service instance. For example, if work related to the service instance is delivered in the form of messages sent to a specific message queue, the component should stop retrieving messages from that queue. Work that is related to that component service instance and is already in progress inside the component should be checkpointed, so that it can be completed later on by the component that will take over. If the component or the way it interacts with its clients does not support checkpointing of on-going work, the work needs to either be completed immediately or an indication returned to the client indicating that it should submit that work later. If the component maintains some state associated with the component service instance, that state needs to be made available to the component that will take over the activity. Depending on the implementation of the component, this may imply, for example, writing the state in persistent storage or in a checkpoint, or packing it in a message and sending it to a particular message queue.

As a consequence of a shutdown administrative operation (see Section 9.4.6 on page 329), when the Availability Management Framework assigns the quiescing HA state to a component on behalf of a component service instance, the component must reject attempts from new users to access the service characterized by the component service instance and only continue to service existing users. When all users have terminated using the service corresponding to that component service instance, the component must notify the Availability Management Framework of this termination by invoking the saAmfCSIQuiescingComplete() function. The invocation of the saAmfCSIQuiescingComplete() function implicitly transitions the HA state of the component from quiescing to quiesced for that component service instance.
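The quiescing behavior described above can be sketched as a small user-tracking model. This is a minimal illustration under stated assumptions, not the Availability Management Framework API: the types ha_state_t and csi_work_t and all csi_* helper functions are hypothetical, and the point at which a real component would invoke saAmfCSIQuiescingComplete() is only marked by a comment.

```c
#include <stdbool.h>

/* Hypothetical local model of the HA state of one component for one CSI. */
typedef enum { HA_ACTIVE, HA_STANDBY, HA_QUIESCING, HA_QUIESCED } ha_state_t;

/* Hypothetical bookkeeping for the work done on behalf of one CSI. */
typedef struct {
    ha_state_t ha_state;
    unsigned   users;      /* users currently being served */
} csi_work_t;

/* Finish the quiescing phase once no users remain. */
static void csi_check_quiescing_done(csi_work_t *w)
{
    if (w->ha_state == HA_QUIESCING && w->users == 0) {
        /* A real component would invoke saAmfCSIQuiescingComplete() here;
         * that invocation implicitly moves the HA state to quiesced. */
        w->ha_state = HA_QUIESCED;
    }
}

/* A new user is admitted only while the component is active for the CSI. */
bool csi_admit_user(csi_work_t *w)
{
    if (w->ha_state != HA_ACTIVE)
        return false;      /* for example, quiescing: reject new users */
    w->users++;
    return true;
}

/* The framework assigned the quiescing HA state for this CSI. */
void csi_set_quiescing(csi_work_t *w)
{
    w->ha_state = HA_QUIESCING;
    csi_check_quiescing_done(w);
}

/* An existing user has finished using the service. */
void csi_release_user(csi_work_t *w)
{
    if (w->users > 0)
        w->users--;
    csi_check_quiescing_done(w);
}
```

In this sketch, existing users keep being served after csi_set_quiescing(); only the release of the last user triggers the step that corresponds to reporting quiescing completion.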
The Availability Management Framework performs the following actions when it assigns the active HA state to a service unit for a particular service instance:
• It invokes the saAmfCSISetCallback() callback of all SA-aware components for the components themselves.
• It invokes the saAmfCSISetCallback() callback of their proxy components for all proxied components. If the proxied component is a non-pre-instantiable component and is not already instantiated, the proxy instantiates the proxied component as part of performing the component service instance assignment.
• It runs the INSTANTIATE command for non-proxied, non-SA-aware components.
The Availability Management Framework performs the following actions regarding components when it assigns to a service unit an HA state other than active for a particular service instance:
• It invokes the saAmfCSISetCallback() callback of all SA-aware components for the components themselves. For the special case of a container CSI of this particular service instance for which the HA state of the container component was active, the Availability Management Framework performs the following actions before it invokes the saAmfCSISetCallback() callback to set the new HA state of the container component for the container CSI:
  - for each associated contained component and for each of its component service instances that has the active HA state and needs to be quiesced, the Availability Management Framework sets the HA state of the associated contained component to quiescing, if the change of the HA state of the container CSI was caused by a shutdown administrative operation on a service unit or on any entity containing the service unit; otherwise, it is set to quiesced;
  - the Availability Management Framework waits for each associated contained component to quiesce for its component service instances (if the setting of the HA state to quiescing or quiesced was necessary), then it removes all component service instances assigned to the contained component and terminates it.
• It invokes the saAmfCSISetCallback() callback of their proxy components for all proxied components. The proxy component terminates its non-pre-instantiable proxied components as part of performing the component service instance assignment.
• It runs the TERMINATE command for non-proxied, non-SA-aware components.
The Availability Management Framework performs the following actions regarding components when it removes a service instance assignment from a service unit:
• It invokes the saAmfCSIRemoveCallback() callback of all SA-aware components for the components themselves. For the special case of a container CSI of this particular service instance for which the HA state of the container component was active, the Availability Management Framework performs the following actions before it invokes the saAmfCSIRemoveCallback() callback to remove the active HA state from the container component for the container CSI:
  - for each associated contained component and for each of its component service instances that has the active HA state and needs to be quiesced, the Availability Management Framework sets the HA state of the associated contained component to quiesced;
  - the Availability Management Framework waits for each associated contained component to quiesce for its component service instances (if the setting of the HA state to quiesced was necessary), then it removes all component service instances assigned to the contained component and terminates it.
• It invokes the saAmfCSIRemoveCallback() callback of their proxy components for all proxied components. The proxy component terminates its non-pre-instantiable proxied components as part of removing the component service instance assignment.
• It runs the TERMINATE command for non-proxied, non-SA-aware components.
The instantiation of proxied, non-pre-instantiable components is performed by the proxy as part of the assignment of component service instances to the proxied component. Similarly, the termination of proxied, non-pre-instantiable components is performed by the proxy as part of the removal of component service instances from the proxied component. Hence, the Availability Management Framework never invokes the saAmfProxiedComponentInstantiateCallback() and saAmfComponentTerminateCallback() callback functions of the proxy for proxied, non-pre-instantiable components.

During an individual component restart induced by a fault encountered by the component, the component remains enabled. Its readiness state can change according to changes in its presence state (as described in Section 3.3.2.1), and it is the readiness state that determines the Availability Management Framework's actions regarding the CSI assignments to the component.
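The per-component actions listed above for the three cases (assigning the active HA state, assigning another HA state, and removing an assignment) can be summarized as a lookup. The following is a minimal sketch, assuming hypothetical enumerations (comp_category_t, su_operation_t, amf_action_t) and a hypothetical function amf_action_for(); none of these names belong to the Availability Management Framework API, and the special handling of container CSIs is left out.

```c
/* Hypothetical component categories distinguished in the text. */
typedef enum {
    COMP_SA_AWARE,                  /* invoked through its own callbacks */
    COMP_PROXIED,                   /* reached through its proxy component */
    COMP_NON_PROXIED_NON_SA_AWARE   /* driven by life-cycle commands */
} comp_category_t;

/* Hypothetical names for the three service unit level operations. */
typedef enum {
    OP_ASSIGN_ACTIVE,       /* active HA state assigned for a service instance */
    OP_ASSIGN_NON_ACTIVE,   /* an HA state other than active assigned */
    OP_REMOVE               /* service instance assignment removed */
} su_operation_t;

/* Hypothetical labels for the action taken for one component. */
typedef enum {
    ACT_CSI_SET_CALLBACK,           /* saAmfCSISetCallback() of the component */
    ACT_PROXY_CSI_SET_CALLBACK,     /* saAmfCSISetCallback() of its proxy */
    ACT_CSI_REMOVE_CALLBACK,        /* saAmfCSIRemoveCallback() of the component */
    ACT_PROXY_CSI_REMOVE_CALLBACK,  /* saAmfCSIRemoveCallback() of its proxy */
    ACT_INSTANTIATE_COMMAND,        /* INSTANTIATE command */
    ACT_TERMINATE_COMMAND           /* TERMINATE command */
} amf_action_t;

/* Map a component category and an operation to the action named in the text. */
amf_action_t amf_action_for(comp_category_t cat, su_operation_t op)
{
    switch (cat) {
    case COMP_SA_AWARE:
        return op == OP_REMOVE ? ACT_CSI_REMOVE_CALLBACK
                               : ACT_CSI_SET_CALLBACK;
    case COMP_PROXIED:
        return op == OP_REMOVE ? ACT_PROXY_CSI_REMOVE_CALLBACK
                               : ACT_PROXY_CSI_SET_CALLBACK;
    default: /* COMP_NON_PROXIED_NON_SA_AWARE */
        return op == OP_ASSIGN_ACTIVE ? ACT_INSTANTIATE_COMMAND
                                      : ACT_TERMINATE_COMMAND;
    }
}
```

For proxied, non-pre-instantiable components, the instantiation or termination itself happens inside the proxy while it handles the callback, which is why the sketch has no separate instantiate or terminate action for proxied components.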
[Figure: HA state transition diagram showing the HA states active, quiescing, quiesced, and standby, with transitions labeled ADD and RMV.]

ADD Transitions: saAmfCSISetCallback(SA_AMF_CSI_ADD_ONE)
RMV Transitions: saAmfCSIRemoveCallback(), saAmfTerminateCallback(), cleanup operation (see Table 34 in Appendix A)
Other Transitions: saAmfCSISetCallback(SA_AMF_CSI_TARGET_*)
Table 8 shows combinations of the readiness state and the HA state for pre-instantiable components for a component service instance. Only the HA state is exposed to application developers.

Table 8 Application Developer View for Pre-Instantiable Components

| Component's Readiness State | Component's HA State for a Component Service Instance |
|---|---|
| in-service | active, standby, quiescing, quiesced |
| stopping | standby, quiescing, quiesced |
| out-of-service | [no HA state] |
Table 9 shows combinations of the readiness state and the HA state for non-pre-instantiable components for a component service instance. Only the HA state is exposed to application developers.

Table 9 Application Developer View for Non-Pre-Instantiable Components

| Component's Readiness State | Component's HA State for a Component Service Instance |
|---|---|
| in-service | active or [no HA state] |
| out-of-service | [no HA state] |

3.3.3 Service Instance States

3.3.3.1 Administrative State
The administrative state of a service instance is manipulated by the system administrator. Valid values for the administrative state of a service instance are:
• unlocked: HA states can be assigned to service units on behalf of the service instance.
• locked: no HA state can be assigned to service units on behalf of the service instance.
• shutting-down: the service instance is shutting down gracefully. This means that all assignments of all its component service instances are quiescing or quiesced assignments.
The administrative state of a service instance is not directly exposed to components by the Availability Management Framework API. The administrative state of a service instance is persistent even when all nodes within the cluster are rebooted.
Note: The administrative state value of locked-instantiation is not a valid state value for a service instance, as a service instance cannot be terminated and made non-instantiable as other logical entities may be.
The assignment state of a service instance indicates whether the service represented by this service instance is being provided or not by some service unit. Valid values for the assignment state of a service instance are:
• unassigned: a service instance is said to be unassigned if no service unit has the active or quiescing HA state for this service instance.
• fully-assigned: a service instance is said to be fully-assigned if and only if
  - the number of service units having the active or quiescing HA state for the service instance is equal to the preferred number of active assignments for the service instance, which is defined in the redundancy model of the corresponding service group (see Section 3.7), and
  - the number of service units having the standby HA state for the service instance is equal to the preferred number of standby assignments for the service instance, which is defined in the redundancy model of the corresponding service group (see Section 3.7).
• partially-assigned: a configured service instance that is neither unassigned nor fully-assigned is said to be partially-assigned.
The following table shows the preferred number of active and standby assignments for various redundancy models (additionally, refer to Section 3.7):

Table 10 Preferred Number of Active and Standby Assignments

| Redundancy Model | Preferred Number of Active Assignments | Preferred Number of Standby Assignments |
|---|---|---|
| 2N | 1 | 1 |
| N+M | 1 | 1 |
| N-Way | 1 | as configured in the service group |
| N-Way active | as configured in the service group | 0 |
| no-redundancy | 1 | 0 |

It is the Availability Management Framework that determines the value of the assignment state. The assignment state of a service instance is not directly exposed to components by the Availability Management Framework API.
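The three assignment state values defined above follow directly from four counts. The following is a minimal sketch of that classification; the type si_assignment_t and the function si_assignment_state() are hypothetical names chosen for illustration and are not part of the Availability Management Framework API.

```c
/* Hypothetical encoding of the assignment state of a service instance. */
typedef enum {
    SI_UNASSIGNED,
    SI_PARTIALLY_ASSIGNED,
    SI_FULLY_ASSIGNED
} si_assignment_t;

/*
 * active_or_quiescing: number of service units having the active or
 *                      quiescing HA state for the service instance
 * standby:             number of service units having the standby HA state
 * pref_active,
 * pref_standby:        preferred numbers taken from the redundancy model
 *                      of the service group (see Table 10)
 */
si_assignment_t si_assignment_state(unsigned active_or_quiescing,
                                    unsigned standby,
                                    unsigned pref_active,
                                    unsigned pref_standby)
{
    if (active_or_quiescing == 0)
        return SI_UNASSIGNED;
    if (active_or_quiescing == pref_active && standby == pref_standby)
        return SI_FULLY_ASSIGNED;
    return SI_PARTIALLY_ASSIGNED;
}
```

For a 2N service group (one preferred active and one preferred standby assignment), si_assignment_state(1, 0, 1, 1) yields SI_PARTIALLY_ASSIGNED, because the standby assignment is missing while the service is still being provided.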
When a service instance enters the unassigned state, an alarm will be issued. For other changes in the assignment state, appropriate notifications will be issued (see Chapter 11).

3.3.4 Component Service Instance States

The Availability Management Framework does not define any states for a component service instance; instead, states are defined for the service instance to which this component service instance pertains.

3.3.5 Service Group States

The only state defined by the Availability Management Framework for service groups is the administrative state. It can be manipulated by the system administrator and is an extension of the administrative state proposed by the ITU X.731 state management model ([7]). Valid values for the administrative state of a service group are:
• unlocked: the service group has not been directly prohibited from providing service by the administrator.
• locked: the service group has been administratively prohibited from providing service.
• locked-instantiation: the administrator has prevented all service units of the service group from being instantiated by the Availability Management Framework.
• shutting-down: the administrator has prevented all service units contained within the service group from taking new service instance assignments and requested that existing service instance assignments be gracefully removed. When all service instances assigned to all the service units within the service group have finally been removed, the administrative state of the service group transitions to locked, that is, the administrative state of the service group is locked after completion of the shutting down operation.
The Availability Management Framework uses the administrative state of the service group to determine the readiness state of the service units of the service group, as described in Section 3.3.1.4. The administrative state of a service group is persistent even when all nodes within the cluster are rebooted. The administrative state of a service group is not directly exposed to components by the Availability Management Framework, but rather only indirectly, since the readiness state of the service unit has an impact on component service instance assignments.
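The shutting-down value described above is transient: it ends in locked once the last service instance assignment has been removed. The following is a minimal sketch of that progression for a service group. The names admin_state_t, sg_model_t, sg_shutdown(), and sg_si_removed() are hypothetical, and two points are assumptions of this sketch rather than statements of the specification: the shutdown is only accepted in the unlocked state, and a service group without any assignment becomes locked immediately.

```c
/* Hypothetical encoding of the administrative state values. */
typedef enum {
    ADMIN_UNLOCKED,
    ADMIN_LOCKED,
    ADMIN_LOCKED_INSTANTIATION,
    ADMIN_SHUTTING_DOWN
} admin_state_t;

/* Hypothetical model of a service group. */
typedef struct {
    admin_state_t admin;
    unsigned      assigned_sis;  /* service instances still assigned to its
                                    service units */
} sg_model_t;

/* shutting-down ends in locked once no assignment is left. */
static void sg_complete_shutdown_if_idle(sg_model_t *sg)
{
    if (sg->admin == ADMIN_SHUTTING_DOWN && sg->assigned_sis == 0)
        sg->admin = ADMIN_LOCKED;
}

/* Shutdown administrative operation (assumed valid only when unlocked). */
void sg_shutdown(sg_model_t *sg)
{
    if (sg->admin != ADMIN_UNLOCKED)
        return;
    sg->admin = ADMIN_SHUTTING_DOWN;
    sg_complete_shutdown_if_idle(sg);
}

/* One service instance assignment has been gracefully removed. */
void sg_si_removed(sg_model_t *sg)
{
    if (sg->assigned_sis > 0)
        sg->assigned_sis--;
    sg_complete_shutdown_if_idle(sg);
}
```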
Note: Though a service group has no associated HA state, this specification uses the term "assign a service instance to a service group", meaning that the service instance is assigned to one or more service units of the service group.
The administrative state of a node is an extension of the administrative state proposed by the ITU X.731 state management model ([7]). The administrative state of a node can be set by the system administrator. Valid values for the administrative state of a node are:
• unlocked: the node has not been directly prohibited from providing service by the administrator.
• locked: the node has been administratively prohibited from providing service.
• locked-instantiation: the administrator has prevented all service units of the node from being instantiated by the Availability Management Framework. Thus, all service units within the node are not instantiable.
• shutting-down: the administrator has prevented all service units contained within the node from taking new service instance assignments and requested that existing service instance assignments be gracefully removed. When all service instances assigned to all the service units within the node have finally been removed, the administrative state of the node transitions to locked, that is, the administrative state of the node is locked after completion of the shutting down operation.
The Availability Management Framework uses the administrative state of the node to determine the readiness state of the service units of the node, as described in Section 3.3.1.4. The administrative state of a node is persistent even when all nodes within the cluster are rebooted. The administrative state of a node is not directly exposed to components by the Availability Management Framework, but rather only indirectly, since the readiness state of the service unit has an impact on component service instance assignments.
The operational state of the node refers to the ITU X.731 state management model ([7]). It is used by the Availability Management Framework to determine whether a service unit within the node is capable of taking service instance assignments. The operational state of the node indicates whether the service units within the node are operable or not. Valid values for the operational state of a node are:
• enabled: the operational state transitions from disabled to enabled when a successful repair action has been performed on the node (see Section 3.12.1.4).
• disabled: the operational state of a node transitions to disabled if a component of the node has transitioned to the disabled state and the Availability Management Framework has taken a recovery action at the level of the entire node (node switch-over, fail-over, or failfast).
The operational state of a node is enabled when the node joins the cluster for the first time. It is set to disabled when the Availability Management Framework performs a node-level recovery action. After a successful repair, the operational state of the node is set again to enabled by the entity performing the repair (Availability Management Framework or other entity). An administrative operation is provided to clear the disabled state of a node, so that an entity different from the Availability Management Framework may perform the repair and declare the node repaired.

The Availability Management Framework uses the operational state of the node to determine the readiness state of the service units of the node, as described in Section 3.3.1.4. The operational state of a node is not directly exposed to components by the Availability Management Framework, but rather only indirectly, since the readiness state of the service unit has an impact on component service instance assignments.

The operational state of a node is valid even after the node has left the membership, since it is used to indicate whether the node was healthy or had a failure when leaving. The following explains the state transitions in detail:
• If a node is enabled and in the locked-instantiation administrative state when it leaves the cluster membership, the node stays enabled until it joins the cluster again.
• If a node is enabled and not in the locked-instantiation administrative state when it leaves the cluster membership, the node becomes disabled while it is out of the cluster and becomes enabled again when it rejoins the cluster.
• If a disabled node with the automatic repair attribute (see Section 3.12.1.4) turned on unexpectedly leaves the cluster membership, the Availability Management Framework should assess the state of the node when the node rejoins the cluster membership to ascertain if it needs to proceed with the planned repair action that was potentially interrupted when the node unexpectedly left the cluster membership.
• If a disabled node with the automatic repair attribute turned off leaves the cluster membership, the operational state of the node (and of its contained entities) is not modified when the node joins the cluster again. Note that the operational state of the node may have been reenabled by an SA_AMF_ADMIN_REPAIRED administrative operation before the node rejoined the cluster, in which case the node becomes enabled upon rejoining the cluster.

3.3.7 Application States

The only state defined by the Availability Management Framework for an application is the administrative state. It can be manipulated by the system administrator and is an extension of the administrative state proposed by the ITU X.731 state management model ([7]). Valid values for the administrative state of an application are:
• unlocked: the application has not been directly prohibited from providing service by the administrator.
• locked: the application has been administratively prohibited from providing service.
• locked-instantiation: the administrator has prevented all service units of the application from being instantiated by the Availability Management Framework.
• shutting-down: the administrator has prevented all service units contained within the application from taking new service instance assignments and requested that existing service instance assignments be gracefully removed. When all service instances assigned to all the service units within the application have finally been removed, the administrative state of the application transitions to locked, that is, the administrative state of the application is locked after completion of the shutting down operation.
The Availability Management Framework uses the administrative state of the application to determine the readiness state of the service units of the application, as described in Section 3.3.1.4. The administrative state of an application is persistent, even when all nodes within the cluster are rebooted. The administrative state of an application is not directly exposed to components by the Availability Management Framework, but rather only indirectly, since the readiness state of the service unit has an impact on component service instance assignments.

3.3.8 Cluster States

The only state defined by the Availability Management Framework for a cluster is the administrative state. It can be manipulated by the system administrator and is an extension of the administrative state proposed by the ITU X.731 state management model ([7]). Valid values for the administrative state of a cluster are:
• unlocked: the cluster has been administratively allowed to provide service.
• locked: the cluster has been administratively prohibited from providing service.
• locked-instantiation: the administrator has prevented all service units of the cluster from being instantiated by the Availability Management Framework. Thus, all service units within the cluster are not instantiable.
• shutting-down: the administrator has prevented all service units contained within the cluster from taking new service instance assignments and requested that existing service instance assignments be gracefully removed. When all service instances assigned to all the service units within the cluster have finally been removed, the administrative state of the cluster transitions to locked, that is, the administrative state of the cluster is locked after completion of the shutting down operation.
The Availability Management Framework uses the administrative state of the cluster to determine the readiness state of the service units of the cluster, as described in Section 3.3.1.4. The administrative state of a cluster is persistent across the reboot of the cluster. The administrative state of a cluster is not directly exposed to components by the Availability Management Framework, but rather only indirectly, since the readiness state of the service unit has an impact on component service instance assignments.
3.3.9 Summary of States Supported for the Logical Entities

Table 11 summarizes the states that the Availability Management Framework supports for the logical entities of the system model.

Table 11 Summary of States Supported for the Logical Entities

| Logical Entity | States |
|---|---|
| cluster | administrative |
| application | administrative |
| service group | administrative |
| node | administrative, operational |
| service unit | administrative, operational, readiness, HA, presence |
| component | operational, readiness, HA, presence |
| service instance | administrative, assignment |
| component service instance | none (see Section 3.3.4) |
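The content of Table 11 lends itself to a bit-mask lookup, for example in a management tool that validates which state attributes may be queried for an entity. The following is a minimal sketch under that assumption; the ST_* flags, the entity_kind_t enumeration, and entity_supports() are hypothetical names and not part of the Availability Management Framework API.

```c
#include <stdbool.h>

/* Hypothetical flags, one per kind of state listed in Table 11. */
enum {
    ST_ADMINISTRATIVE = 1 << 0,
    ST_OPERATIONAL    = 1 << 1,
    ST_READINESS      = 1 << 2,
    ST_HA             = 1 << 3,
    ST_PRESENCE       = 1 << 4,
    ST_ASSIGNMENT     = 1 << 5
};

/* Hypothetical identifiers for the logical entities of the system model. */
typedef enum {
    ENT_CLUSTER,
    ENT_APPLICATION,
    ENT_SERVICE_GROUP,
    ENT_NODE,
    ENT_SERVICE_UNIT,
    ENT_COMPONENT,
    ENT_SERVICE_INSTANCE,
    ENT_COMPONENT_SERVICE_INSTANCE,
    ENT_COUNT
} entity_kind_t;

/* States supported per logical entity, transcribed from Table 11. */
static const unsigned supported_states[ENT_COUNT] = {
    [ENT_CLUSTER]          = ST_ADMINISTRATIVE,
    [ENT_APPLICATION]      = ST_ADMINISTRATIVE,
    [ENT_SERVICE_GROUP]    = ST_ADMINISTRATIVE,
    [ENT_NODE]             = ST_ADMINISTRATIVE | ST_OPERATIONAL,
    [ENT_SERVICE_UNIT]     = ST_ADMINISTRATIVE | ST_OPERATIONAL |
                             ST_READINESS | ST_HA | ST_PRESENCE,
    [ENT_COMPONENT]        = ST_OPERATIONAL | ST_READINESS |
                             ST_HA | ST_PRESENCE,
    [ENT_SERVICE_INSTANCE] = ST_ADMINISTRATIVE | ST_ASSIGNMENT,
    [ENT_COMPONENT_SERVICE_INSTANCE] = 0  /* no states defined */
};

/* Return true if the given kind of state is defined for the entity. */
bool entity_supports(entity_kind_t kind, unsigned state)
{
    return (supported_states[kind] & state) != 0;
}
```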
The administrative states of service units, service groups, service instances, nodes, applications, and the cluster are completely independent in the sense that one does not affect the other. As an example, a service unit might be administratively unlocked while its enclosing node is locked. Whether the service unit is actually administratively prevented from providing service or not depends on the administrative state of the service unit and on the administrative states of its containing node, service group, application, and the cluster. The corresponding rules are given in Section 3.3.1.2. Note that the administrative, presence, and operational states of a particular entity typically do not have a direct impact on each other. However, certain incidents may change more than one of these states, as explained next:
• A service unit failure can lead to its presence state changing to uninstantiated and its operational state changing to disabled. This incident is an example of an event that changes both the operational and presence states.
• When a service unit is administratively terminated (refer to Section 9.4.4 on page 325), its presence state changes to uninstantiated, and its administrative state changes to locked-instantiation, but its operational state remains unchanged. Thus, this event changes both administrative and presence states, but not the operational state.
• "unlocked": The service unit and all its enclosing entities are in the unlocked administrative state.
• "locked": One or more of the entities (the service unit and its enclosing entities) are in the locked administrative state, and neither the service unit nor any of its enclosing entities are in the locked-instantiation administrative state.
• "locked-instantiation": One or more of the entities (the service unit and its enclosing entities) are in the locked-instantiation administrative state, and the service unit and all its enclosing entities that are not in the locked-instantiation state are either in the unlocked or locked administrative states.
• "shutting-down": One or more of the entities (the service unit and its enclosing entities) are in the shutting-down administrative state, and the service unit and all enclosing entities that are not in the shutting-down state are in the unlocked administrative state.
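The rules for the combined administrative state can be evaluated mechanically from the administrative states of the service unit and its enclosing entities. The following is a minimal sketch; admin_state_t, combined_admin_t, and combined_admin_state() are hypothetical names, not part of the Availability Management Framework API. The rules as worded do not cover a mix of locked-instantiation and shutting-down, so the sketch reports that case with a separate, hypothetical COMBINED_NOT_COVERED value instead of guessing.

```c
#include <stddef.h>

/* Hypothetical encoding of the administrative state of one entity. */
typedef enum {
    ADMIN_UNLOCKED,
    ADMIN_LOCKED,
    ADMIN_LOCKED_INSTANTIATION,
    ADMIN_SHUTTING_DOWN
} admin_state_t;

/* Hypothetical encoding of the combined administrative state. */
typedef enum {
    COMBINED_UNLOCKED,
    COMBINED_LOCKED,
    COMBINED_LOCKED_INSTANTIATION,
    COMBINED_SHUTTING_DOWN,
    COMBINED_NOT_COVERED   /* combination not addressed by the rules */
} combined_admin_t;

/*
 * states[] holds the administrative state of the service unit and of each
 * of its enclosing entities; n is the number of entries.
 */
combined_admin_t combined_admin_state(const admin_state_t *states, size_t n)
{
    size_t locked = 0, locked_inst = 0, shutting = 0;

    for (size_t i = 0; i < n; i++) {
        switch (states[i]) {
        case ADMIN_LOCKED:               locked++;      break;
        case ADMIN_LOCKED_INSTANTIATION: locked_inst++; break;
        case ADMIN_SHUTTING_DOWN:        shutting++;    break;
        default:                                        break;
        }
    }

    if (locked_inst > 0)    /* all others must be unlocked or locked */
        return shutting == 0 ? COMBINED_LOCKED_INSTANTIATION
                             : COMBINED_NOT_COVERED;
    if (locked > 0)         /* at least one locked, none locked-instantiation */
        return COMBINED_LOCKED;
    if (shutting > 0)       /* all others are unlocked at this point */
        return COMBINED_SHUTTING_DOWN;
    return COMBINED_UNLOCKED;
}
```

Note that a locked entity takes precedence over a shutting-down one: a service unit that is shutting down on a locked node has the combined state "locked".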
The terms Operational, Presence, Readiness, and HA State in the heading of Table 13 refer to the respective states of a service unit. The operational state of the node hosting the service unit is not shown in this table, but its effect is as follows: unless otherwise stated in footnotes to table rows, all rows in the table apply if the operational state of the node is enabled. If its operational state is disabled, only the rows containing disabled in the second column apply, irrespective of whether the operational state of the service unit is enabled or disabled.
Table 13 Combined States for Pre-Instantiable Service Units

| Combined Administrative State from Table 12 | Operational | Presence | Readiness | HA |
|---|---|---|---|---|
| locked | enabled | uninstantiated, instantiating, instantiated, restarting | out-of-service | [no HA state] |
| locked | disabled | uninstantiated, instantiation-failed, terminating, termination-failed | out-of-service | [no HA state] |
| unlocked | enabled | instantiated, restarting | in-service | any |
| unlocked | enabled | uninstantiated, instantiating, terminating | out-of-service | [no HA state] |
| unlocked | disabled | uninstantiated, instantiation-failed, terminating, termination-failed | out-of-service | [no HA state] |
| shutting-down | enabled | instantiated, restarting | stopping | quiescing |
| shutting-down | enabled | instantiating, terminating, uninstantiated | out-of-service | [no HA state] |
| shutting-down | disabled | uninstantiated, instantiation-failed, instantiating, instantiated (1), restarting, terminating, termination-failed | out-of-service | [no HA state] |
| locked-instantiation | enabled | uninstantiated, terminating | out-of-service | [no HA state] |
| locked-instantiation | disabled | uninstantiated, instantiation-failed, terminating, termination-failed | out-of-service | [no HA state] |

1. This combination of states applies only if the node hosting the service unit is disabled while the service unit itself is still enabled.
Reasons for a service unit to move from one combination of states to another:
• lock, lock-instantiation, shutdown, unlock-instantiation or unlock operation,
• failure of a component contained in the service unit, which escalates to disabling the containing service unit, and thus cleaning up and uninstantiating the service unit,
• repair of a failed service unit (by restarting the service unit or by rebooting the node, or by executing an SA_AMF_ADMIN_REPAIRED administrative operation), and
• all components contained in the service unit leaving the SA_AMF_HA_QUIESCING state for all their component service instances (labeled with "Stopped" in the next diagram);
• concerning service units containing contained components, the following additional reasons:
  - termination or restart of the associated container component,
  - administrative lock or shutdown of a service instance containing the corresponding container CSI, or
  - administrative lock or shutdown of the service unit containing the associated container component.
Some of the important state transitions for a pre-instantiable component are shown next in FIGURE 5. The following simplifications were made in this figure:
• only the presence states instantiated and uninstantiated are considered;
• the states shown as locked, unlocked, and so on refer to the administrative state of the service unit;
• it is assumed that the enclosing entities are all in the unlocked administrative state;
• for the transitions among states, only Instantiate and administrative operations are considered (Lock, Unlock, and so on), and they apply only to the service unit;
• the operational state of the node hosting the service unit is enabled.
FIGURE 5 State Transitions for Pre-Instantiable Service Units

[Figure: state transition diagram. States are given as administrative, operational, and presence state and include: unlocked, enabled, uninstantiated; locked, enabled, uninstantiated; shutting-down, enabled, instantiated; unlocked, disabled, uninstantiated; locked, disabled, uninstantiated; locked-instantiation, disabled, uninstantiated. Transition labels: Instantiate, Lock, Unlock, Shutdown, Lock Instantiation, Unlock Instantiation, Failure, Repair.]
3.5.2 Combined States for Non-Pre-Instantiable Service Units

Table 14 and FIGURE 6 show the possible combinations of states for non-pre-instantiable service units. The terms Operational, Presence, Readiness, and HA State in the heading of Table 14 refer to the respective states of a service unit. The operational state of the node hosting the service unit is not shown in this table, but its effect is as follows: all rows in the table apply if the operational state of the node is enabled. If its operational state is disabled, only the rows containing disabled in the second column apply, irrespective of whether the operational state of the service unit is enabled or disabled.

Table 14 Combined States for Non-Pre-Instantiable Service Units

| Combined Administrative State from Table 12 | Operational | Presence | Readiness | HA |
|---|---|---|---|---|
| locked | enabled | uninstantiated, instantiating, restarting, terminating | out-of-service | [no HA state] |
| locked | disabled | uninstantiated, instantiation-failed, terminating, termination-failed | out-of-service | [no HA state] |
| unlocked | enabled | uninstantiated, instantiating, instantiated, restarting | in-service | active or [no HA state] |
| unlocked | disabled | uninstantiated, instantiation-failed, terminating, termination-failed | out-of-service | [no HA state] |
| shutting-down | enabled | instantiating, instantiated, restarting | stopping | quiescing |
| shutting-down | enabled | uninstantiated, terminating | stopping | [no HA state] |
| shutting-down | disabled | uninstantiated, instantiation-failed, terminating, termination-failed | out-of-service | [no HA state] |
| locked-instantiation | enabled | uninstantiated | out-of-service | [no HA state] |
| locked-instantiation | disabled | uninstantiated, instantiation-failed, terminating, termination-failed | out-of-service | [no HA state] |
Reasons for a service unit to move from one combination of states to another:

- lock, lock-instantiation, shutdown, unlock-instantiation, or unlock operation,
- failure of a component contained in the service unit, which escalates to disabling the containing service unit, and thus cleaning up and uninstantiating the service unit,
- service unit uninstantiated by the Availability Management Framework, and
- service unit instantiated by the Availability Management Framework.
Some of the important state transitions for a non-pre-instantiable service unit are shown in FIGURE 6. The following simplifications were made in this figure:

- only the presence states instantiated and uninstantiated are considered;
- the states shown as locked, unlocked, and so on refer to the administrative state of the service unit; it is assumed that the enclosing entities are all in the unlocked administrative state;
- for the transitions among states, only Instantiate, Terminate, and administrative operations (Lock, Unlock, and so on) are considered, and they apply only to the service unit;
- the operational state of the node hosting the service unit is enabled;
- no transitions and states induced by the Shut-down administrative operation are shown.
FIGURE 6 State Transitions for Non-Pre-Instantiable Service Units

[Figure: state-transition diagram. Recoverable state labels: "unlocked, enabled, instantiated"; "unlocked, disabled, uninstantiated"; "locked, disabled, uninstantiated"; "locked-instantiation, disabled, uninstantiated". Recoverable transition labels: Failure, Repair, Lock, Unlock, Lock Instantiation, Unlock Instantiation.]
- x_active_and_y_standby: for a certain component service type, the component supports all values of the HA state, and it can have the active HA state for x component service instances and the standby HA state for y component service instances at a time.
- x_active_or_y_standby: for a certain component service type, the component supports all values of the HA state. It can be assigned either the active HA state for x component service instances or the standby HA state for y component service instances at a time.
- 1_active_or_y_standby: for a certain component service type, the component supports all values of the HA state. It can be assigned either the active HA state for only one component service instance or the standby HA state for y component service instances at a time.
- 1_active_or_1_standby: for a certain component service type, the component supports all values of the HA state, and it can be assigned either the active HA state or the standby HA state for only one component service instance at a time.
- x_active: for a certain component service type, the component cannot be assigned the standby HA state for component service instances, but it can be assigned the active HA state for x component service instances at a time.
- 1_active: for a certain component service type, the component cannot be assigned the standby HA state for component service instances, but it can be assigned the active HA state for only one component service instance at a time.
- non-pre-instantiable: for a certain component service type, the component provides service as soon as it is started. The Availability Management Framework delays the instantiation of the component to the time when the component is assigned the active HA state on behalf of a component service instance. When the active HA state for a component service instance is removed from the component, the Availability Management Framework terminates the component. Such a component is termed non-pre-instantiable.
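The limits imposed by the capability models above can be sketched as a single admission check. This Python fragment is a minimal illustration under stated assumptions: the function `can_accept` and its parameters are hypothetical, the counts refer to one component and one component service type, and the non-pre-instantiable model is treated here as allowing a single active assignment.

```python
def can_accept(model, x, y, active_count, standby_count, requested):
    """Return True if one more CSI assignment with HA state `requested`
    ("active" or "standby") fits the capability model.

    x and y are the configured limits; they are ignored by models that fix
    the limit to one. active_count and standby_count are current assignments.
    """
    active = active_count + (1 if requested == "active" else 0)
    standby = standby_count + (1 if requested == "standby" else 0)

    if model == "x_active_and_y_standby":
        return active <= x and standby <= y

    if model in ("x_active_or_y_standby", "1_active_or_y_standby",
                 "1_active_or_1_standby"):
        max_active = x if model == "x_active_or_y_standby" else 1
        max_standby = 1 if model == "1_active_or_1_standby" else y
        if active and standby:
            return False  # "or" models: never both HA states at the same time
        return active <= max_active and standby <= max_standby

    if model in ("x_active", "1_active", "non-pre-instantiable"):
        max_active = x if model == "x_active" else 1  # assumption for non-pre-instantiable
        return standby == 0 and active <= max_active

    raise ValueError(f"unknown capability model: {model}")
```

For example, a component with the x_active_or_y_standby model that already holds one active assignment is refused a standby assignment, whereas the same request is accepted under x_active_and_y_standby as long as y is not exceeded.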
Service units may hold components supporting different capability models. The number of service instances assigned to a service unit depends on the number of component service instances supported by the components included in the service unit per component service type.
These service group redundancy models are not exposed in the APIs of this specification. Note that the N in the 2N model refers to the number of service groups, whereas N and M, when used in the other models, refer to service units. This usage of N in the 2N model is due to common usage of the term 2N to refer to 1+1 active/standby redundancy configurations, which can be repeated N times.

Each redundancy model and the common characteristics of all or most of the redundancy models are explained in the following sections. Section 3.7.7 on page 152 describes the effect of administrative operations on the redundancy models.

The following description uses several ordered lists, such as an ordered list of service units or an ordered list of service instances. The order of the elements in a list is based on the relative importance of these elements. The terms rank and ranking are used as synonyms for this order. Similarly, ranked list is used as a synonym for ordered list.
The following definitions and concepts are common to all the supported redundancy models.
- Instantiable service units: these service units have the following characteristics:
  (*) configured in the Availability Management Framework;
  (*) contained in a node that is currently a member of the cluster whose operational state is enabled;
  (*) the presence state of the service unit is uninstantiated, and its operational state is enabled.
- In-service service units: these are the service units that have a readiness state of either in-service or stopping.
- Instantiated service units: in the context of this discussion, these are service units with the presence state of either instantiated, instantiating, or restarting. When the Availability Management Framework intends to select service units to be in the "instantiated service units" list, it chooses these service units from the instantiable service units that are not administratively locked at any of the levels service unit, containing node, service group, application, and the cluster. This selection is done according to the service unit rank defined for the particular redundancy models. The notion of preferred number of in-service service units is defined later for each redundancy model. See, for instance, Section 3.7.2.2. Note that the instantiable and instantiated sets are disjoint.
- Assigned service units: these are the service units that have at least one SI assigned to them. At runtime, this number is the value of the saAmfSGNumCurrAssignedSUs runtime attribute of the saAmfSG object class (see Section 8.9). If the Availability Management Framework needs to choose a service unit for assignment from the instantiated service units list, it has to choose from the in-service instantiated service units.
- Instantiated and non-instantiated spare service units: all instantiated but unassigned service units are called instantiated spare service units, or simply spare service units. All non-instantiated service units of a service group are named non-instantiated spare service units. At runtime, these numbers of spare service units are the values of the saAmfSGNumCurrInstantiatedSpareSUs and saAmfSGNumCurrNonInstantiatedSpareSUs runtime attributes of the saAmfSG object class (see Section 8.9).
- Ordered list of service units for a service group: for each service group, an ordered list of service units defines the rank of the service unit within the service group. This rank is configured by setting the saAmfSURank attribute of the saAmfSU object class (see Section 8.10). The rank is represented by a positive integer. The lower the integer value, the higher the rank. The size of the list is equal to the number of service units configured for the service group. This ordered list is used to specify the order in which service units are selected to be instantiated. This list can also be used to determine the order in which a service unit is selected for SI assignments when no other configuration parameter defines it. It is possible that this list has only one service unit. However, to maintain the availability of the service provided by the service group, the list should include at least two service units. Default value: no default, the order is implementation-dependent.
- Reduction Procedure: the configuration of a service group describes optimal assignments if the preferred number of its service units can actually be instantiated during the cluster start-up. If a service unit or a node fails to instantiate during cluster start-up or is administratively taken out of service, a reduction procedure is described in most of the redundancy models. The Availability Management Framework uses this reduction procedure to compute less optimal assignments before actually starting to assign the service instances.
- No spare HA state: as spare service units have no SI assigned to them, no "spare" HA state is defined for service units and components on behalf of service instances and component service instances respectively. Hence, protection groups do not contain components of the spare service units, and so no changes need to be tracked for these components.
- Auto-adjust option: this option indicates that the SI assignments to the service units in the service group must be transferred back to the most preferred SI assignments, in which the highest-ranked available service units are assigned the active or standby HA states for those SIs. The auto-adjust option is configured by setting the saAmfSGAutoAdjust attribute of the saAmfSG object class (see Section 8.9). If the auto-adjust option is not set, the HA assignments to service units are kept unchanged even when a higher-ranked service unit becomes eligible to take assignments (for example, when a new node joins the cluster). For details on when the auto-adjust procedure is initiated, refer to Section 3.7.1.3.

The following definitions are used in most, but not all, of the supported redundancy models.
- Multiple (ranked) standby assignments: for some redundancy models, it is possible that multiple service units are assigned the standby HA state for a given SI. These service units are termed the standby service units for this given SI. The standby service units are ranked, meaning that one service unit will be considered standby #1, another one standby #2, and so on. The rank is represented by a positive integer. The lower the integer value, the higher the rank. The standby service unit with the highest rank will be assigned the active HA state for a given service instance if the service unit that is currently active for that service instance fails. The rank of a standby service unit for an SI is configured by setting the saAmfRank attribute of a service unit identified by safRankedSu in the SaAmfSIRankedSU association class (see Section 8.11). When the Availability Management Framework assigns component service instances to a component, it notifies the component about the rank of its standby assignment. The component can use this additional information to prepare itself for the standby role.
- Ordered list of SIs: this ordered list is used to rank the SIs based on their importance. The rank of an SI is configured by setting the saAmfSIRank attribute of the saAmfSI object class (see Section 8.11). The rank is represented by a positive integer. The lower the integer value, the higher the rank. The Availability Management Framework uses this ranking to choose SIs to either support with less than the wanted redundancy or drop completely if the set of instantiated service units does not allow full support of all SIs.
- Redundancy level of a Service Instance: the redundancy level is the number of service units being assigned an HA state for this service instance.
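The service unit sets and the rank conventions defined in this section can be sketched in Python as follows. The sketch is illustrative only: the `SU` record, its field names, and the helper functions are assumptions introduced here and do not correspond to any API of this specification. Administrative locking at all enclosing levels is collapsed into a single flag.

```python
from dataclasses import dataclass


@dataclass
class SU:
    """Hypothetical record holding the states used by the definitions above."""
    name: str
    rank: int                     # positive integer; lower value = higher rank
    node_is_enabled_member: bool  # hosting node is a cluster member and enabled
    presence: str
    operational: str
    readiness: str
    locked_at_some_level: bool = False  # SU, node, SG, application, or cluster


def is_instantiable(su):
    return (su.node_is_enabled_member
            and su.presence == "uninstantiated"
            and su.operational == "enabled")


def is_instantiated(su):
    return su.presence in ("instantiated", "instantiating", "restarting")


def is_in_service(su):
    return su.readiness in ("in-service", "stopping")


def instantiation_candidates(sus):
    """Instantiable SUs not locked at any level, highest rank first."""
    eligible = [su for su in sus
                if is_instantiable(su) and not su.locked_at_some_level]
    return sorted(eligible, key=lambda su: su.rank)


def next_active(standby_ranks):
    """Pick the standby SU that takes over the active HA state for an SI.

    standby_ranks maps SU name -> standby rank for that SI (lower = higher rank).
    """
    if not standby_ranks:
        return None
    return min(standby_ranks, key=standby_ranks.get)
```

Because an instantiable service unit must have the presence state uninstantiated, `is_instantiable` and `is_instantiated` can never both hold for the same service unit, which mirrors the statement that the two sets are disjoint.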
Though most redundancy models are applicable to service groups containing non-pre-instantiable service units (see Table 15 in Section 3.8), the descriptions provided in the following sections apply only to service groups with pre-instantiable service units, as they lead to more complex situations. The behavior of the various redundancy models for service groups with non-pre-instantiable service units can be deduced from the following descriptions by taking into account the following restrictions attached to service groups with non-pre-instantiable service units:

- no spare service units,
- no standby service units,
- one and only one SI assignment per in-service service unit, and
- the three sets of instantiated service units, in-service service units, and active service units are identical.
3.7.1.2 Usage of Nodes and Node Groups for Configuring Service Units and Service Groups
Service groups and service units have an optional node group configuration attribute. A node group just contains a list of nodes. Service units have an optional configuration attribute (saAmfSUHostNodeOrNodeGroup in the SaAmfSU object class, shown in Section 8.10), which can either represent a node or a node group. The service unit can only be instantiated on the node (if a node is specified) or on one of the nodes of the node group (if a node group is configured). The Availability Management Framework maps each service unit onto a node at the time the service unit is introduced to the cluster (that is, at cluster startup or when the service unit is added to the configuration), and this mapping persists until the service unit is removed from the configuration, or the cluster is restarted. In other words, the node group does not provide an additional level of protection against node failures. When the Availability Management Framework decides to instantiate a local service unit, according to the pertinent redundancy model, it performs the following checks:
1. If a node is configured for the service unit, the service unit will be instantiated on this node.
2. If instead a node group is configured for the service unit, the Availability Management Framework selects a node from the node group using an implementation-specific policy to instantiate the service unit on this node.
3. If no node or node group is configured for the service unit, the Availability Management Framework checks whether a node group is configured for the service group (saAmfSGSuHostNodeGroup attribute in the SaAmfSG object class, shown in Section 8.9).
4. If a node group is configured for the service group, the Availability Management Framework selects a node from the node group using an implementation-specific policy to instantiate the service unit on this node.
5. If no node group has been configured for the service group, the Availability Management Framework selects any node using an implementation-specific policy to instantiate the service unit on it.
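The checks above amount to a short fallback chain, sketched below in Python. The function `select_node`, its arguments, and the use of `min` as a stand-in for the implementation-specific policy are assumptions made purely for illustration.

```python
def select_node(su_host, sg_node_group, cluster_nodes, pick=min):
    """Choose the node on which a service unit is instantiated.

    su_host:       None, a node name (str), or a collection of node names
                   (the node group configured for the service unit).
    sg_node_group: None or a collection of node names configured for the
                   service group.
    cluster_nodes: collection of all candidate node names.
    pick:          placeholder for the implementation-specific policy.
    """
    if isinstance(su_host, str):
        return su_host               # check 1: a node is configured for the SU
    if su_host:
        return pick(su_host)         # check 2: a node group is configured for the SU
    if sg_node_group:
        return pick(sg_node_group)   # checks 3 and 4: fall back to the SG node group
    return pick(cluster_nodes)       # check 5: any node
```

Passing the policy as a parameter keeps the sketch neutral about how an implementation chooses among the candidate nodes.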
If node groups are configured for both the service units of a service group and the service group, the nodes contained in the node group for the service unit can only be a subset of the nodes contained in the node group for the service group. If a node is configured for a service unit, it must be a member of the node group for the service group, if configured. It is an error to define the saAmfSUHostNodeOrNodeGroup attribute for an external service unit. It is also an error to define the saAmfSGSuHostNodeGroup attribute if a service group contains only external service units. Section 6.1.5 provides additional notes on the configuration of node and node groups for service units containing contained components and for the service groups containing these service units in order to align with the configuration of nodes and node groups for service units containing container components and for the service groups containing these service units.
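The consistency rules in the preceding paragraph can be expressed as a small validation routine. This is a hedged sketch: the function name, the argument names, and the error strings are invented for illustration, and only the rules stated above are checked.

```python
def hosting_errors(su_host, sg_node_group,
                   su_is_external=False, sg_only_external=False):
    """Return a list of configuration errors (empty if consistent).

    su_host:       None, a node name (str), or a collection of node names.
    sg_node_group: None or a collection of node names.
    """
    errors = []
    if su_is_external and su_host is not None:
        errors.append("saAmfSUHostNodeOrNodeGroup set for an external service unit")
    if sg_only_external and sg_node_group is not None:
        errors.append("saAmfSGSuHostNodeGroup set for a service group "
                      "with only external service units")
    if su_host is not None and sg_node_group is not None:
        su_nodes = {su_host} if isinstance(su_host, str) else set(su_host)
        # The SU's node or node group must lie inside the SG's node group.
        if not su_nodes <= set(sg_node_group):
            errors.append("service unit nodes are not a subset of the "
                          "service group node group")
    return errors
```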
3.7.1.3 Initiation of the Auto-Adjust Procedure for a Service Group
If a service group is configured with the auto-adjust option set, that is, the saAmfSGAutoAdjust configuration attribute is set to SA_TRUE (see the SaAmfSG object class in Section 8.9), the Availability Management Framework should attempt to return the assignments of the service group back to the most preferred assignments (as defined in Section 3.7.1.1) as soon as possible. In general, the need for auto-adjustment for a service group arises when one of the following happens:
- A service unit configured for the service group becomes instantiable.
- The readiness state of a service unit configured for the service group becomes in-service.
- A locked service instance configured for the service group becomes unlocked.
When a service group becomes eligible for auto-adjustment, the Availability Management Framework can initiate the auto-adjust procedure for that service group immediately. This is practical when an administrative action has made the service group eligible for auto-adjustment (for example, when a service instance is unlocked by an administrative operation). However, if the completion of a recovery/repair operation has made the service group eligible for auto-adjustment (for example, if a node joins the cluster after a repair), it is unwise to immediately run the auto-adjust procedure for the service group involving the newly repaired service units. For this reason, the service group-level configuration attribute auto-adjust probation period has been introduced (the saAmfSGAutoAdjustProb configuration attribute in the SaAmfSG object class, shown in Section 8.9).

When a service unit becomes available for auto-adjustment after a repair/recovery operation, the service unit enters its auto-adjust probation period, and it cannot be used for auto-adjustment while in this period. Note that the service group can be auto-adjusted using other service units, but auto-adjustment cannot use the service units in their auto-adjust probation periods. Also, a service unit on probation can and should be used in other operations such as switch-over and fail-over. As soon as the auto-adjust probation period of a service unit elapses, the Availability Management Framework initiates the auto-adjust procedure for the corresponding service group. By configuring the auto-adjust probation period appropriately, the administrator can make sure that the Availability Management Framework does not run into unwanted situations such as toggling the active service units due to, for example, intermittent failures of a service unit or inadequate repair operations.
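The probation rule can be sketched as a time comparison. The following Python fragment is an assumption-laden illustration: times are plain numbers in one unit (for example, seconds), and the function and parameter names are hypothetical rather than taken from the specification.

```python
def usable_for_auto_adjust(now, available_since, after_repair, probation_period):
    """True if a service unit may take part in auto-adjustment at time `now`.

    available_since:  time at which the SU became available for auto-adjustment.
    after_repair:     True if it became available through repair/recovery.
    probation_period: configured auto-adjust probation period.
    """
    if not after_repair:
        return True  # for example, made available by an administrative operation
    return now - available_since >= probation_period


def auto_adjust_candidates(sus, now, probation_period):
    """sus: iterable of (name, available_since, after_repair) tuples."""
    return [name for name, since, after_repair in sus
            if usable_for_auto_adjust(now, since, after_repair, probation_period)]
```

A service unit excluded by this check remains available for switch-over and fail-over; the check only limits its use in auto-adjustment.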
In a service group with the 2N redundancy model, at most one service unit will have the active HA state for all service instances (usually called the active service unit), and at most one service unit will have the standby HA state for all service instances (usually called the standby service unit). Some other service units may be considered spare service units for the service group, depending on the configuration. The components in the active service unit execute the service, while the components in the standby service unit are prepared to take over the active role if the active service unit fails.

Although the goal of the 2N redundancy model is to offer redundancy in service, it is possible that a 2N-redundancy service group is configured to have only one service unit. In this case, no redundancy is provided at the service-unit level; however, the Availability Management Framework still manages the availability of such a degenerate service group. The specification supports this single-service-unit 2N redundancy model because it makes it easier, from the configuration-update perspective, to add more service units later on when, for example, more nodes are configured into the cluster.

Components implementing any of the capability models described in Section 3.6 on page 85 can participate in the 2N redundancy model. Examples of a service group with a 2N redundancy model are presented in Section 3.7.2.4 on page 97.
3.7.2.2 Configuration
- Ordered list of service units for a service group: this parameter is described in Section 3.7.1.1. Default value: no default, the order is implementation-dependent.
- Preferred number of in-service service units at a given time: the Availability Management Framework should make sure that this number of in-service service units is always instantiated, if possible. This preferred number is configured by setting the saAmfSGNumPrefInserviceSUs attribute of the saAmfSG object class (see Section 8.9). If the ordered list of service units of a service group has at least two service units, then the preferred number of in-service service units should be at least two. If the preferred number of in-service service units is greater than two, the service group will contain some instantiated spare service units. These service units are called "spare" service units. The preferred number of in-service service units for the service groups containing only non-pre-instantiable components must be set to one. Default value: two
- Auto-adjust option: for the general explanation of this option, refer to Section 3.7.1.1 on page 88. Section 3.7.2.3.3 on page 95 discusses how this option is handled in this redundancy model. Default value: no auto-adjust
When an active service unit fails over, the associated standby service unit will be assigned active for all SIs. Then, one of the spare service units will be selected and will be assigned standby for all SIs. If the number of instantiated service units falls below the preferred number of in-service service units, another service unit from the ordered list of instantiable service units will be instantiated.
3.7.2.3.2 Failure of the Standby Service Unit
When a standby service unit fails over, one of the spare service units will be assigned to take over the standby role, if possible. If the number of instantiated service units falls below the preferred number of in-service service units, another service unit from the set of instantiable service units will be instantiated.
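The handling of an active or standby service unit failure in a 2N service group, as described above, can be sketched as follows. This is a simplified model under stated assumptions: the dictionary layout and the function name are hypothetical, instantiation is treated as instantaneous, and a newly instantiated service unit simply joins the spare list.

```python
def handle_su_failure(group, failed):
    """Return the new state of a 2N service group after `failed` fails.

    group is a dict with keys:
      active, standby : SU name or None
      spares          : instantiated spare SUs, highest rank first
      instantiable    : instantiable SUs, highest rank first
      preferred       : preferred number of in-service service units
    """
    g = dict(group,
             spares=list(group["spares"]),
             instantiable=list(group["instantiable"]))
    if failed == g["active"]:
        # The standby takes over the active HA state for all SIs.
        g["active"], g["standby"] = g["standby"], None
    elif failed == g["standby"]:
        g["standby"] = None
    # A spare service unit, if any, takes over the standby role.
    if g["active"] is not None and g["standby"] is None and g["spares"]:
        g["standby"] = g["spares"].pop(0)
    # Instantiate replacements until the preferred number is reached again.
    instantiated = (sum(su is not None for su in (g["active"], g["standby"]))
                    + len(g["spares"]))
    while instantiated < g["preferred"] and g["instantiable"]:
        g["spares"].append(g["instantiable"].pop(0))
        instantiated += 1
    return g
```

The function returns a new dictionary and leaves its input unchanged, so the states before and after the failure can be compared.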
3.7.2.3.3 Auto-Adjust Procedure
If the auto-adjust option is set in the configuration, the Availability Management Framework should make sure that the service group assignments are returned to the preferred configuration, meaning that the highest-ranked in-service service unit is active and the second highest-ranked in-service service unit is standby. The auto-adjust procedure may therefore involve relocation of SIs. Although it is left to the implementation how to perform an auto-adjust, it should be done with minimum impact on the availability of the corresponding service.
3.7.2.3.4 Cluster Startup
Because cluster startup is a rare event, its latency may not be as critical as that of failure recovery events such as a service unit fail-over. Moreover, it is very important to start a cluster in an orderly fashion, so that the initial runtime status of the entities under the Availability Management Framework's control is as close as possible to the preferred configuration. Therefore, during the startup of the cluster, the Availability Management Framework should wait for at most a predefined period of time to make sure that all required service units are instantiated before assigning SIs to service units. This period of time is specified in the saAmfClusterStartupTimeout configuration attribute of the SaAmfCluster object class, shown in Section 8.7. It is left to the implementation how to handle cluster startup; however, the implementation should make sure that the initial assignments are as close as possible to the preferred assignments.
3.7.2.3.5 Role of the List of Ordered Service Units in Assignments and Instantiations
The ordered list of service units will be used for the following purposes:

- To decide when to instantiate a service unit from the list: at a given time, if the number of instantiated service units is less than the preferred number of in-service service units specified in the configuration, the non-instantiated service units with highest ranks in the service unit list will be instantiated until both numbers are equal. If the preferred number of in-service service units cannot be instantiated due to a shortage of instantiable service units, the service group will be only partially supported.
- To select which of the instantiated service units will have active and standby assignments: when no active or standby assignments exist for a service group, and several service units are instantiated at the same time (for example, during the cluster startup, or when multiple nodes join the cluster at the same time), then, for each SI, the Availability Management Framework will assign the service unit with highest rank in the list the active HA state for this SI and the second highest-ranked service unit the standby HA state for this SI.
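Both uses of the ordered list can be sketched with two small functions. The names `sus_to_instantiate` and `initial_2n_assignment`, as well as the `ranks` mapping (service unit name to rank, lower value meaning higher rank), are assumptions introduced for this illustration.

```python
def sus_to_instantiate(ranks, instantiated, instantiable, preferred):
    """Service units to instantiate now, highest rank (lowest integer) first."""
    missing = max(0, preferred - len(instantiated))
    ordered = sorted(instantiable, key=lambda su: ranks[su])
    # Fewer than `missing` may be returned if instantiable SUs are in short
    # supply; the service group is then only partially supported.
    return ordered[:missing]


def initial_2n_assignment(ranks, instantiated):
    """Return (active, standby) when no assignments exist yet."""
    ordered = sorted(instantiated, key=lambda su: ranks[su])
    active = ordered[0] if ordered else None
    standby = ordered[1] if len(ordered) > 1 else None
    return active, standby
```

For instance, with ranks S1=1, S2=2, S3=3 and all three service units instantiated at cluster startup, S1 is chosen as active and S2 as standby, leaving S3 as a spare.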
3.7.2.4 Examples
In the following example, it is assumed that the number of preferred in-service service units is set to 2.

FIGURE 7 Example of the 2N Redundancy Model: Two Service Units on Different Nodes

[Figure: recoverable labels are Node U, Service Group, Service Unit S1, components C1 to C4, Protection Group A2, HA states active and standby, and Service Instance A with CSI A1 and CSI A2.]
After a fault that disables Node U, Service Unit S2 on Node V will be assigned to be active for Service Instance A, as shown in FIGURE 8.

FIGURE 8 Example of the 2N Redundancy Model: Two Service Units on Different Nodes Where a Fault Has Occurred

[Figure: recoverable labels are Node U, Node Failure, Service Group, components C3 and C4, Protection Group A2, HA state active, and Service Instance A with CSI A1 and CSI A2.]
The two service units may even reside on the same node, as shown in FIGURE 9, which allows one to implement software redundancy with two instances of the application running on the same node.
FIGURE 9 Example of the 2N Redundancy Model: Two Service Units on the Same Node

[Figure: recoverable labels are Node U, Service Group, Service Unit S1, Service Unit S2, components C1 to C4, Protection Group A1, Protection Group A2, HA states active and standby, and Service Instance A with CSI A1 and CSI A2.]
As shown in FIGURE 10, after a fault that disables component C1 within service unit S1, service unit S2 is assigned to be active for service instance A. Note that a fault that affects any component within a service unit and that cannot be recovered by restarting the affected component causes the entire service unit and all components within the service unit to be withdrawn from service. In this example, even though component C2 is still fully operational, its workload must fail over to component C4.
FIGURE 10 Example of the 2N Redundancy Model: Two Service Units on the Same Node, Where a Fault Has Occurred

[Figure: recoverable labels are Node U, Service Group, Service Unit S1, Service Unit S2, components C1 to C4, "C1 Fails", Protection Group A1, Protection Group A2, HA state active, and Service Instance A with CSI A1 and CSI A2.]
As shown in the next figure, the 2N service group redundancy model can support N+1 strategies at the node level. Node X supports standby service units for several service groups. If one of the other nodes fails, the corresponding service unit on Node X will be reassigned to be active for the service instance supported by the failed node. Note that Node X must support multiple service units, and might require additional resources like memory.
FIGURE 11 Example of the 2N Redundancy Model: One Node Provides Standby Service Units for Several Service Groups

[Figure: recoverable labels are Node V, Service Unit S3 (C5, C6), components C7 and C8, Service Unit S6 (C11, C12), three active and three standby assignments, Service Instance A (CSI A1, CSI A2), Service Instance B (CSI B1, CSI B2), and Service Instance C (CSI C1, CSI C2).]
As FIGURE 12 illustrates, the 2N redundancy model can also support strategies in which all nodes host some service units that are active for their service instances and other service units that are standby for their service instances.
FIGURE 12 Example of the 2N Redundancy Model: Three Nodes Support Some Service Units that Are Active for Their Service Instances and Other Service Units that Are Standby for Their Service Instances

[Figure: recoverable labels are Node W, Service Group SG2, Service Unit S2 (C3, C4), Service Unit S3 (C5, C6), Service Unit S4 (C7, C8), Service Unit S6 (C11, C12), PG B1, PG B2, PG C2, active and standby assignments, Service Instance B (CSI B1, CSI B2), and Service Instance C (CSI C1, CSI C2).]
The 2N redundancy model is represented by the UML diagram shown in the following figure.

FIGURE 13 UML Diagram for the 2N Redundancy Model

[Figure: UML diagram. Recoverable labels: Service Unit; association ends "0..1 active", "0..1 standby", and "0..*"; association "protects" with multiplicity "0..*"; note "A service unit can take all active or all standby service instance assignments at a time".]
In the N+M redundancy model, the service group has N+M service units. This redundancy model has the following characteristics:
- A service unit can be (i) active for all SIs assigned to it or (ii) standby for all SIs assigned to it. In other words, a service unit cannot be active for some SIs and standby for some other SIs at the same time.
- At any given time, several in-service service units can be instantiated for a service group: some service units are active for some SIs, some service units are standby for some SIs, and possibly some other service units are considered spare service units for the service group. For simplicity of the discussion, the service units having the active HA state for all SIs assigned to them are denoted as "active service units", and the service units having the standby HA state for all SIs assigned to them are denoted as "standby service units".
- The number of active service units, the number of standby service units, and the number of spare service units of a service group are dynamic and can change during the life-span of the service group; however, the preferred number of these service units can be configured, as discussed in Section 3.7.3.3 on page 106.
- For each SI and at any given time, there will be at most one active service unit and at most one standby service unit.
- At any given time, the Availability Management Framework should make sure that the per-SI redundancy level (one service unit assigned the active HA state and a service unit on another node assigned the standby HA state for each SI) is guaranteed, while requirements on the load constraints in each service unit and the number of available spare service units (see Section 3.7.3.3) are fulfilled.
- As mentioned before, the objective should be to maintain the redundancy level for all SIs (one service unit assigned the active HA state and another service unit assigned the standby HA state for each SI); however, this may not be feasible in some cases due to a shortage of available service units for the service group. For example, if the number of in-service service units is not large enough to support full redundancy levels for all SIs, then some of the SIs could be supported in a degraded mode (for instance, no service unit assigned standby for this SI). The service group deployer should be allowed to specify the order of importance of SIs, as discussed in Section 3.7.3.3.
Components implementing any of the capability models described in Section 3.6 on page 85, except the 1_active_or_1_standby capability model, can participate in the N+M redundancy model.
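As an illustration only, the two structural rules stated above (a service unit is never active and standby at the same time, and each SI has at most one active and at most one standby service unit) can be checked mechanically. The following Python sketch is not part of this specification; the function name check_n_plus_m and the tuple layout are assumptions made for this example.

```python
from collections import defaultdict

def check_n_plus_m(assignments):
    """Return a list of rule violations.

    `assignments` is an iterable of (service_unit, si, ha_state) tuples,
    where ha_state is "active" or "standby" (hypothetical layout).
    """
    states_per_su = defaultdict(set)
    holders = defaultdict(list)  # (si, ha_state) -> service units
    for su, si, state in assignments:
        states_per_su[su].add(state)
        holders[(si, state)].append(su)

    problems = []
    # Rule 1: a service unit is active for all its SIs or standby for all.
    for su, states in sorted(states_per_su.items()):
        if len(states) > 1:
            problems.append(f"{su} is both active and standby")
    # Rule 2: at most one active and one standby service unit per SI.
    for (si, state), sus in sorted(holders.items()):
        if len(sus) > 1:
            problems.append(f"{si} has {len(sus)} {state} service units")
    return problems

ok = [("SU1", "SI1", "active"), ("SU2", "SI1", "standby")]
bad = ok + [("SU2", "SI2", "active")]
print(check_n_plus_m(ok))
print(check_n_plus_m(bad))
```

The check deliberately ignores load limits and node placement, which are covered by the configuration attributes of the service group.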
3.7.3.2 Examples
A common use of the N+M redundancy model is the N+1 redundancy model in which a single service unit is assigned standby for N active service units, as shown in FIGURE 14. The following diagram depicts a typical N+1 configuration. Note that each of the components C7 and C8 of the standby service unit supports three component service instances. Node X might require additional resources like memory to accommodate additional component service instances.
FIGURE 14 Example of the N+1 Redundancy Model
[Figure: N+1 configuration with components C1 to C8 and protection groups PG A1/A2, PG B1/B2, and PG C1/C2; three service units are active and one service unit is standby.]
To illustrate what happens after a fail-over in the N+M model, assume that service unit S2 fails. As a consequence, service unit S4 should be assigned the active HA state for SI B. As S4 must not be assigned active for some SIs and standby for other SIs at the same time according to the redundancy model, the standby HA state for service instances A and C will be removed from S4. Note that this scenario also applies if the involved component capability models are x_active_and_y_standby.
In a more general N+M case, the M standby service units can be freely associated with the N active service units. The following figure shows an example of the N+M redundancy model with N=3 and M=2.
FIGURE 15 Example of the N+M Redundancy Model, Where N = 3 and M = 2
[Figure: N+M configuration with N = 3 active service units and M = 2 standby service units protecting Service Instances A, B, and C.]
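The fail-over of S2 described earlier in this section can be modeled as follows. This is an illustrative Python sketch, not an interface defined by this specification; the function fail_over, the dictionary layout, and the assumption that S1 and S3 are active for service instances A and C respectively are hypothetical.

```python
def fail_over(active, standby, failed_su):
    """Model an N+M fail-over; both maps are service unit -> set of SIs."""
    lost = active[failed_su]
    new_active = {su: set(sis) for su, sis in active.items() if su != failed_su}
    new_standby = {}
    promoted = False
    for su, sis in standby.items():
        if not promoted and lost <= sis:
            # Promote this standby unit for the lost SIs only. Its remaining
            # standby assignments are removed, because a service unit cannot
            # be active and standby at the same time.
            new_active[su] = set(lost)
            promoted = True
        else:
            new_standby[su] = set(sis)
    return new_active, new_standby

# Assumed layout: S1, S2, S3 active for A, B, C; S4 standby for all three.
active = {"S1": {"A"}, "S2": {"B"}, "S3": {"C"}}
standby = {"S4": {"A", "B", "C"}}
after_active, after_standby = fail_over(active, standby, "S2")
print(after_active)
print(after_standby)
```

After the call, S4 is active for SI B only, and service instances A and C no longer have a standby assignment, which matches the behavior described for the N+1 example.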
3.7.3.3 Configuration
• Ordered list of service units for a service group: this parameter is described in Section 3.7.1.1. Default value: no default, the order is implementation-dependent.
• Ordered list of SIs: for the general meaning of this parameter, refer to its definition in Section 3.7.1.1. The Availability Management Framework will use this ranking to select some SIs to support either in non-redundant mode (that is, for each of these SIs, there is a service unit having the active HA state, but no service unit having the standby HA state) or drop them completely if the Availability Management Framework encounters a shortage of service units for the full support of all SIs; however, it is important to note that the Availability Management Framework should consider not only the ordering of the SIs but also their dependencies in choosing some SIs to support partially or drop them. The Availability Management Framework should observe the following role assignment of the SIs: the assignment goes in an order compatible with the dependencies. If several SIs could be assigned at the same time with respect to that criterion, the ordered list of SIs serves as a tie-breaker. Default value: no default, the order is implementation-dependent.
• Preferred number of in-service service units: the Availability Management Framework should make sure that this number of in-service service units is always instantiated, if possible. This preferred number is configured by setting the saAmfSGNumPrefInserviceSUs attribute of the saAmfSG object class (see Section 8.9). If the service units list for a service group includes at least two service units, then the preferred number of instantiated service units should be at least two. Default value: the number of configured service units for the service group.
• Preferred number of active service units: this parameter indicates the preferred number of active service units at any time. This preferred number is configured by setting the saAmfSGNumPrefActiveSUs attribute of the saAmfSG object class (see Section 8.9). The Availability Management Framework should try to guarantee that this number of active service units exists for the service group if the number of in-service service units is large enough. Default value: no default value is specified. It is mandatory to set this number for each service group.
• Preferred number of standby service units: this indicates the preferred number of standby service units at any time. This preferred number is configured by setting the saAmfSGNumPrefStandbySUs attribute of the saAmfSG object class (see Section 8.9). The Availability Management Framework should guarantee that this number of standby service units exists for the service group if the number of in-service service units and the number of service units associated with the service group are large enough. Default value: no default value is specified. It is mandatory to set this number for each service group.
• Maximum number of active SIs per service unit: this indicates the maximum number of SIs that can be assigned to a service unit, so that the service unit has the active HA state for all these SIs. It is assumed that the load imposed by each SI is the same. If this assumption is not true for some service instances, the service deployer has to approximate. This maximum number is configured by setting the saAmfSGMaxActiveSIsperSU attribute of the saAmfSG object class (see Section 8.9). Default value: no limit, a value of 0 is used to specify this.
• Maximum number of standby SIs per service unit: this indicates the maximum number of SIs that can be assigned to a service unit, so that the service unit has the standby HA state for all these SIs. It is assumed that the load imposed by each SI is the same. This maximum number is configured by setting the saAmfSGMaxStandbySIsperSU attribute of the saAmfSG object class (see Section 8.9). Default value: no limit, a value of 0 is used to specify this.
• Auto-adjust option: for the general explanation of this option, refer to Section 3.7.1.1 on page 88. Section 3.7.3.6 on page 118 shows an example for handling the auto-adjust option in this redundancy model. Default value: no auto-adjust.
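To make the defaults above concrete, the following Python sketch collects the N+M parameters in one record. It is an illustration under stated assumptions, not the saAmfSG object class itself: the class name NPlusMConfig, the fields configured_sus and auto_adjust, and the warnings method are hypothetical; only the saAmfSG attribute names are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NPlusMConfig:
    configured_sus: int                       # hypothetical: size of the ordered SU list
    saAmfSGNumPrefActiveSUs: int              # mandatory, no default
    saAmfSGNumPrefStandbySUs: int             # mandatory, no default
    saAmfSGNumPrefInserviceSUs: Optional[int] = None  # default: all configured SUs
    saAmfSGMaxActiveSIsperSU: int = 0         # 0 means no limit
    saAmfSGMaxStandbySIsperSU: int = 0        # 0 means no limit
    auto_adjust: bool = False                 # default: no auto-adjust

    def __post_init__(self):
        # Apply the documented default for the preferred in-service count.
        if self.saAmfSGNumPrefInserviceSUs is None:
            self.saAmfSGNumPrefInserviceSUs = self.configured_sus

    def warnings(self):
        found = []
        if self.configured_sus >= 2 and self.saAmfSGNumPrefInserviceSUs < 2:
            found.append("preferred number of in-service service units "
                         "should be at least two")
        return found

cfg = NPlusMConfig(configured_sus=8, saAmfSGNumPrefActiveSUs=3,
                   saAmfSGNumPrefStandbySUs=3, saAmfSGNumPrefInserviceSUs=7,
                   saAmfSGMaxActiveSIsperSU=3, saAmfSGMaxStandbySIsperSU=4)
print(cfg.warnings())
```

Making the active and standby counts required constructor arguments mirrors the rule that these two values have no default and must be set for each service group.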
3.7.3.4 SI Assignments
In this section, the general direction in assigning SIs to in-service service units is discussed. Then, the assignment procedure will be illustrated using example configurations.
If available service units for the service group allow it, the Availability Management Framework will instantiate the preferred number of in-service service units for the service group. Additionally, as many service units as the preferred number of active service units will be assigned the active HA state for SIs, and as many service units as the preferred number of standby service units will be assigned the standby HA state for SIs, according to the configuration. Additionally, some of the service units will be dedicated as spare.
It is assumed that the service group configuration has passed a series of validations, so that when as many service units as the preferred number of active service units are assigned the active HA state, and as many service units as the preferred number of standby service units are assigned the standby HA state, one service unit will be assigned the active HA state, and another service unit will be assigned the standby HA state for each SI of the service group, without violating the load limits expressed in Section 3.7.3.3.
In case of a shortage of in-service service units, the Availability Management Framework should use the ordered list of SIs in choosing which SIs have to be dropped or supported in non-redundant mode (that is, for each of these SIs, there is a service unit having the active HA state, but no service unit having the standby HA state).
In the remainder of this section, the SI assignment procedure is described. The following example of a service group configuration will be used throughout this illustration:
• Ordered list of service units = {SU1, SU2, SU3, SU4, SU5, SU6, SU7, SU8}
• Ordered list of SIs = {SI1, SI2, SI3, SI4, SI5, SI6}
• Preferred number of in-service service units = 7
• Preferred number of active service units = 3
• Preferred number of standby service units = 3
• Maximum number of active SIs per service unit = 3
• Maximum number of standby SIs per service unit = 4
Assignment I: Full Assignment with Spare Service Units
As an initial example, it is assumed that all service units of the preceding configuration can be brought in-service. Then, the following can be a running configuration for the service group.
• in-service service units = {SU1, SU2, SU3, SU4, SU5, SU6, SU7}
• instantiable service units = {SU8}
• active service units = {SU1, SU2, SU3}
• standby service units = {SU4, SU5, SU6}
• spare service units = {SU7}
• SIs assigned to SU1 as active = {SI1, SI2}
• SIs assigned to SU2 as active = {SI3, SI4}
• SIs assigned to SU3 as active = {SI5, SI6}
• SIs assigned to SU4 as standby = {SI1, SI2}
• SIs assigned to SU5 as standby = {SI3, SI4}
• SIs assigned to SU6 as standby = {SI5, SI6}
The following points should be mentioned regarding the assignments: (1) The selection of instantiated, active, and standby service units is based on the ordered list of service units. (2) The assignments of SIs to service units are based on the ordered list of SIs. (3) Service units are not fully used to their capacities. Each active service unit could handle one more SI. Similarly, each standby service unit can handle two more SIs. This extra slack will be used in case of a shortage of service units due to the unavailability of some nodes.
Note:
This specification does not define the actual algorithm for SI assignments; instead, it provides rules and examples to guide implementers. The examples provided are only illustrative and represent one possible assignment scenario (by a particular implementation) based on the configuration specified in Section 3.7.3.4. Implementers should design their own assignment algorithms by following the given rules.
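The slack mentioned in point (3) above is simple arithmetic: the configured maximum minus the number of SIs currently assigned. The following Python lines reproduce it for Assignment I; the helper name slack and the dictionary layout are assumptions made for this illustration.

```python
def slack(assigned, limit):
    """Free SI slots per service unit; `assigned` maps SU -> list of SIs."""
    return {su: limit - len(sis) for su, sis in assigned.items()}

active = {"SU1": ["SI1", "SI2"], "SU2": ["SI3", "SI4"], "SU3": ["SI5", "SI6"]}
standby = {"SU4": ["SI1", "SI2"], "SU5": ["SI3", "SI4"], "SU6": ["SI5", "SI6"]}

active_slack = slack(active, 3)    # maximum number of active SIs per SU = 3
standby_slack = slack(standby, 4)  # maximum number of standby SIs per SU = 4
print(active_slack)
print(standby_slack)
```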
The difficulty comes when the number of in-service service units is not enough to satisfy the configuration requirements. The first goal is to try to keep all SIs in the redundant mode (that is, for each of these SIs, one service unit has the active HA state, and another service unit has the standby HA state), even at the expense of imposing maximum load on each service unit. If this goal is not attainable, the next goal is to keep as many SIs as possible in a redundant mode while all SIs are assigned active in one of the service units. This procedure may lead to a reduction in the number of standby service units. Finally, if this objective is also not attainable, the only choice is to drop some of the SIs completely. This means reducing further the number of active service units.
The following subsections sketch the procedure for assigning service units and SIs in situations of shortage of in-service service units.
3.7.3.4.1 Reduction Procedure
The following procedure is for assigning SIs to in-service service units and for supporting the N+M service group if not enough service units are available. If the number of in-service service units is not large enough to support the preferred number of active, standby, and spare service units, as defined in the configuration, the following procedure is used to maintain an acceptable level of support for the service group.
Step 1: Reduction of the Number of Spare Service Units
If the number of instantiated service units does not allow enough spare service units, the service group should be maintained with fewer spare service units than the required number. The number of the spare service units is reduced until:
(1.a) The Availability Management Framework succeeds in allocating the preferred number of active and standby service units. In this case, the assignment procedure is completed.
OR
(1.b) After dropping all spare service units, the Availability Management Framework does not succeed in allocating the preferred number of active and standby service units. In this case, the assignment procedure continues to the next step ((2.a) or (2.b)).
The following example illustrates case (1.a).
Assignment II: Full Assignment with Spare Reduction
Assume that the state of the cluster is as follows:
• in-service service units = {SU1, SU2, SU3, SU4, SU5, SU6}
• instantiable service units = {}
• active service units = {SU1, SU2, SU3}
• standby service units = {SU4, SU5, SU6}
• spare service units = {}
Based on the preceding configuration, SI assignments fulfilling the condition that every SI is in redundant mode can be:
• SIs assigned to SU1 as active = {SI1, SI2}
• SIs assigned to SU2 as active = {SI3, SI4}
• SIs assigned to SU3 as active = {SI5, SI6}
• SIs assigned to SU4 as standby = {SI1, SI2}
• SIs assigned to SU5 as standby = {SI3, SI4}
• SIs assigned to SU6 as standby = {SI5, SI6}
Step 2: Reduction of the Number of Standby Service Units
If the preferred number of active and standby service units cannot be supported due to a shortage of in-service service units, the Availability Management Framework is forced to use fewer standby service units than the preferred number expressed in the configuration. As the number of standby service units gets smaller, the number of SIs assigned to each standby service unit increases. The Availability Management Framework needs to guarantee that the load does not exceed the service units' capacity expressed in the configuration. The number of standby service units is reduced until:
(2.a) The preferred number of active service units is reached, and, for each SI, a service unit has been assigned the standby HA state without violating the capacity levels of the service units. In this case, the assignment procedure is completed.
OR
(2.b) All standby service units have been loaded to their maximum capacity, but some SIs are still without standby assignments. In this case, the assignment procedure continues to the next step ((3.a) or (3.b)).
The following example illustrates case (2.a).
Assignment III: Full Assignment With Reduction of Standby Service Units
Assume that the state of the cluster is such that the only service units that can be brought in-service are SU1, SU2, SU3, SU4, and SU5. These instantiated service units take the following responsibilities:
• in-service service units = {SU1, SU2, SU3, SU4, SU5}
• active service units = {SU1, SU2, SU3}
• standby service units = {SU4, SU5}
• spare service units = {}
Based on the preceding configuration, SI assignments fulfilling the condition that every SI is in redundant mode can be:
• SIs assigned to SU1 as active = {SI1, SI2}
• SIs assigned to SU2 as active = {SI3, SI4}
• SIs assigned to SU3 as active = {SI5, SI6}
• SIs assigned to SU4 as standby = {SI1, SI2, SI3}
• SIs assigned to SU5 as standby = {SI4, SI5, SI6}
Step 3: Reduction of the Number of Active Service Units
If even after loading standby service units to their full capacity, the number of in-service service units is still not enough to maintain the preferred number of active service units, the Availability Management Framework tries to reduce the number of active service units by loading active service units to their full capacity. In this step, the number of active service units should be reduced until:
(3.a) For each SI, there is an active assignment without violating the capacity levels of active service units. In this case, the assignment procedure is completed.
OR
(3.b) All active service units have been loaded to their maximum capacity, but some SIs are still without active or standby assignments. In this case, the assignment procedure should continue to the next step ((4.a) or (4.b)).
The following example illustrates case (3.a).
Assignment IV: Full Assignment with Reduction of Active Service Units
Assume that the state of the cluster is such that the only service units that can be brought in-service are SU1, SU2, SU3, and SU4. These instantiated service units take the following responsibilities:
• in-service service units = {SU1, SU2, SU3, SU4}
• active service units = {SU1, SU2}
• standby service units = {SU3, SU4}
• spare service units = {}
• SIs assigned to SU1 as active = {SI1, SI2, SI3}
• SIs assigned to SU2 as active = {SI4, SI5, SI6}
• SIs assigned to SU3 as standby = {SI1, SI2, SI3}
• SIs assigned to SU4 as standby = {SI4, SI5, SI6}
Note that in the preceding assignments, all SIs are still supported in redundant mode.
Step 4: Reduction of the Standby Assignments for some SIs
At this step of the assignment procedure, the number of instantiated service units is not enough to guarantee redundant assignments for all SIs; therefore, the Availability Management Framework is forced to drop the standby assignment of some SIs. The Availability Management Framework will use the ordered SI list to decide for which SIs standby assignments should be dropped. The standby assignments for some SIs will be dropped until:
(4.a) For each SI, there is a service unit with the active HA state for this SI. In this case, the assignment procedure is completed.
OR
(4.b) The number of the in-service service units is so small that the Availability Management Framework cannot assign the active HA state to service units for all SIs. In this case, the reduction procedure continues to the next step (5).
The following example illustrates case (4.a).
Assignment V: Partial Assignment with Reduction of Standby Assignments
Assume that the state of the cluster is such that only the service units SU1, SU2, and SU3 can be brought in-service. The instantiated service units take the following responsibilities:
• in-service service units = {SU1, SU2, SU3}
• active service units = {SU1, SU2}
• standby service units = {SU3}
• spare service units = {}
• SIs assigned to SU1 as active = {SI1, SI2, SI3}
• SIs assigned to SU2 as active = {SI4, SI5, SI6}
• SIs assigned to SU3 as standby = {SI1, SI2, SI3, SI4}
Note that in this assignment SI5 and SI6 are supported only in non-redundant mode (that is, for each of these SIs, there is a service unit having the active HA state, but no service unit having the standby HA state).
Step 5: Reduction of the Active Assignments for some SIs
At this stage of the reduction procedure, the number of instantiated service units is so small that the Availability Management Framework cannot guarantee that service units have been assigned active for all SIs. Therefore, some of the SIs should be dropped. As stated earlier, the ordered list of SIs should be used to decide which SIs should be dropped. This last step continues until a subset of the SIs is supported in non-redundant mode (that is, for each of these SIs, there is a service unit having the active HA state, but no service unit having the standby HA state).
The following example illustrates the last step of the reduction procedure.
Assignment VI: Partial Assignment with SIs Drop-Outs
Assume that the state of the cluster is such that SU1 is the only service unit that can be brought in-service. The instantiated service units take the following responsibilities:
• in-service service units = {SU1}
• active service units = {SU1}
• standby service units = {}
• spare service units = {}
• SIs assigned to SU1 as active = {SI1, SI2, SI3}
Note that in the preceding example, SI4, SI5, and SI6 are completely dropped.
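The reduction procedure can be condensed into a short Python sketch that reproduces Assignments I through VI for the example configuration. It is one possible reading of Steps 1 to 5, not the algorithm of this specification, which leaves the algorithm to implementers. The names spread and assign are hypothetical, both limits are assumed to be positive (the "no limit" value 0 is not handled), and SI dependencies and node placement are ignored.

```python
from math import ceil

def spread(items, units, cap):
    """Split `items` into contiguous, balanced chunks over `units`,
    dropping whatever exceeds the total capacity len(units) * cap."""
    if not units:
        return {}
    items = items[:len(units) * cap]
    base, extra = divmod(len(items), len(units))
    out, start = {}, 0
    for i, su in enumerate(units):
        size = base + (1 if i < extra else 0)
        out[su] = items[start:start + size]
        start += size
    return out

def assign(in_service, sis, pref_active, pref_standby, max_active, max_standby):
    n = len(in_service)
    min_active = ceil(len(sis) / max_active)    # fewest SUs keeping every SI active
    min_standby = ceil(len(sis) / max_standby)  # fewest SUs keeping every SI standby
    if n >= pref_active + pref_standby:         # Step 1: only spares are reduced
        a, s = pref_active, pref_standby
    elif n - pref_active >= min_standby:        # Step 2: fewer standby SUs
        a, s = pref_active, n - pref_active
    elif n - min_standby >= min_active:         # Step 3: fewer active SUs
        a, s = n - min_standby, min_standby
    else:                                       # Steps 4 and 5: drop standby, then active
        a = min(n, min_active)
        s = n - a
    active = spread(sis, in_service[:a], max_active)
    covered = [si for chunk in active.values() for si in chunk]
    standby = spread(covered, in_service[a:a + s], max_standby)
    return {"active": active, "standby": standby, "spare": in_service[a + s:]}

SIS = ["SI1", "SI2", "SI3", "SI4", "SI5", "SI6"]
SUS = ["SU1", "SU2", "SU3", "SU4", "SU5", "SU6", "SU7"]
for count in (7, 6, 5, 4, 3, 1):
    print(count, assign(SUS[:count], SIS, 3, 3, 3, 4))
```

Because both input lists are taken in their configured order, the ordered list of SIs decides which SIs lose their standby or active assignment first, as in Assignments V and VI.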
The reactions of the Availability Management Framework to failures such as node failures are implementation-dependent and out of the scope of this specification; however, the Availability Management Framework should handle failures in such a way that the availability of all SIs supported by service groups is guaranteed, if possible. The following examples should be considered as illustrations of high-level requirements on the Availability Management Framework failure handling and should not be seen as the only way of failure handling.
3.7.3.5.1 Handling of a Node Failure when Spare Service Units Exist
Assume the following cluster configuration before the node hosting SU1 failed:
• in-service service units = {SU1, SU2, SU3, SU4, SU5, SU6, SU7}
• instantiable service units = {}
• active service units = {SU1, SU2, SU3}
• standby service units = {SU4, SU5, SU6}
• spare service units = {SU7}
• SIs assigned to SU1 as active = {SI1, SI2}
• SIs assigned to SU2 as active = {SI3, SI4}
• SIs assigned to SU3 as active = {SI5, SI6}
• SIs assigned to SU4 as standby = {SI1, SI2}
• SIs assigned to SU5 as standby = {SI3, SI4}
• SIs assigned to SU6 as standby = {SI5, SI6}
When the node hosting SU1 fails, SI1 and SI2 lose their active assignments; therefore, the Availability Management Framework must react by attempting to restore the active assignments for SI1 and SI2. This attempt is the immediate reaction of the Availability Management Framework to the failure. Additionally, the Availability Management Framework should use the spare service unit to restore the standby assignment for SI1 and SI2 as well. After the recovery, the assignment should look like the following:
• in-service service units = {SU2, SU3, SU4, SU5, SU6, SU7}
• instantiable service units = {}
• active service units = {SU2, SU3, SU4}
• standby service units = {SU5, SU6, SU7}
• spare service units = {}
• SIs assigned to SU2 as active = {SI3, SI4}
• SIs assigned to SU3 as active = {SI5, SI6}
• SIs assigned to SU4 as active = {SI1, SI2}
• SIs assigned to SU5 as standby = {SI3, SI4}
• SIs assigned to SU6 as standby = {SI5, SI6}
• SIs assigned to SU7 as standby = {SI1, SI2}
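The recovery just described (promote the standby service unit, then let the spare take over the standby role) can be sketched in Python as follows. This is an illustration only; failure handling is implementation-dependent, and the function recover and its data layout are hypothetical. The sketch assumes that exactly one standby service unit shadows the SIs of the failed service unit.

```python
def recover(active, standby, spare, failed):
    """Return (active, standby, spare) after the node hosting `failed` is lost.

    `active` and `standby` map service unit -> list of SIs; `spare` is a list.
    """
    lost = active[failed]
    active = {su: sis for su, sis in active.items() if su != failed}
    standby = dict(standby)
    spare = list(spare)
    # Immediate reaction: promote the standby unit that shadows the lost SIs.
    backup = next(su for su, sis in standby.items() if sis == lost)
    active[backup] = standby.pop(backup)
    # Then restore the standby assignments on a spare unit, if one exists.
    if spare:
        standby[spare.pop(0)] = list(lost)
    return active, standby, spare

active = {"SU1": ["SI1", "SI2"], "SU2": ["SI3", "SI4"], "SU3": ["SI5", "SI6"]}
standby = {"SU4": ["SI1", "SI2"], "SU5": ["SI3", "SI4"], "SU6": ["SI5", "SI6"]}
new_active, new_standby, new_spare = recover(active, standby, ["SU7"], "SU1")
print(new_active)
print(new_standby)
print(new_spare)
```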
The following example illustrates how the Availability Management Framework uses the available capacity of service units to retain the redundant mode of SIs when a node hosting some service units fails. Assume the following cluster configuration before the failure of the node hosting SU2:
• in-service service units = {SU2, SU3, SU4, SU5, SU6, SU7}
• instantiable service units = {}
• active service units = {SU2, SU3, SU4}
• standby service units = {SU5, SU6, SU7}
• spare service units = {}
• SIs assigned to SU2 as active = {SI3, SI4}
• SIs assigned to SU3 as active = {SI5, SI6}
• SIs assigned to SU4 as active = {SI1, SI2}
• SIs assigned to SU5 as standby = {SI3, SI4}
• SIs assigned to SU6 as standby = {SI5, SI6}
• SIs assigned to SU7 as standby = {SI1, SI2}
When the node hosting SU2 fails, SI3 and SI4 lose their active assignments; therefore, the immediate action for the Availability Management Framework is to restore the active assignments of SI3 and SI4. Additionally, the standby assignments of these SIs should also be restored. A couple of different ways exist for restoring the standby assignments for SI3 and SI4. It depends on the Availability Management Framework implementation how to achieve this without violating the configuration parameters (such as the number of active/standby SIs assigned to a service unit).
One way of restoring the standby assignments for SI3 and SI4 is the following one.
• in-service service units = {SU3, SU4, SU5, SU6, SU7}
• instantiable service units = {}
• active service units = {SU3, SU4, SU5}
• standby service units = {SU6, SU7}
• spare service units = {}
• SIs assigned to SU3 as active = {SI5, SI6}
• SIs assigned to SU4 as active = {SI1, SI2}
• SIs assigned to SU5 as active = {SI3, SI4}
• SIs assigned to SU6 as standby = {SI5, SI6, SI4}
• SIs assigned to SU7 as standby = {SI1, SI2, SI3}
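When no spare service unit exists, the standby assignments of the affected SIs have to be placed on the remaining standby service units without exceeding the configured limit. The Python sketch below uses a least-loaded rule, which is only one of the possible ways; the function restore_standby is hypothetical. With this rule SI3 goes to SU6 and SI4 to SU7, the mirror image of the listing above, and both outcomes respect the limit of four standby SIs per service unit.

```python
def restore_standby(standby, orphans, max_standby):
    """Place each SI in `orphans` on the least-loaded standby service unit.

    Returns the new standby map and the SIs that could not be placed.
    """
    standby = {su: list(sis) for su, sis in standby.items()}
    unplaced = []
    for si in orphans:
        su = min(standby, key=lambda name: len(standby[name]), default=None)
        if su is None or len(standby[su]) >= max_standby:
            unplaced.append(si)  # this SI stays in non-redundant mode
        else:
            standby[su].append(si)
    return standby, unplaced

# State after SU5 has been promoted to active for SI3 and SI4.
remaining = {"SU6": ["SI5", "SI6"], "SU7": ["SI1", "SI2"]}
new_standby, unplaced = restore_standby(remaining, ["SI3", "SI4"], 4)
print(new_standby)
print(unplaced)
```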
The auto-adjust option requires that the current (running) configuration of the service group returns to the preferred configuration, so that the service units with the highest ranks are active and the highest-ranked SIs are assigned in redundant mode (that is, there is a service unit having the active HA state for each of these SIs and another service unit having the standby HA state for each of these SIs). It is up to the Availability Management Framework implementation to decide when and how the auto-adjust will be initiated. The following example is given for illustration purposes. Assume that the following is the configuration of the service group.
• in-service service units = {SU2, SU3, SU4, SU5, SU6, SU7}
• instantiable service units = {}
• active service units = {SU2, SU3, SU4}
• standby service units = {SU5, SU6, SU7}
• spare service units = {}
• SIs assigned to SU2 as active = {SI3, SI4}
• SIs assigned to SU3 as active = {SI5, SI6}
• SIs assigned to SU4 as active = {SI1, SI2}
• SIs assigned to SU5 as standby = {SI3, SI4}
• SIs assigned to SU6 as standby = {SI5, SI6}
• SIs assigned to SU7 as standby = {SI1, SI2}
Now, assume that the node hosting SU1 joins the cluster. As a result, SU1 becomes instantiable. Because SU1 has the highest rank in the ordered list of service units, the preceding configuration is no longer a preferred one. The auto-adjust is initiated in an implementation-dependent way. After the completion of the auto-adjust procedure (assuming that SU1 could be brought in-service), the service group configuration should look as follows:
• in-service service units = {SU1, SU2, SU3, SU4, SU5, SU6, SU7}
• instantiable service units = {}
• active service units = {SU1, SU2, SU3}
• standby service units = {SU4, SU5, SU6}
• spare service units = {SU7}
• SIs assigned to SU1 as active = {SI1, SI2}
• SIs assigned to SU2 as active = {SI3, SI4}
• SIs assigned to SU3 as active = {SI5, SI6}
• SIs assigned to SU4 as standby = {SI1, SI2}
• SIs assigned to SU5 as standby = {SI3, SI4}
• SIs assigned to SU6 as standby = {SI5, SI6}
Note that the Availability Management Framework may undergo a series of SI relocations to go from the configuration before the auto-adjust to the preceding configuration.
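Whether a running configuration deviates from the preferred one can be decided by comparing the current roles with the roles the highest-ranked in-service service units should hold. The Python sketch below illustrates this for the example; the functions preferred_roles and needs_auto_adjust are hypothetical, and the sketch assumes that enough service units are in-service so that no reduction step applies. When and how the auto-adjust is actually initiated remains implementation-dependent.

```python
def preferred_roles(ranked_sus, in_service, pref_active, pref_standby):
    """Roles the highest-ranked in-service service units should hold."""
    usable = [su for su in ranked_sus if su in in_service]
    return {
        "active": usable[:pref_active],
        "standby": usable[pref_active:pref_active + pref_standby],
        "spare": usable[pref_active + pref_standby:],
    }

def needs_auto_adjust(running, preferred):
    """True if any role is held by a different set of service units."""
    return any(set(running[role]) != set(preferred[role]) for role in preferred)

ranked = ["SU1", "SU2", "SU3", "SU4", "SU5", "SU6", "SU7", "SU8"]
running = {"active": ["SU2", "SU3", "SU4"],
           "standby": ["SU5", "SU6", "SU7"],
           "spare": []}
# SU1 has rejoined and was brought in-service.
target = preferred_roles(ranked, {"SU1", "SU2", "SU3", "SU4", "SU5", "SU6", "SU7"}, 3, 3)
print(target)
print(needs_auto_adjust(running, target))
```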
The N+M redundancy model is represented by the UML diagram shown in the following figure.
FIGURE 16 UML Diagram of the N+M Redundancy Model
N service units can have only active and M service units can have only standby service instance assignments at a time
In the N-way redundancy model, a service group contains N service units that protect multiple service instances. This redundancy model has the following characteristics:
In a service group with the N-way redundancy model, a service unit can simultaneously be assigned (i) the active HA state for some SIs and (ii) the standby HA state for some other SIs. At most one service unit may have the active HA state for an SI, and none, one, or multiple service units may have the standby HA state for the same SI. The preferred number of standby assignments for an SI is an SI-level configuration parameter. The preferred number of standby assignments may differ for each SI. At any given time, several service units can be in-service for a service group: some have SI assignments and possibly some others are considered spare service units for the service group. The number of assigned service units and the number of spare service units are dynamic and can change during the life-span of the service group; however, the preferred number of these service units can be configured, as will be discussed in Section 3.7.4.3. At any given time, and if resources allow, the Availability Management Framework should make sure that the redundancy level is guaranteed for each SI (one service unit assigned active and as many service units as the preferred number of standby assignments assigned standby) while the load constraints in each service unit and the number of spare service units are fulfilled. Each SI has an ordered list of service units to which the SI can be assigned. Since any service unit in a service group is capable of providing all SIs defined in the configuration for the service group (see Section 3.2.6), the ordered list of service units per SI includes all the service units configured for the service group. In other words, a partial list of service units is an invalid configuration. 
If the number of in-service service units allows it, the Availability Management Framework should make sure that the highest-ranked in-service service units be assigned active for each service instance, and, according to the preferred number of standby assignments, the higher-ranked amongst in-service service units be assigned standby for that service instance.
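The rank-based rule above can be illustrated for a single SI with a few lines of Python. This is a sketch under stated assumptions, not a definition from this specification: the function n_way_roles is hypothetical, ranked_sus stands for the ordered list of service units of the SI (highest rank first), and load limits per service unit are ignored.

```python
def n_way_roles(ranked_sus, in_service, pref_standby=1):
    """Pick the active and the standby service units for one SI by rank."""
    usable = [su for su in ranked_sus if su in in_service]
    active = usable[0] if usable else None
    # The next-ranked in-service units become the ranked standbys.
    return active, usable[1:1 + pref_standby]

print(n_way_roles(["S1", "S2", "S3"], {"S1", "S2", "S3"}, pref_standby=2))
print(n_way_roles(["S1", "S2", "S3"], {"S2", "S3"}))
```

In the second call S1 is not in-service, so the next-ranked service unit S2 becomes active and S3 becomes the single standby.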
Only components implementing the x_active_and_y_standby component capability model can participate in the N-way redundancy model.
3.7.4.2 Example
FIGURE 17 shows an example of the N-way redundancy model. Note that each component has the active HA state for one component service instance and the standby HA state for the other two component service instances.
FIGURE 17 Example of the N-Way Redundancy Model
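[Figure: N-way configuration in which each component is active for one component service instance and standby for two others; recoverable labels include Service Unit S2, components C2 to C6, and protection groups PG X2, PG Y2, PG Z1, and PG Z2.]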
3.7.4.3 Configuration
• Ordered list of service units for a service group: this parameter is described in Section 3.7.1.1. Default value: no default, the order is implementation-dependent.
• Ordered list of SIs: for the general meaning of this parameter, refer to its definition in Section 3.7.1.1. The Availability Management Framework will use this ranking to choose SIs to support either in non-redundant mode (that is, there is a service unit having the active HA state for each of these SIs, but no service unit having the standby HA state for each of these SIs) or drop them completely if the set of instantiated service units does not allow full support of all SIs. Default value: no default, the order is implementation-dependent.
• Ranked service unit list per SI: each SI has an ordered list of service units to which the SI can be assigned. The rank of a service unit for an SI is configured by setting the saAmfRank attribute of a service unit identified by safRankedSu in the SaAmfSIRankedSU association class (see Section 8.11). The rank is represented by a positive integer. The lower the integer value, the higher the rank. The Availability Management Framework should make sure that the highest-ranked available service unit be assigned active for the SI, and the remaining available high-ranked service units be assigned standby for the SI, if possible; that is, the second highest-ranked service unit is assigned the first ranked standby, the third highest-ranked service unit is assigned the second ranked standby, and so on.
  Default value: the ordered service units list defined for the service group.
• Preferred number of standby assignments per SI: this parameter indicates the preferred number of service units that are assigned the standby HA state for this SI. This preferred number is configured by setting the saAmfSIPrefStandbyAssignments attribute of the saAmfSI object class (see Section 8.11).
  Default value: 1
• Preferred number of in-service service units: the Availability Management Framework should make sure that this number of in-service service units is always instantiated, if possible. This preferred number is configured by setting the saAmfSGNumPrefInserviceSUs attribute of the saAmfSG object class (see Section 8.9). If the service units list for a service group includes at least two service units, the preferred number of in-service service units should be at least two.
  Default value: the number of the service units configured for the service group.
• Preferred number of assigned service units: this parameter indicates the preferred number of assigned service units at any time. This preferred number is configured by setting the saAmfSGNumPrefAssignedSUs attribute of the saAmfSG object class (see Section 8.9). As will be discussed in Section 3.7.4.4 on page 124, the Availability Management Framework should try to guarantee that this number of assigned service units exists for the service group if the number of instantiated service units is large enough.
  Default value: the preferred number of in-service service units.
• Maximum number of active SIs per service unit: this parameter indicates the maximum number of SIs that can be concurrently assigned to a service unit, so that the service unit has the active HA state for all these SIs. It is assumed that the load imposed by each SI is the same. This maximum number is configured by setting the saAmfSGMaxActiveSIsperSU attribute of the saAmfSG object class (see Section 8.9).
  Default value: no limit, a value of 0 is used to specify this.
• Maximum number of standby SIs per service unit: this parameter indicates the maximum number of standby SIs that can be concurrently assigned to a service unit, so that the service unit has the standby HA state for all these SIs. It is assumed that the load imposed by each SI is the same. This maximum number is configured by setting the saAmfSGMaxStandbySIsperSU attribute of the saAmfSG object class (see Section 8.9).
  Default value: no limit, a value of 0 is used to specify this.
• Auto-adjust option: for the general explanation of this option, refer to Section 3.7.1.1 on page 88. Section 3.7.4.6 on page 129 shows an example for handling the auto-adjust option in this redundancy model.
  Default value: no auto-adjust
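As a rough illustration of the ranking rule (the lower the saAmfRank value, the higher the rank, with the service group's ordered list as the default), the ranked list for one SI might be derived as sketched below. The function name and the dictionary layout are hypothetical and are not part of the specification.

```python
def ranked_su_list(sg_ordered_sus, si_ranks=None):
    """Return the service units for one SI, highest rank first.

    sg_ordered_sus: ordered SU list configured for the service group.
    si_ranks: optional {su_name: positive int} per-SI ranks (hypothetical
              layout); a lower value means a higher rank.
    """
    if not si_ranks:
        # Default value: the ordered service units list of the service group.
        return list(sg_ordered_sus)
    if set(si_ranks) != set(sg_ordered_sus):
        # A partial list of service units is an invalid configuration.
        raise ValueError("ranked list must cover every SU of the service group")
    if any(rank < 1 for rank in si_ranks.values()):
        raise ValueError("ranks must be positive integers")
    return sorted(si_ranks, key=lambda su: si_ranks[su])


sg_sus = ["SU1", "SU2", "SU3"]
ranks_si2 = {"SU2": 1, "SU3": 2, "SU1": 3}
print(ranked_su_list(sg_sus, ranks_si2))
print(ranked_su_list(sg_sus))
```

How ties between equal rank values are resolved is not covered by this sketch.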
3.7.4.4 SI Assignments
In this section, the general direction in assigning SIs to service units is discussed. Then, a few examples will be given for illustration.

If available service units in the cluster allow it, the Availability Management Framework will instantiate the preferred number of in-service service units for the service group. Moreover, the preferred number of assigned service units will be used for SI assignments. The remaining in-service service units, if any, will be spare. It is assumed that the service group configuration has passed a series of validations, so that when as many service units as the preferred number of assigned service units have been assigned, for each configured SI in the service group, a service unit is assigned active for this SI, and the preferred number of standby assignments is ensured without violating the limits expressed in Section 3.7.4.3.
The following example of a service group configuration will be used throughout this section:
• Ordered list of service units = {SU1, SU2, SU3, SU4, SU5, SU6, SU7, SU8}
• Ordered list of SIs = {SI1, SI2, SI3, SI4, SI5, SI6}
• Ranked service units for SI1 = {SU1, SU2, SU3, SU4, SU5, SU6, SU7, SU8}
• Preferred number of standby assignments for SI1 = 5
• Ranked service units for SI2 = {SU2, SU3, SU4, SU5, SU6, SU7, SU8, SU1}
• Preferred number of standby assignments for SI2 = 5
• Ranked service units for SI3 = {SU3, SU4, SU5, SU6, SU7, SU8, SU1, SU2}
• Preferred number of standby assignments for SI3 = 5
• Ranked service units for SI4 = {SU4, SU5, SU6, SU7, SU8, SU1, SU2, SU3}
• Preferred number of standby assignments for SI4 = 5
• Ranked service units for SI5 = {SU5, SU6, SU7, SU8, SU1, SU2, SU3, SU4}
• Preferred number of standby assignments for SI5 = 5
• Ranked service units for SI6 = {SU6, SU7, SU8, SU1, SU2, SU3, SU4, SU5}
• Preferred number of standby assignments for SI6 = 5
• Preferred number of in-service service units = 8
• Preferred number of assigned service units = 7
• Maximum number of active SIs per service unit = 3
• Maximum number of standby SIs per service unit = 5
Assignment I: Full Assignment with Spare Service Units
Assume that under the current state of the cluster, all service units can be brought in-service. Then, a running configuration for the service group can be as follows:
• in-service service units = {SU1, SU2, SU3, SU4, SU5, SU6, SU7, SU8}
• instantiable service units = {}
• assigned service units = {SU1, SU2, SU3, SU4, SU5, SU6, SU7}
• spare service units = {SU8}
• SI1's assignments = {active: SU1; standby: SU2, SU3, SU4, SU5, SU6}
• SI2's assignments = {active: SU2; standby: SU3, SU4, SU5, SU6, SU7}
• SI3's assignments = {active: SU3; standby: SU4, SU5, SU6, SU7, SU1}
• SI4's assignments = {active: SU4; standby: SU5, SU6, SU7, SU1, SU2}
• SI5's assignments = {active: SU5; standby: SU6, SU7, SU1, SU2, SU3}
• SI6's assignments = {active: SU6; standby: SU7, SU1, SU2, SU3, SU4}

The following points should be mentioned regarding the preceding assignments: (1) The selection of instantiated service units is based on the ordered list of service units. (2) The assignments of SIs to service units are based on the ordered list of service units for each SI.
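The Assignment I result is consistent with a simple greedy walk: SIs are handled in order of importance, and for each SI the ranked service unit list is scanned for an active assignment and then for standby assignments, subject to the per-service-unit limits. The following is only a sketch under that assumption; the specification does not prescribe this algorithm, and the function name, argument layout, and the choice to assign no standby when no active assignment is possible are illustrative.

```python
def assign_n_way(assigned_sus, si_order, ranked, pref_standby,
                 max_active, max_standby):
    """Greedy sketch of N-way SI assignment (illustrative, not normative)."""
    # A configured limit of 0 means "no limit".
    active_limit = max_active or float("inf")
    standby_limit = max_standby or float("inf")
    active_load = {su: 0 for su in assigned_sus}
    standby_load = {su: 0 for su in assigned_sus}
    result = {}
    for si in si_order:  # most important SI first
        # Only service units selected for assignment are candidates.
        candidates = [su for su in ranked[si] if su in active_load]
        # Active: highest-ranked candidate with free active capacity.
        active = next((su for su in candidates
                       if active_load[su] < active_limit), None)
        standbys = []
        if active is not None:
            active_load[active] += 1
            # Standby: remaining candidates in rank order, up to the
            # preferred number of standby assignments.
            for su in candidates:
                if len(standbys) == pref_standby[si]:
                    break
                if su != active and standby_load[su] < standby_limit:
                    standbys.append(su)
                    standby_load[su] += 1
        result[si] = {"active": active, "standby": standbys}
    return result


sus = [f"SU{i}" for i in range(1, 9)]
sis = [f"SI{k}" for k in range(1, 7)]
# Per-SI ranked lists from the example: SIk starts at SUk and wraps around.
ranked = {f"SI{k}": sus[k - 1:] + sus[:k - 1] for k in range(1, 7)}
pref_standby = {si: 5 for si in sis}

# SU8 is spare, so only SU1..SU7 are offered for assignment.
assignment_1 = assign_n_way(sus[:7], sis, ranked, pref_standby,
                            max_active=3, max_standby=5)
for si, a in assignment_1.items():
    print(si, a)
```

With the example configuration this sketch yields the Assignment I lists; it is not claimed to reproduce every reduced configuration shown later in this section.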
3.7.4.4.1 Reduction Procedure
The difficulty comes when the number of in-service service units is not enough to satisfy the configuration requirements listed in the example. The first goal is to try to keep all SIs in the wanted redundant mode (that is, one service unit is assigned active for each of these SIs, and the preferred number of standby assignments is ensured), even at the expense of imposing maximum load on each service unit. If this goal is not attainable, the next goal is to make sure that as many SIs as possible have active assignments. This may mean reduction in the number of standby service units. The reduction is done for less important SIs first. Finally, if this objective is also not attainable, the only choice is to drop some of the SIs completely. Because the reduction algorithm is simple and somewhat similar to the reduction procedure discussed in the N+M case, the reduction procedure is not discussed, and only examples are given.

Assignment II: Full Assignment with Spare Reduction
Assume that initially the service units that can be brought in-service are SU1, SU2, SU3, SU4, SU5, SU6, and SU7. Then, the following can be a running configuration for the service group.
• in-service service units = {SU1, SU2, SU3, SU4, SU5, SU6, SU7}
• instantiable service units = {}
• assigned service units = {SU1, SU2, SU3, SU4, SU5, SU6, SU7}
• spare service units = {}
• SI1's assignments = {active: SU1; standby: SU2, SU3, SU4, SU5, SU6}
• SI2's assignments = {active: SU2; standby: SU3, SU4, SU5, SU6, SU7}
• SI3's assignments = {active: SU3; standby: SU4, SU5, SU6, SU7, SU1}
• SI4's assignments = {active: SU4; standby: SU5, SU6, SU7, SU1, SU2}
• SI5's assignments = {active: SU5; standby: SU6, SU7, SU1, SU2, SU3}
• SI6's assignments = {active: SU6; standby: SU7, SU1, SU2, SU3, SU4}
Assignment III: Full Assignment with Reduction of Assigned Service Units
Assume that the state of the cluster is initially such that only SU1, SU2, SU3, SU4, SU5, SU6 can be brought in-service. Then, the state of the service units is:
• in-service service units = {SU1, SU2, SU3, SU4, SU5, SU6}
• instantiable service units = {}
• assigned service units = {SU1, SU2, SU3, SU4, SU5, SU6}
• spare service units = {}
• SI1's assignments = {active: SU1; standby: SU2, SU3, SU4, SU5, SU6}
• SI2's assignments = {active: SU2; standby: SU3, SU4, SU5, SU6, SU1}
• SI3's assignments = {active: SU3; standby: SU4, SU5, SU6, SU1, SU2}
• SI4's assignments = {active: SU4; standby: SU5, SU6, SU1, SU2, SU3}
• SI5's assignments = {active: SU5; standby: SU6, SU1, SU2, SU3, SU4}
• SI6's assignments = {active: SU6; standby: SU1, SU2, SU3, SU4, SU5}
Assignment IV: Partial Assignment with Reduction of SIs Redundancy Level
Assume that the state of the cluster is such that only the following service units can be brought in-service:
• in-service service units = {SU1, SU2, SU3}
• instantiable service units = {}
• assigned service units = {SU1, SU2, SU3}
• spare service units = {}
• SI1's assignments = {active: SU1; standby: SU2, SU3}
• SI2's assignments = {active: SU2; standby: SU3, SU1}
• SI3's assignments = {active: SU3; standby: SU1, SU2}
• SI4's assignments = {active: SU1; standby: SU2, SU3}
• SI5's assignments = {active: SU1; standby: SU2, SU3}
• SI6's assignments = {active: SU2; standby: SU3, SU1}
Assignment V: Partial Assignment with SIs Drop-Outs
Assume that the state of the cluster is such that only SU1 can be brought in-service. Then, the cluster status looks like:
• in-service service units = {SU1}
• instantiable service units = {}
• assigned service units = {SU1}
• spare service units = {}
• SI1's assignments = {active: SU1; standby: none}
• SI2's assignments = {active: SU1; standby: none}
• SI3's assignments = {active: SU1; standby: none}
• SI4's assignments = {active: none; standby: none}
• SI5's assignments = {active: none; standby: none}
• SI6's assignments = {active: none; standby: none}
In this section, the fail-over action initiated by a node failure is described. Assume that the node hosting SU3 fails. The assignments before the node hosting SU3 failed and after the fail-over completion are as follows:

Assignments Before the Node Hosting SU3 Fails
• in-service service units = {SU1, SU2, SU3, SU4, SU5, SU6}
• instantiable service units = {}
• assigned service units = {SU1, SU2, SU3, SU4, SU5, SU6}
• spare service units = {}
• SI1's assignments = {active: SU1; standby: SU2, SU3, SU4, SU5, SU6}
• SI2's assignments = {active: SU2; standby: SU3, SU4, SU5, SU6, SU1}
• SI3's assignments = {active: SU3; standby: SU4, SU5, SU6, SU1, SU2}
• SI4's assignments = {active: SU4; standby: SU5, SU6, SU1, SU2, SU3}
• SI5's assignments = {active: SU5; standby: SU6, SU1, SU2, SU3, SU4}
• SI6's assignments = {active: SU6; standby: SU1, SU2, SU3, SU4, SU5}

Assignments After Completion of the Fail-Over
• in-service service units = {SU1, SU2, SU4, SU5, SU6}
• instantiable service units = {}
• assigned service units = {SU1, SU2, SU4, SU5, SU6}
• spare service units = {}
• SI1's assignments = {active: SU1; standby: SU2, SU4, SU5, SU6}
• SI2's assignments = {active: SU2; standby: SU4, SU5, SU6, SU1}
• SI3's assignments = {active: SU4; standby: SU5, SU6, SU1, SU2}
• SI4's assignments = {active: SU4; standby: SU5, SU6, SU1, SU2}
• SI5's assignments = {active: SU5; standby: SU6, SU1, SU2, SU4}
• SI6's assignments = {active: SU6; standby: SU1, SU2, SU4, SU5}
When the node hosting SU3 fails, the Availability Management Framework makes adjustments by removing assignments of the SIs from SU3. In this example, it is assumed that the ordering of standby assignments is important. This means that the Availability Management Framework has to inform the components of some service units of the change in their active/standby HA states. For instance, in this example, the Availability Management Framework should do the following for SI1:
• Ask the components of SU4 to go to standby-level 2 for SI1 (it was standby-level 3 before).
• Ask the components of SU5 to go to standby-level 3 for SI1 (it was standby-level 4 before).
• Ask the components of SU6 to go to standby-level 4 for SI1 (it was standby-level 5 before).
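The fail-over shown in this example can be mimicked with a small sketch: the failed service unit is removed from every SI, the first surviving standby is promoted where the failed unit was active, and the standby levels are then compared. The data layout and both function names are assumptions made for illustration; an actual Availability Management Framework implementation may proceed differently.

```python
def fail_over(assignments, failed_su):
    """assignments: {si: (active_su, [standby SUs in standby-level order])}."""
    after = {}
    for si, (active, standbys) in assignments.items():
        remaining = [su for su in standbys if su != failed_su]
        if active == failed_su:
            # Promote the highest-ranked surviving standby, if there is one.
            active = remaining.pop(0) if remaining else None
        after[si] = (active, remaining)
    return after


def standby_level_changes(before, after, si):
    """Return {su: (old_level, new_level)} for standbys whose level moved."""
    old = {su: lvl for lvl, su in enumerate(before[si][1], start=1)}
    new = {su: lvl for lvl, su in enumerate(after[si][1], start=1)}
    return {su: (old[su], new[su])
            for su in new if su in old and old[su] != new[su]}


sus = [f"SU{i}" for i in range(1, 7)]
# State before the failure: SIk is active on SUk, standbys follow in order.
before = {f"SI{k}": (sus[k - 1], sus[k:] + sus[:k - 1]) for k in range(1, 7)}
after = fail_over(before, "SU3")

print(after["SI3"])
print(standby_level_changes(before, after, "SI1"))
```

For SI1 the reported changes are exactly the three requests listed above (SU4 from level 3 to 2, SU5 from 4 to 3, SU6 from 5 to 4).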
The auto-adjust option indicates that it is required that the current (running) configuration of the service group returns to the preferred configuration in which the service instances with the highest ranks are active and the highest-ranked SIs are assigned in redundant mode. It is up to the Availability Management Framework implementation to decide when and how the auto-adjust will be initiated. The following example is given for illustration purposes.
• in-service service units = {SU1, SU2, SU3}
• instantiable service units = {}
• assigned service units = {SU1, SU2, SU3}
• spare service units = {}
• SI1's assignments = {active: SU1; standby: SU2, SU3}
• SI2's assignments = {active: SU2; standby: SU3, SU1}
• SI3's assignments = {active: SU3; standby: SU1, SU2}
• SI4's assignments = {active: SU1; standby: SU2, SU3}
• SI5's assignments = {active: SU1; standby: SU2, SU3}
• SI6's assignments = {active: SU2; standby: SU3, SU1}
Now, assume that the node hosting SU4 joins the cluster. As a result, SU4 becomes instantiable. It is obvious that this configuration is not the preferred one. If the auto-adjust is initiated (in an implementation-dependent way), and assuming that SU4 could be brought in-service, then the service group configuration is as follows after completion of the auto-adjust procedure:
• in-service service units = {SU1, SU2, SU3, SU4}
• instantiable service units = {}
• assigned service units = {SU1, SU2, SU3, SU4}
• spare service units = {}
• SI1's assignments = {active: SU1; standby: SU2, SU3, SU4}
• SI2's assignments = {active: SU2; standby: SU3, SU4, SU1}
• SI3's assignments = {active: SU3; standby: SU4, SU1, SU2}
• SI4's assignments = {active: SU4; standby: SU1, SU2, SU3}
• SI5's assignments = {active: SU1; standby: SU2, SU3, SU4}
• SI6's assignments = {active: SU2; standby: SU3, SU4, SU1}
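To see what the auto-adjust changed in this example, the configurations before and after can be compared SI by SI. The sketch below only computes the difference between the two listed states; it does not model how an implementation decides on or sequences the adjustment, and the function name and result layout are illustrative.

```python
def diff_assignments(before, after):
    """Compare {si: (active, [standbys])} maps and report what changed."""
    changes = {}
    for si in before:
        b_active, b_standby = before[si]
        a_active, a_standby = after[si]
        entry = {}
        if b_active != a_active:
            entry["active"] = (b_active, a_active)
        added = [su for su in a_standby if su not in b_standby]
        removed = [su for su in b_standby if su not in a_standby]
        if added:
            entry["standby_added"] = added
        if removed:
            entry["standby_removed"] = removed
        if entry:
            changes[si] = entry
    return changes


before = {
    "SI1": ("SU1", ["SU2", "SU3"]), "SI2": ("SU2", ["SU3", "SU1"]),
    "SI3": ("SU3", ["SU1", "SU2"]), "SI4": ("SU1", ["SU2", "SU3"]),
    "SI5": ("SU1", ["SU2", "SU3"]), "SI6": ("SU2", ["SU3", "SU1"]),
}
after = {
    "SI1": ("SU1", ["SU2", "SU3", "SU4"]), "SI2": ("SU2", ["SU3", "SU4", "SU1"]),
    "SI3": ("SU3", ["SU4", "SU1", "SU2"]), "SI4": ("SU4", ["SU1", "SU2", "SU3"]),
    "SI5": ("SU1", ["SU2", "SU3", "SU4"]), "SI6": ("SU2", ["SU3", "SU4", "SU1"]),
}

changes = diff_assignments(before, after)
for si, change in changes.items():
    print(si, change)
```

Only SI4 moves its active assignment (from SU1 to SU4, its highest-ranked service unit); every other SI simply gains SU4 as an additional standby.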
The N-way redundancy model is represented by the UML diagram shown in the following figure.
FIGURE 18 UML Diagram of the N-Way Redundancy Model
A service unit can take several active and several standby service instance assignments at a time
In the N-way active redundancy model, the service group contains N service units. The characteristics of this redundancy model are:
• Each service unit has to be active for all the SIs assigned to it. A service unit is never assigned the standby state for any SI.
• For each SI, none, one, or multiple service units can be assigned the active HA state for that SI. The preferred number of active assignments for an SI is an SI-level configuration parameter (see Section 3.7.5.3 on page 133). The preferred number of active assignments may be different for each SI.
• At any given time, several service units can be in-service for a service group: some have SIs assigned to them, and possibly some others are considered spare service units for the service group. The number of assigned service units and the number of spare service units are dynamic and can change during the life-span of the service group; however, the preferred number of these service units can be configured.
• At any given time, the Availability Management Framework should make sure that the redundancy level (the preferred number of active assignments) for each SI is guaranteed (if possible) while the maximum number of SIs assigned to each service unit is not exceeded.
• Each SI has an ordered list of service units to which the SI can be assigned. The ordered list of service units per SI must include all the service units configured for the service group. In other words, a partial list of service units is an invalid configuration. If the number of instantiated service units allows it, the Availability Management Framework should make sure that the highest-ranked available service units are assigned active for the SI.
The simplest case for the N-way active redundancy model is the 2-way active redundancy model in which the service group contains two service units that are both assigned the active HA state for every service instance that they support. This configuration is sometimes referred to as an active-active redundancy configuration. Components implementing any of the capability models described in Section 3.6 on page 85 can participate in the N-way active redundancy model.
3.7.5.2 Example
FIGURE 19 next shows an example of the N-way active redundancy model. Note that the HA state of each component for all component service instances assigned to it is active.
FIGURE 19 Example of the N-Way Active Redundancy Model
[Figure content not reproduced. Recoverable labels: components C2, C3, C5; protection groups PG B2, PG C2; HA state active; Service Instance A (CSI A1, CSI A2), Service Instance B (CSI B1, CSI B2), Service Instance C (CSI C1, CSI C2).]
3.7.5.3 Configuration
• Ordered list of service units for a service group: this parameter is described in Section 3.7.1.1.
  Default value: no default, the order is implementation-dependent.
• Ordered list of SIs: for the general meaning of this parameter, refer to its definition in Section 3.7.1.1. The Availability Management Framework will use this ranking to choose the SIs with less redundancy (that is, the number of service units having the active HA state for them is less than the preferred number of active service units) or drop them completely if the number of available service units is not enough for a full support of all SIs.
  Default value: no default, the order is implementation-dependent.
• Ranked service unit list per SI: each SI has an ordered list of service units to which the SI can be assigned. This list must be an ordered list consisting of all service units configured for the service group. The rank of a service unit for an SI is configured by setting the saAmfRank attribute of a service unit identified by safRankedSu in the SaAmfSIRankedSU association class (see Section 8.11). The rank is represented by a positive integer. The lower the integer value, the higher the rank. The Availability Management Framework should make sure that the highest-ranked available service unit be assigned active for the SI, if possible.
  Default value: the ordered service units list defined for the service group.
• Preferred number of active assignments per SI: this parameter indicates the preferred number of service units being assigned the active HA state for each SI. This preferred number is configured by setting the saAmfSIPrefActiveAssignments attribute of the saAmfSI object class (see Section 8.11).
  Default value: the preferred number of assigned service units.
• Preferred number of in-service service units: the Availability Management Framework should make sure that this number of service units is always instantiated, if possible. This preferred number is configured by setting the saAmfSGNumPrefInserviceSUs attribute of the saAmfSG object class (see Section 8.9).
  Default value: the number of the service units configured for the service group.
• Preferred number of assigned service units: this parameter indicates the preferred number of assigned service units at any time. This preferred number is configured by setting the saAmfSGNumPrefAssignedSUs attribute of the saAmfSG object class (see Section 8.9). As will be discussed later, the Availability Management Framework should try to guarantee that this number of assigned service units exists for the service group if the number of instantiated service units is large enough.
  Default value: the preferred number of in-service service units.
• Maximum number of active SIs per service unit: this parameter indicates the maximum number of SIs that can be concurrently assigned to a service unit, so that the service unit has the active HA state for all these SIs. It is assumed that the load imposed by each SI is the same. This maximum number is configured by setting the saAmfSGMaxActiveSIsperSU attribute of the saAmfSG object class (see Section 8.9).
  Default value: no limit, a value of 0 is used to specify this.
• Auto-adjust option: for the general explanation of this option, refer to Section 3.7.1.1 on page 88. Section 3.7.5.6 on page 143 shows an example for handling the auto-adjust option in this redundancy model.
  Default value: no auto-adjust
3.7.5.4 SI Assignments
First, the general direction in assigning SIs to service units is discussed. Then, a few examples will be given for illustration. If the number of available service units in the cluster allows it, the Availability Management Framework will instantiate the preferred number of in-service service units for the service group. Additionally, the preferred number of in-service service units will be assigned the active HA state for each SI. The remaining instantiated service units will be spare, if the configuration allows. It is assumed that the service group configuration has passed a series of validations, so that when as many as the preferred number of assigned service units have been assigned, all SIs configured for the service group are assignable, so that each SI will have the preferred number of active assignments without violating the limits expressed in the configuration section.
The following example of a service group configuration will be used throughout this section.
• Ordered list of service units = {SU1, SU2, SU3, SU4, SU5, SU6, SU7, SU8, SU9}
• Ordered list of SIs = {SI1, SI2, SI3, SI4, SI5, SI6}
• Ranked service units for SI1 = {SU1, SU2, SU3, SU4, SU5, SU6, SU7, SU8, SU9}
• Preferred number of active assignments for SI1 = 6
• Ranked service units for SI2 = {SU2, SU3, SU4, SU5, SU6, SU7, SU8, SU9, SU1}
• Preferred number of active assignments for SI2 = 6
• Ranked service units for SI3 = {SU3, SU4, SU5, SU6, SU7, SU8, SU9, SU1, SU2}
• Preferred number of active assignments for SI3 = 6
• Ranked service units for SI4 = {SU4, SU5, SU6, SU7, SU8, SU9, SU1, SU2, SU3}
• Preferred number of active assignments for SI4 = 6
• Ranked service units for SI5 = {SU5, SU6, SU7, SU8, SU9, SU1, SU2, SU3, SU4}
• Preferred number of active assignments for SI5 = 6
• Ranked service units for SI6 = {SU6, SU7, SU8, SU9, SU1, SU2, SU3, SU4, SU5}
• Preferred number of active assignments for SI6 = 6
• Preferred number of in-service service units = 9
• Preferred number of assigned service units = 8
• Maximum number of active SIs per service unit = 5
Assignment I: Full Assignment with Spare
Assume that under the current state of the cluster, all service units can be brought in-service. Then, the following can be a running configuration for the service group.
• in-service service units = {SU1, SU2, SU3, SU4, SU5, SU6, SU7, SU8, SU9}
• instantiable service units = {}
• assigned service units = {SU1, SU2, SU3, SU4, SU5, SU6, SU7, SU8}
• spare service units = {SU9}
• SI1's assignments = {SU1, SU2, SU3, SU4, SU5, SU6}
• SI2's assignments = {SU2, SU3, SU4, SU5, SU6, SU7}
• SI3's assignments = {SU3, SU4, SU5, SU6, SU7, SU8}
• SI4's assignments = {SU4, SU5, SU6, SU7, SU8, SU1}
• SI5's assignments = {SU5, SU6, SU7, SU8, SU1, SU2}
• SI6's assignments = {SU7, SU8, SU1, SU2, SU3, SU4}

The following points should be mentioned regarding the preceding assignments: (1) The selection of in-service service units is based on the ordered list of service units. (2) The assignments of SIs to service units are based on the ordered list of service units for each SI.
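One way to arrive at these lists is a greedy walk over the SIs in order of importance, taking service units from each SI's ranked list until the preferred number of active assignments is reached and skipping service units that are spare or already at their limit. This explains, for instance, why SI6 is not assigned to SU6, which already carries five SIs. The sketch below assumes this greedy rule; it is not the specification's algorithm, and the names are illustrative.

```python
def assign_n_way_active(assigned_sus, si_order, ranked, pref_active, max_active):
    """Greedy sketch of N-way active SI assignment (illustrative only)."""
    limit = max_active or float("inf")  # 0 means "no limit"
    load = {su: 0 for su in assigned_sus}
    result = {}
    for si in si_order:  # most important SI first
        chosen = []
        for su in ranked[si]:  # highest-ranked service unit first
            if len(chosen) == pref_active[si]:
                break
            # Skip spare/unavailable units and units that are already full.
            if su in load and load[su] < limit:
                chosen.append(su)
                load[su] += 1
        result[si] = chosen
    return result


sus = [f"SU{i}" for i in range(1, 10)]
sis = [f"SI{k}" for k in range(1, 7)]
# Per-SI ranked lists from the example: SIk starts at SUk and wraps around.
ranked = {f"SI{k}": sus[k - 1:] + sus[:k - 1] for k in range(1, 7)}
pref_active = {si: 6 for si in sis}

# SU9 is spare, so only SU1..SU8 are offered for assignment.
assignment_1 = assign_n_way_active(sus[:8], sis, ranked, pref_active, max_active=5)
for si, assigned in assignment_1.items():
    print(si, assigned)
```

This pure greedy rule matches Assignment I, but it would drop the least important SIs first when capacity is short, which differs from the reduction examples in this section, where every SI keeps at least one assignment for as long as possible.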
The difficulty comes when the number of in-service service units is not enough to satisfy the requirements listed in the configuration. The first goal is to try to keep all SIs in the preferred redundancy levels (that is, with the preferred number of active assignments), even at the expense of imposing maximum load on each service unit. If this goal is not attainable, the next goal is to keep as many important SIs as possible in the preferred redundancy levels without dropping any SIs completely. This may mean reducing the number of assignments for some SIs. The reduction is done for less important SIs first. Finally, if this objective is also not attainable, the only choice is to drop some of the SIs completely (starting first with least important service units). Because the reduction algorithm is simple and somewhat similar to the reduction procedures discussed in the N+M and N-way cases, the reduction procedure is not discussed, and only examples are given.

Assignment II: Full Assignment with Spare Reduction
Assume that under the current state of the cluster, SU9 cannot be instantiated. Then, the following can be a running configuration for the service group.
• in-service service units = {SU1, SU2, SU3, SU4, SU5, SU6, SU7, SU8}
• instantiable service units = {}
• assigned service units = {SU1, SU2, SU3, SU4, SU5, SU6, SU7, SU8}
• spare service units = {}
• SI1's assignments = {SU1, SU2, SU3, SU4, SU5, SU6}
• SI2's assignments = {SU2, SU3, SU4, SU5, SU6, SU7}
• SI3's assignments = {SU3, SU4, SU5, SU6, SU7, SU8}
• SI4's assignments = {SU4, SU5, SU6, SU7, SU8, SU1}
• SI5's assignments = {SU5, SU6, SU7, SU8, SU1, SU2}
• SI6's assignments = {SU7, SU8, SU1, SU2, SU3, SU4}
Assignment III: Full Assignment with Maximum Assignments per Service Unit
The reduction procedure should first attempt to keep full assignments (that is, all SIs being supported at their preferred number of active assignments) by loading the service units as much as possible. This first step in the procedure can succeed only if the following condition is fulfilled:

(Maximum number of assignments that can be supported by all in-service service units) >= (Number of assignments needed for all SIs given the preferred number of active assignments)
AND
(Number of in-service service units) >= (Maximum of all preferred number of assignments for SIs).

This means that for the example configuration, full assignment is possible only if more than seven service units are instantiated: the six SIs need 6 * 6 = 36 assignments, whereas seven service units can support at most 7 * 5 = 35. In the previous example, full assignment is not possible if one of the service units becomes unavailable.

Assignment IV: Partial Assignment with Reduction of SIs Redundancy Level
Assume that the state of the cluster is such that only SU1, SU2, and SU3 can be instantiated:
• in-service service units = {SU1, SU2, SU3}
• instantiable service units = {}
• assigned service units = {SU1, SU2, SU3}
• spare service units = {}
• SI1's assignments = {SU1, SU2, SU3}
• SI2's assignments = {SU2, SU3, SU1}
• SI3's assignments = {SU3, SU1, SU2}
• SI4's assignments = {SU1, SU2, SU3}
• SI5's assignments = {SU1, SU2}
• SI6's assignments = {SU3}
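The condition for full assignment stated under Assignment III can be checked numerically. The short sketch below encodes the two inequalities directly; the function name and argument layout are illustrative, and a limit of 0 is treated as "no limit" in line with the configuration defaults.

```python
def full_assignment_possible(num_in_service, max_active_per_su, pref_active):
    """Check both parts of the full-assignment condition.

    pref_active: {si: preferred number of active assignments}.
    """
    if max_active_per_su == 0:
        capacity = float("inf")  # 0 means no per-service-unit limit
    else:
        capacity = num_in_service * max_active_per_su
    needed = sum(pref_active.values())
    return (capacity >= needed
            and num_in_service >= max(pref_active.values()))


pref_active = {f"SI{k}": 6 for k in range(1, 7)}  # example configuration

for n in (3, 7, 8):
    print(n, full_assignment_possible(n, 5, pref_active))

# Smallest number of in-service service units that allows full assignment.
minimum = min(n for n in range(1, 10)
              if full_assignment_possible(n, 5, pref_active))
print("minimum:", minimum)
```

With three in-service service units the capacity is 15 against 36 required assignments, so the check fails and a reduced assignment such as the one listed here is needed.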
Note that the number of assignments for SIs is reduced to cope with the shortage of in-service service units. The basic logic for assigning SIs to service units can be summarized as follows.
The number of assignments that can be handled in this case is number of in-service service units (that is, 3) * maximum number of SIs per service unit (that is, 5). This means that in this example all available in-service service units can handle 15 SI assignments. This may force the Availability Management Framework to decide that the four most important SIs (that is, SI1, SI2, SI3, and SI4) will have three assignments, SI5 two assignments, and SI6 one assignment, as shown above.

Assignment V: Partial Assignment with SIs Drop-Outs
Assume that the state of the cluster is such that only SU1 can be instantiated:
• in-service service units = {SU1}
• instantiable service units = {}
• assigned service units = {SU1}
• spare service units = {}
• SI1's assignments = {SU1}
• SI2's assignments = {SU1}
• SI3's assignments = {SU1}
• SI4's assignments = {SU1}
• SI5's assignments = {SU1}
• SI6's assignments = {}
Note that it was impossible to keep assignments for all SIs in this example, so that the least important SI, SI6, was dropped.
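The assignment counts in Assignments IV and V are consistent with a two-pass distribution of the available capacity: first one assignment per SI in order of importance, then the remaining capacity handed out to the most important SIs up to their preferred number. The sketch below is inferred from these two examples only and is not a rule stated by the specification; it computes counts, not the placement on particular service units, and assumes a positive per-service-unit limit.

```python
def distribute_active_counts(num_in_service, max_active_per_su,
                             si_order, pref_active):
    """Return {si: number of active assignments} under limited capacity."""
    capacity = num_in_service * max_active_per_su
    counts = {si: 0 for si in si_order}
    # Pass 1: keep every SI alive with one assignment, most important first.
    for si in si_order:
        if capacity == 0:
            break
        if pref_active[si] > 0:
            counts[si] = 1
            capacity -= 1
    # Pass 2: top up by importance; an SI cannot be assigned to more
    # service units than are in-service.
    for si in si_order:
        if counts[si] == 0:
            continue
        target = min(pref_active[si], num_in_service)
        extra = min(target - counts[si], capacity)
        counts[si] += extra
        capacity -= extra
    return counts


sis = [f"SI{k}" for k in range(1, 7)]
pref_active = {si: 6 for si in sis}

print(distribute_active_counts(3, 5, sis, pref_active))  # Assignment IV
print(distribute_active_counts(1, 5, sis, pref_active))  # Assignment V
```

For three service units this gives three assignments for SI1 to SI4, two for SI5, and one for SI6; for a single service unit it gives one assignment each for SI1 to SI5 and none for SI6.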
3.7.5.5 Failure Handling
The failure recovery is required to avoid one or both of the following undesirable situations after the occurrence of a failure:

(a) Some of the in-service service units have additional capacity to support more SIs, while some SIs are not being supported with their preferred number of active assignments. In this case, the Availability Management Framework should fill the slack capacity by assigning more service units active for these SIs.

(b) Some less important SIs have more active assignments than those for some more important SIs. In this case, the Availability Management Framework should rearrange SI assignments, so that more important SIs get assigned, if possible. This, of course, may require removing some assignments of less important SIs.

The following subsection provides an example for cases (a) and (b).
3.7.5.5.1 Example for Failure Recovery
In this example, assume that the node hosting SU3 fails.

Assignments Before the Node Hosting SU3 Fails

- in-service service units = {SU1, SU2, SU3, SU4, SU5, SU6, SU7, SU8}
- instantiable service units = {}
- assigned service units = {SU1, SU2, SU3, SU4, SU5, SU6, SU7, SU8}
- spare service units = {}

- SI1's assignments = {SU1, SU2, SU3, SU4, SU5, SU6}
- SI2's assignments = {SU2, SU3, SU4, SU5, SU6, SU7}
- SI3's assignments = {SU3, SU4, SU5, SU6, SU7, SU8}
- SI4's assignments = {SU4, SU5, SU6, SU7, SU8, SU1}
- SI5's assignments = {SU5, SU6, SU7, SU8, SU1, SU2}
- SI6's assignments = {SU7, SU8, SU1, SU2, SU3, SU4}
Assignments After Failure of the Node Hosting SU3, and Before the Recovery

- in-service service units = {SU1, SU2, SU4, SU5, SU6, SU7, SU8}
- instantiable service units = {}
- assigned service units = {SU1, SU2, SU4, SU5, SU6, SU7, SU8}
- spare service units = {}

- SI1's assignments = {SU1, SU2, SU4, SU5, SU6}
- SI2's assignments = {SU2, SU4, SU5, SU6, SU7}
- SI3's assignments = {SU4, SU5, SU6, SU7, SU8}
- SI4's assignments = {SU4, SU5, SU6, SU7, SU8, SU1}
- SI5's assignments = {SU5, SU6, SU7, SU8, SU1, SU2}
- SI6's assignments = {SU7, SU8, SU1, SU2, SU4}
- Number of current assignments for SI1 = 5
- Number of current assignments for SI2 = 5
- Number of current assignments for SI3 = 5
- Number of current assignments for SI4 = 6
- Number of current assignments for SI5 = 6
- Number of current assignments for SI6 = 5

- Number of assignments on SU1 = 4
- Number of assignments on SU2 = 4
- Number of assignments on SU4 = 5
- Number of assignments on SU5 = 5
- Number of assignments on SU6 = 5
- Number of assignments on SU7 = 5
- Number of assignments on SU8 = 4
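The per-SI and per-service-unit counts listed above can be reproduced mechanically from the assignment sets. The following Python fragment is only an illustrative sketch; the function names `count_per_si` and `count_per_su` are hypothetical and are not part of the Availability Management Framework API.

```python
from collections import Counter

# Assignments after the failure of the node hosting SU3 and before the
# recovery, copied from the example above.
assignments = {
    "SI1": ["SU1", "SU2", "SU4", "SU5", "SU6"],
    "SI2": ["SU2", "SU4", "SU5", "SU6", "SU7"],
    "SI3": ["SU4", "SU5", "SU6", "SU7", "SU8"],
    "SI4": ["SU4", "SU5", "SU6", "SU7", "SU8", "SU1"],
    "SI5": ["SU5", "SU6", "SU7", "SU8", "SU1", "SU2"],
    "SI6": ["SU7", "SU8", "SU1", "SU2", "SU4"],
}


def count_per_si(assignments):
    """Number of service units currently assigned active for each SI."""
    return {si: len(sus) for si, sus in assignments.items()}


def count_per_su(assignments):
    """Number of SIs currently assigned to each service unit."""
    return dict(Counter(su for sus in assignments.values() for su in sus))


if __name__ == "__main__":
    print(count_per_si(assignments))
    print(count_per_su(assignments))
```

Both views are needed to judge the situation: the per-SI counts show which SIs are below their preferred number of assignments, and the per-service-unit counts show where free capacity remains.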
This result is not optimal for the following two reasons: (1) the less important SIs (that is, SI4 and SI5) have higher levels of assignment than more important SIs (that is, SI1, SI2, and SI3); (2) some in-service service units (that is, SU1, SU2, and SU8) have free capacity while some SIs are not assigned to as many service units as the preferred number of assigned service units. This situation requires failure recovery, which is discussed next.

Assignments After Completion of Failure Recovery

The failure recovery procedure is implementation-dependent, but the Availability Management Framework implementation should have the ultimate goal of maximizing the number of active assignments for the most important SIs (obviously, this number may not be higher than the preferred number of active assignments per SI). However, this may require complex reassignment algorithms; therefore, the specification does not impose this goal on the implementation. At the end of this subsection, a more practical (but less ambitious) goal for failure recovery is given.

Because the overall capacity of the service units is 35 (7 service units with 5 assignments each), SI1 through SI5 should get full assignments and only SI6 should get partial assignments. According to this objective, the following can be the post-recovery assignments:
- in-service service units = {SU1, SU2, SU4, SU5, SU6, SU7, SU8}
- instantiable service units = {}
- assigned service units = {SU1, SU2, SU4, SU5, SU6, SU7, SU8}
- spare service units = {}

- SI1's assignments = {SU1, SU2, SU8, SU4, SU5, SU6}
- SI2's assignments = {SU2, SU1, SU4, SU5, SU6, SU7}
- SI3's assignments = {SU2, SU4, SU5, SU6, SU7, SU8}
- SI4's assignments = {SU4, SU5, SU6, SU7, SU8, SU1}
- SI5's assignments = {SU5, SU6, SU7, SU8, SU1, SU2}
- SI6's assignments = {SU7, SU8, SU1, SU2, SU4}
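The capacity argument behind this objective can be sketched in a few lines of Python. The sketch assumes seven in-service service units that can each take five active assignments (the overall capacity of 35 quoted above) and six SIs that each prefer six active assignments. The function name `ideal_active_counts` is hypothetical, and the result is only an upper bound per SI: it does not construct the assignments themselves.

```python
def ideal_active_counts(num_in_service_sus, max_active_per_su,
                        num_sis, preferred_per_si):
    """Hand out the total capacity to SIs, most important SI first."""
    capacity = num_in_service_sus * max_active_per_su
    # An SI cannot be assigned to more service units than are in service.
    per_si_limit = min(preferred_per_si, num_in_service_sus)
    counts = []
    for _ in range(num_sis):
        granted = min(per_si_limit, capacity)
        counts.append(granted)
        capacity -= granted
    return counts


if __name__ == "__main__":
    # Seven service units, five assignments each, six SIs preferring six.
    print(ideal_active_counts(7, 5, 6, 6))
```

With these inputs the first five SIs receive their full six assignments and the sixth receives the remaining five, which matches the post-recovery assignments listed above.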
These assignments guarantee that the most important SIs get the highest number of assignments possible under the existing configuration limitations (hence, it is called an optimal assignment). As noted earlier, the failure recovery procedure is implementation-dependent. Thus, some simpler implementations may not arrive at the mentioned optimal solution. For example, a simple implementation that does not aim to guarantee "highest possible assignments to the most important SIs", but attempts to adjust the assignments partially (without service group level optimization), may end up with the following post-recovery configuration:

- SI1's assignments = {SU1, SU2, SU7, SU4, SU5, SU6}
- SI2's assignments = {SU2, SU1, SU4, SU5, SU6, SU7}
- SI3's assignments = {SU2, SU4, SU5, SU6, SU7, SU8}
- SI4's assignments = {SU4, SU5, SU6, SU7, SU8, SU1}
- SI5's assignments = {SU5, SU6, SU7, SU8, SU1, SU2}
- SI6's assignments = {SU8, SU1, SU2, SU4}
In this example, each of the SIs affected by the service unit failure is assigned to another service unit. For example, SI1 is assigned to SU7 as a replacement of its assignment to SU3.

As mentioned at the beginning of this subsection, to make the Availability Management Framework implementation simpler, the specification does not require the optimal error recovery (as defined earlier in this section). It only requires that the error recovery procedure achieves the following non-optimal goals:
(a) The more important SIs should get more assignments than less important SIs after the completion of the recovery.
(b) The implementation should minimize the number of SI reassignments during the recovery process.
(c) The free capacity of service units should be kept as small as possible.
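The three goals can be expressed as simple measurements over a candidate post-recovery configuration. The Python sketch below applies them to the pre-recovery assignments and to the simpler post-recovery configuration of this example. It rests on stated assumptions: goal (a) is read as "no SI has fewer assignments than a less important SI", each service unit is assumed to take at most five active assignments as in the example, and all function names are hypothetical.

```python
MAX_ACTIVE_PER_SU = 5  # per-service-unit capacity assumed from the example
IN_SERVICE = ["SU1", "SU2", "SU4", "SU5", "SU6", "SU7", "SU8"]

# SIs are listed most important first (dicts preserve insertion order).
before_recovery = {
    "SI1": {"SU1", "SU2", "SU4", "SU5", "SU6"},
    "SI2": {"SU2", "SU4", "SU5", "SU6", "SU7"},
    "SI3": {"SU4", "SU5", "SU6", "SU7", "SU8"},
    "SI4": {"SU4", "SU5", "SU6", "SU7", "SU8", "SU1"},
    "SI5": {"SU5", "SU6", "SU7", "SU8", "SU1", "SU2"},
    "SI6": {"SU7", "SU8", "SU1", "SU2", "SU4"},
}
simple_recovery = {
    "SI1": {"SU1", "SU2", "SU7", "SU4", "SU5", "SU6"},
    "SI2": {"SU2", "SU1", "SU4", "SU5", "SU6", "SU7"},
    "SI3": {"SU2", "SU4", "SU5", "SU6", "SU7", "SU8"},
    "SI4": {"SU4", "SU5", "SU6", "SU7", "SU8", "SU1"},
    "SI5": {"SU5", "SU6", "SU7", "SU8", "SU1", "SU2"},
    "SI6": {"SU8", "SU1", "SU2", "SU4"},
}


def honors_importance(assignments):
    """Goal (a): assignment counts never increase toward less important SIs."""
    counts = [len(sus) for sus in assignments.values()]
    return all(a >= b for a, b in zip(counts, counts[1:]))


def reassignments(before, after):
    """Goal (b): number of assignments added or removed by the recovery."""
    return sum(len(before[si] ^ after[si]) for si in before)


def free_capacity(assignments):
    """Goal (c): unused active-assignment slots over all in-service units."""
    used = sum(len(sus) for sus in assignments.values())
    return len(IN_SERVICE) * MAX_ACTIVE_PER_SU - used


if __name__ == "__main__":
    print(honors_importance(before_recovery), honors_importance(simple_recovery))
    print(reassignments(before_recovery, simple_recovery))
    print(free_capacity(before_recovery), free_capacity(simple_recovery))
```

Under this reading, the pre-recovery configuration violates goal (a), whereas the simpler post-recovery configuration satisfies it at the cost of four assignment changes and leaves one slot of free capacity.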
3.7.5.6 Example of Auto-Adjust
As discussed earlier, the failure recovery should avoid undesirable situations (that is, underutilized service units and more important SIs not being assigned in higher number); however, the failure recovery may not consider the service units ordered list for assigning SIs. Thus, in some cases, the SIs are not arranged based on their service units ordered lists. The fail-over procedure can be initiated to do one of the following rearrangements: (1) redistribute the SIs to service units evenly and based on the per-SI ordering, so that the SIs are distributed among all assigned service units; (2) rearrange the assignment, so that the order of the per-SI service units is honored.
The following example illustrates the auto-adjust procedure.

Assignments Before the Node Hosting SU3 Joins

- in-service service units = {SU1, SU2, SU4, SU5, SU6, SU7, SU8}
- instantiable service units = {}
- assigned service units = {SU1, SU2, SU4, SU5, SU6, SU7, SU8}
- spare service units = {}

- SI1's assignments = {SU1, SU2, SU8, SU4, SU5, SU6}
- SI2's assignments = {SU2, SU1, SU4, SU5, SU6, SU7}
- SI3's assignments = {SU2, SU4, SU5, SU6, SU7, SU8}
- SI4's assignments = {SU4, SU5, SU6, SU7, SU8, SU1}
- SI5's assignments = {SU5, SU6, SU7, SU8, SU1, SU2}
- SI6's assignments = {SU7, SU8, SU1, SU2, SU4}
Now, assume that the node hosting SU3 joins the cluster. The following will be the service group configuration after the failure recovery.

Assignments After the Node Hosting SU3 Joins

Because only SI6 is not supported with the full 6 active assignments, the Availability Management Framework can (at least) assign SU3 active for SI6. Therefore, the following can be the assignments after the node hosting SU3 joins the cluster:

- in-service service units = {SU1, SU2, SU3, SU4, SU5, SU6, SU7, SU8}
- instantiable service units = {}
- assigned service units = {SU1, SU2, SU3, SU4, SU5, SU6, SU7, SU8}
- spare service units = {}

After the node hosting SU3 joins, the assignments look like:

- SI1's assignments = {SU1, SU2, SU8, SU4, SU5, SU6}
- SI2's assignments = {SU2, SU1, SU4, SU5, SU6, SU7}
- SI3's assignments = {SU2, SU4, SU5, SU6, SU7, SU8}
- SI4's assignments = {SU4, SU5, SU6, SU7, SU8, SU1}
- SI5's assignments = {SU5, SU6, SU7, SU8, SU1, SU2}
- SI6's assignments = {SU7, SU8, SU1, SU2, SU4, SU3}
If the administrator requests an auto-adjust, the assignments after its completion will look as follows.

Assignments After Completion of the Auto-adjust Procedure

- in-service service units = {SU1, SU2, SU3, SU4, SU5, SU6, SU7, SU8}
- instantiable service units = {}
- assigned service units = {SU1, SU2, SU3, SU4, SU5, SU6, SU7, SU8}
- spare service units = {}

- SI1's assignments = {SU1, SU2, SU3, SU4, SU5, SU6}
- SI2's assignments = {SU2, SU3, SU4, SU5, SU6, SU7}
- SI3's assignments = {SU3, SU4, SU5, SU6, SU7, SU8}
- SI4's assignments = {SU4, SU5, SU6, SU7, SU8, SU1}
- SI5's assignments = {SU5, SU6, SU7, SU8, SU1, SU2}
- SI6's assignments = {SU7, SU8, SU1, SU2, SU3, SU4}
The N-way active redundancy model is represented by the UML diagram shown in the following figure.
FIGURE 20 UML Diagram of the N-Way Active Redundancy Model
[Figure 20 diagram not reproduced. Recoverable labels: N-way Active Redundancy Service Group, Service Unit, the associations "active" and "protects", and the multiplicities 1..*, 1, 0..1, 0..*, and 0..M1. Diagram notes: a service unit can take only active service instance assignments at a time; M1 can be configured.]
In the no-redundancy model, the service group contains one or more service units. This redundancy model is typically used with non-critical components, when the failure of a component does not cause any severe impact on the overall system. This redundancy model has the following characteristics:
- A service unit is assigned the active HA state for at most one SI. In other words, no service unit will have more than one SI assigned to it.
- A service unit is never assigned the standby HA state for an SI.
- The Availability Management Framework can recover from a fault only by restarting a service unit, or as an escalation, by restarting the node (see Section 9.4.7 on page 332) containing the service unit.
- No two service units exist having the same SI assigned to them.
- At any given time, several in-service service units can be instantiated for a service group: some have SIs assigned to them, and possibly some others are considered spare service units for the service group. The number of service units that have SIs assigned to them and the number of spare service units are dynamic and can change during the life-span of the service group; however, the preferred number of in-service service units can be configured.
- At any given time, the Availability Management Framework should ensure that each SI is assigned to a service unit if the number of in-service service units is large enough.
- SIs are ordered based on their importance. This ordered list will be used for assigning SIs to service units.
- As stated in Section 3.2.6, any service unit in a service group is capable of providing all SIs defined in the configuration for the service group.
Note:
Components implementing the x_active_and_y_standby, x_active_or_y_standby, 1_active_or_y_standby, 1_active_or_1_standby, x_active, 1_active, or non-preinstantiable capability models can participate in the no-redundancy model.
3.7.6.2 Example
[Figure not reproduced. Recoverable labels: Node W; Service Unit S3; components C2, C3, C4, C5, and C6; three assignments labeled "active".]
3.7.6.3 Configuration
- Ordered list of service units for a service group: this parameter is described in Section 3.7.1.1. Default value: no default, the order is implementation-dependent.
- Ordered list of SIs: for the general meaning of this parameter, refer to its definition in Section 3.7.1.1. The Availability Management Framework uses this ranking to select the SIs to drop from assignment if the number of service units is not enough for a full support of all SIs. Default value: no default, the order is implementation-dependent.
- Preferred number of in-service service units: the Availability Management Framework should make sure that this number of in-service service units is always instantiated, if possible. This preferred number is configured by setting the saAmfSGNumPrefInserviceSUs attribute of the saAmfSG object class (see Section 8.9). Default value: the number of the service units configured for the service group.
- Auto-adjust option: for the general explanation of this option, refer to Section 3.7.1.1 on page 88. Section 3.7.6.6 on page 151 shows an example for handling the auto-adjust option in this redundancy model. Default value: no auto-adjust.
Note that the preferred number of assigned service units is equal to the number of configured SIs plus one spare service unit.
3.7.6.4 SI Assignments
First, the general approach for assigning SIs to service units is discussed. Then, a few examples will be given for illustration. If the number of available service units in the cluster allows it, the Availability Management Framework will instantiate the preferred number of instantiated service units for the service group. Then, some or all of these service units will be used for SI assignments. The remaining instantiated service units will be spare. It is assumed that the service group configuration has passed a series of validations, so that when the required number of service units is assigned, each configured SI can be assigned to a service unit. The following example of a service group configuration will be used throughout this section.
- Ordered list of service units = {SU1, SU2, SU3, SU4, SU5}
- Ordered list of SIs = {SI1, SI2, SI3}
- Preferred number of in-service service units = 4
Assignment I: Full Assignment with Spare

Assume that under the current state of the cluster, all service units can be brought in-service. Then, the following can be a running configuration for the service group.

- in-service service units = {SU1, SU2, SU3, SU4}
- instantiable service units = {SU5}
- assigned service units = {SU1, SU2, SU3}
- spare service units = {SU4}
The following points should be mentioned regarding these assignments: (1) The selection of in-service service units is based on the ordered list of service units. (2) The assignments of SIs to service units are based on the ranking of the SIs in the ordered list of SIs.
3.7.6.4.1 Reduction Procedure
The first goal of the assignment procedure is to try to keep all SIs assigned. If this goal is not attainable, then the next goal is to keep as many important SIs as possible assigned.

Assignment II: Full Assignment with Spare Reduction

Assume that under the current state of the cluster, SU4 and SU5 cannot be instantiated. Then, the following can be a running configuration for the service group.

- in-service service units = {SU1, SU2, SU3}
- instantiable service units = {}
- assigned service units = {SU1, SU2, SU3}
- spare service units = {}
Assignment III: Partial Assignment

If the number of instantiated service units is not large enough, some less important SIs will be dropped. Assume that only SU1 and SU2 can be brought in-service in this example.

- in-service service units = {SU1, SU2}
- instantiable service units = {}
- assigned service units = {SU1, SU2}
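The three assignments above follow one selection rule, which can be sketched in Python as shown below. This is a minimal sketch under stated assumptions, not the procedure mandated by the specification: the function name `assign_no_redundancy` and its parameters are hypothetical, and pairing the i-th SI with the i-th in-service service unit is an assumption, because the text only states that selection follows the two ordered lists.

```python
def assign_no_redundancy(ordered_sus, ordered_sis, preferred_in_service,
                         can_instantiate):
    """Select in-service, assigned, and spare service units for one group."""
    # Candidate service units, most preferred first.
    candidates = [su for su in ordered_sus if su in can_instantiate]
    in_service = candidates[:preferred_in_service]
    instantiable = candidates[preferred_in_service:]
    # One SI per service unit; zip() stops at the shorter list, so the
    # least important SIs are dropped when service units run out.
    si_to_su = dict(zip(ordered_sis, in_service))
    assigned = list(si_to_su.values())
    return {
        "in_service": in_service,
        "instantiable": instantiable,
        "assigned": assigned,
        "spare": in_service[len(assigned):],
        "dropped_sis": ordered_sis[len(assigned):],
    }


if __name__ == "__main__":
    sus = ["SU1", "SU2", "SU3", "SU4", "SU5"]
    sis = ["SI1", "SI2", "SI3"]
    print(assign_no_redundancy(sus, sis, 4, set(sus)))            # Assignment I
    print(assign_no_redundancy(sus, sis, 4, {"SU1", "SU2", "SU3"}))  # II
    print(assign_no_redundancy(sus, sis, 4, {"SU1", "SU2"}))      # III
```

The third call drops SI3, the least important SI, because only two service units can be brought in-service.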
The failure handling is rather simple. If a node hosting a service unit fails, the only failover option is to select a spare service unit from the service group's spare service units and assign the SI of the failed service unit to the selected spare service unit. If no spare service unit is available, the Availability Management Framework cannot carry out any failure handling, and the SI that was being provided by the failed service unit will not be supported until another service unit becomes available for the service group.

The following example illustrates the fail-over action.

Assignments Before the Node Hosting SU3 Failed

- in-service service units = {SU1, SU2, SU3, SU4}
- instantiable service units = {SU5}
- assigned service units = {SU1, SU2, SU3}
- spare service units = {SU4}

Assignments After the Failure Recovery

- in-service service units = {SU1, SU2, SU4, SU5}
- instantiable service units = {}
- assigned service units = {SU1, SU2, SU4}
- spare service units = {SU5}
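The fail-over action of this example can be sketched as follows. The sketch is illustrative only and makes two assumptions that the text does not state: the initial SI-to-service-unit mapping (SI1 on SU1, SI2 on SU2, SI3 on SU3), and the order of steps, in which an instantiable service unit is first brought in-service as a spare to restore the preferred number of in-service service units before the orphaned SI is reassigned. The function name `fail_over` and the dictionary layout are hypothetical.

```python
def fail_over(state, failed_su, preferred_in_service):
    """Return the service group state after the failure of one service unit."""
    in_service = [su for su in state["in_service"] if su != failed_su]
    spare = [su for su in state["spare"] if su != failed_su]
    instantiable = list(state["instantiable"])
    si_to_su = dict(state["si_to_su"])

    # Restore the preferred number of in-service service units, if possible.
    while len(in_service) < preferred_in_service and instantiable:
        su = instantiable.pop(0)
        in_service.append(su)
        spare.append(su)

    # Move each SI of the failed service unit to the most preferred spare.
    for si, su in state["si_to_su"].items():
        if su == failed_su:
            if spare:
                # Spares are ranked by their position in in_service order.
                spare.sort(key=in_service.index)
                si_to_su[si] = spare.pop(0)
            else:
                # No spare: the SI stays unsupported for now.
                del si_to_su[si]

    return {"in_service": in_service, "instantiable": instantiable,
            "spare": spare, "si_to_su": si_to_su}


if __name__ == "__main__":
    before = {
        "in_service": ["SU1", "SU2", "SU3", "SU4"],
        "instantiable": ["SU5"],
        "spare": ["SU4"],
        "si_to_su": {"SI1": "SU1", "SI2": "SU2", "SI3": "SU3"},
    }
    print(fail_over(before, "SU3", preferred_in_service=4))
```

Applied to the state before the failure, the sketch moves the SI of SU3 to SU4 and brings SU5 in-service as the new spare, which agrees with the assignments after the failure recovery.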
The auto-adjust procedure does not achieve much in this redundancy model. It only makes sure that the SIs are assigned to the most preferred in-service service units. The following example illustrates the auto-adjust procedure.

Assignments Before the Auto-adjust Procedure

After the node hosting SU3 joins the cluster (see previous example), the service units and the assignments can be as follows:

- in-service service units = {SU1, SU2, SU4, SU5}
- instantiable service units = {SU3}
- assigned service units = {SU1, SU2, SU4}
- spare service units = {SU5}
Because the ranking of SU3 for SI3 is higher than the ranking of SU4 for SI3, if the auto-adjust option is enabled for the service group when SU3 is brought in-service again, the assignments will look as follows.

Assignments After the Auto-adjust Procedure

- in-service service units = {SU1, SU2, SU3, SU4}
- instantiable service units = {SU5}
- assigned service units = {SU1, SU2, SU3}
- spare service units = {SU4}

Note that SU5 has been uninstantiated, because the preferred number of in-service service units is 4.
3.7.6.7 UML Diagram of the No-Redundancy Redundancy Model
The no-redundancy redundancy model is represented by the UML diagram shown in the following figure.
FIGURE 22 UML Diagram of the No-Redundancy Redundancy Model
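[Figure 22 diagram not reproduced. Recoverable labels: Service Unit, the associations "active" and "protects", and the multiplicity 0..1. Diagram note: a service unit can take only one active service instance assignment at a time.]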
3.7.7 The Effect of Administrative Operations on Service Instance Assignments
Usually, administrative operations such as lock or unlock of a service unit or a node result in reassignments of SIs to service units. This section briefly discusses the effects for the lock and unlock operations. The cases for other administrative operations are similar. Only basic directions are given here, because the detailed reaction of the Availability Management Framework for each administrative operation depends on the redundancy model. The details are left for the implementation.
3.7.7.1 Locking a Service Unit or a Node
Since the lock for instantiation does not affect the service instance assignment, this subsection focusses on the lock operation only.

Depending on the status of the service unit, one of the following cases can happen when locking a service unit:
(a) The service unit (say SU1) or one of its enclosing entities like the node, service group, application, or the cluster is being locked and the service unit has SI assignments: in this case, the SIs supported by the service unit will be reassigned to other service units in the service group. This reassignment depends on the redundancy model of the service group. The transfer of SI assignments from the service unit SU1 to other service units is very similar to the recovery operation performed when a service unit fails. For details, refer to the failure handling section of the associated redundancy model. However, it is important to note that an effective reassignment may require selecting one of the spare service units or instantiating a new service unit from the instantiable set. The removal of SI assignments will not trigger a termination of the service unit, and the operation discussed in case (b) is undertaken when the service unit enters the out-of-service readiness state.
(b) The service unit (say SU1) or one of its enclosing entities like the node, service group, application, or the cluster is being locked, and the service unit has no current SI assignments, but it belongs to the set of in-service service units: when the service unit SU1 becomes out-of-service, and as a consequence the number of in-service service units drops below the preferred number of in-service service units, one instantiable service unit with none of its containing entities (service group, node, application, or cluster) in locked state will be selected to replace the service unit SU1. This selection will be based on the service units and their ranks, as discussed in Section 3.7.1. The service unit SU1 stays in the set of instantiated service units.
(c) The service unit to be locked does not belong to the set of in-service service units: no SI reassignment or service unit instantiation is performed.
3.7.7.2 Unlocking a Service Unit, a Service Group, or a Node
After unlocking a service unit, the following cases can occur:
(a) The service unit does not belong to the set of instantiable service units. Nothing can be done in this case, and the service unit still remains out of the set of instantiated service units.
(b) The service unit belongs to the set of instantiable service units, but it is not instantiated. If the preferred number of in-service service units is not reached, the service unit is instantiated. If the service unit can be brought in-service, the operation described in case (c) is undertaken.
(c) The service unit is in-service. Based on the configuration of the service group (auto-adjust option and preferred number of assignments) and the current assignments, some SIs may be assigned to the service unit.
[Table not recoverable from the source. Surviving column labels: 2N, N+M, N-Way.]
A component with capability models x_active or 1_active for a certain component service type is eligible for being used in service groups with redundancy models 2N and N+M. The component may have the active, quiescing, or quiesced HA states, but not the standby HA state for its CSIs. Nevertheless, its service unit can be assigned the standby HA state for a service instance. The Availability Management Framework does not attempt to assign the standby HA state for a CSI to the component in this case.
The SI-SI dependencies are also applicable to applications (see also Section 3.2.7). The dependencies apply in two cases.
- When a service unit is assigned the active HA state on behalf of a service instance or a component is assigned the active HA state on behalf of a component service instance.
- When the active HA state was assigned to a service unit or a component on behalf of a service instance or component service instance respectively, and another HA state is now assigned, or the active HA state assignment is removed.
3.9.1.1 Dependencies Among SIs when Assigning a Service Unit Active for a Service Instance
A service instance SI1 may be configured to depend on other service instances (especially within the scope of an application logical entity, as defined in Section 3.2.7), SI2, SI3, and so on, in the sense that a service unit can only be assigned the active HA state for SI1 if all of SI2, SI3, and so on are either fully-assigned or partially-assigned (see Section 3.3.3.2). A dependency of a service instance SI1 on a service instance SI2 is configured by specifying the DN of SI2 in the safDepend attribute in an object of the SaAmfSIDependency association class that is associated with the object representing SI1 (see Section 8.11). These dependencies are cluster-wide, which means that the service instances on which a service instance SI1 depends can belong to the same service group as SI1 or to another service group.
The Availability Management Framework defines one configurable attribute of a dependency between service instances, the tolerance time: if a service instance SI1 depends on the service instance SI2, this time indicates for how long SI1 can tolerate SI2 being in the unassigned state (see Section 3.3.3.2). If this time elapses before SI2 becomes assigned again, the Availability Management Framework will remove the active and the quiescing HA states for SI1 from all service units, that is, it will make SI1 unassigned. The tolerance time is configured by setting the saAmfToleranceTime attribute of the SaAmfSIDependency association class (see Section 8.11). This tolerance time can be set to zero to indicate to the Availability Management Framework that it must remove the active and the quiescing HA states for SI1 from all service units as soon as SI2 is unassigned.
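The tolerance time rule amounts to a single comparison, sketched below in Python. The function name and its parameters are hypothetical, and times are plain numbers in an arbitrary unit chosen for illustration; the sketch makes no claim about how the saAmfToleranceTime attribute is encoded.

```python
def dependent_si_must_be_unassigned(sponsor_unassigned_since, now,
                                    tolerance_time):
    """Decide whether SI1 must lose its active and quiescing assignments.

    sponsor_unassigned_since is the time at which SI2 became unassigned,
    or None while SI2 is fully or partially assigned.
    """
    if sponsor_unassigned_since is None:
        return False
    # A tolerance time of zero makes this true as soon as SI2 is unassigned.
    return now - sponsor_unassigned_since >= tolerance_time


if __name__ == "__main__":
    print(dependent_si_must_be_unassigned(None, 20.0, 5.0))
    print(dependent_si_must_be_unassigned(10.0, 12.0, 5.0))
    print(dependent_si_must_be_unassigned(10.0, 15.0, 5.0))
    print(dependent_si_must_be_unassigned(10.0, 10.0, 0.0))
```

If SI2 becomes assigned again before the tolerance time elapses, the first argument returns to None and SI1 keeps its assignments.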
3.9.1.3 Dependencies Among Component Service Instances of the same Service Instance
A component service instance of a service instance can be configured to depend on other component service instances of the same service instance. This dependency amongst component service instances is configured by specifying a list of the DNs of the component service instances on which a component service instance depends in the saAmfCSIDependencies attribute of the saAmfCSI object class (see Section 8.12).

The Availability Management Framework performs the assignment of the active HA state to components on behalf of component service instances in a sequence determined by these dependencies: if a component service instance CSI1 depends on the component service instance CSI2, a component can only be assigned the active HA state for CSI1 if any of the components of the service unit in question has already acknowledged the assignment of the active HA state for CSI2 by calling saAmfResponse(). The reverse order is applied when, on behalf of component service instances, the active HA state is removed from components or another HA state is assigned to components.

Note that dependencies among component service instances also apply when restarting components within a service unit. Since component service instances assigned to a component must be removed when restarting a component, dependent component service instances within the same service instance will also be removed.

Example: suppose a component C1 consisting of an HTTP server supporting a component service instance CSI1 that contains an IP address and a port number. The server binds to that IP address (and not to INADDR_ANY) and to that port number. A second component C2 implements a virtual IP address service, and its component service instance CSI2 contains simply the same IP address as above. CSI2 must be assigned before CSI1; otherwise the bind() system call would fail.
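The assignment sequence described above is a topological ordering of the CSI dependency graph, and the removal sequence is its reverse. The following Python sketch illustrates this with the HTTP server example, using the standard library module graphlib (Python 3.9 or later). The function names and the dictionary representation of the dependencies are hypothetical and only stand in for the configured saAmfCSIDependencies values.

```python
from graphlib import TopologicalSorter


def activation_order(csi_dependencies):
    """Order in which the active HA state can be assigned.

    csi_dependencies maps each CSI to the set of CSIs it depends on.
    """
    return list(TopologicalSorter(csi_dependencies).static_order())


def removal_order(csi_dependencies):
    """Order in which the active HA state is removed (the reverse order)."""
    return list(reversed(activation_order(csi_dependencies)))


if __name__ == "__main__":
    # CSI1 (HTTP server address and port) depends on CSI2 (virtual IP).
    deps = {"CSI1": {"CSI2"}}
    print(activation_order(deps))
    print(removal_order(deps))
```

A cyclic configuration has no valid sequence; in this sketch, graphlib reports it by raising CycleError.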
3.9.2 Dependencies Among Components
A component can be configured to depend on another component in the same service unit in the sense that the instantiation of the second component is a prerequisite for the instantiation of the first component. Dependencies amongst components described in this section are applicable only when instantiating or terminating a service unit. These dependencies in no way influence the state transitions effected by the Availability Management Framework. Such explicit dependencies can be configured between any two pre-instantiable components in the same service unit. Note that implicit dependencies exist between a proxy and its proxied components, which are not discussed here.

A system administrator can take advantage of dependencies amongst components to avoid launching processes that perform a lengthy initialization concurrently, as this could lead to CPU saturation. A "tempered" launching of these processes could be more adequate.

Dependencies amongst components are configured by associating an instantiation level with each pre-instantiable component. The instantiation level is a positive integer configured for such components. The corresponding configuration attribute is the saAmfCompInstantiationLevel attribute of the saAmfComp object class (see Section 8.13.2).

Within a service unit, the Availability Management Framework instantiates the pre-instantiable components according to the configured instantiation level. All pre-instantiable components with the same instantiation level are instantiated by the Availability Management Framework in parallel. Components of a given level are only instantiated by the Availability Management Framework when all components with a lower instantiation level have successfully completed their instantiation.

Within a service unit, the Availability Management Framework terminates the pre-instantiable components according to the configured instantiation level. All pre-instantiable components with the same instantiation level are terminated by the Availability Management Framework in parallel. Pre-instantiable components of a given level are only terminated by the Availability Management Framework when all pre-instantiable components with a higher instantiation level have been terminated.

As has been said previously, the instantiation level is only applicable during service unit instantiation and termination. As restarting a service unit means terminating the service unit and instantiating it again, the instantiation level also applies when restarting a service unit. If single components within a service unit are restarted, the instantiation level does not cause components with a higher level to be also subject to a restart. The instantiation level is, above all, a means to limit the load on the system during the instantiation process.

Non-pre-instantiable components are only instantiated when they have to provide service (for instance, when the Availability Management Framework assigns them the active HA state for a component service instance). If dependencies between a non-pre-instantiable component and another component exist, they should be resolved by using the inter-CSI (CSI-CSI) dependency scheme.
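The instantiation and termination order implied by instantiation levels can be sketched in Python as groups ("waves") of components that are handled in parallel. The function names and the component names in the usage example are hypothetical; the levels stand in for configured saAmfCompInstantiationLevel values.

```python
from itertools import groupby


def instantiation_waves(levels):
    """Group components by instantiation level, lowest level first.

    levels maps a pre-instantiable component name to its instantiation
    level. Each inner list can be instantiated in parallel, and a wave
    starts only after the previous wave has completed.
    """
    ordered = sorted(levels.items(), key=lambda item: item[1])
    return [[name for name, _ in group]
            for _, group in groupby(ordered, key=lambda item: item[1])]


def termination_waves(levels):
    """Termination proceeds from the highest level down to the lowest."""
    return list(reversed(instantiation_waves(levels)))


if __name__ == "__main__":
    levels = {"db": 1, "cache": 1, "app": 2, "ui": 3}
    print(instantiation_waves(levels))
    print(termination_waves(levels))
```

Giving two components with lengthy initialization different levels, rather than the same one, is the "tempered" launching mentioned above.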
- By the use of a wrapper to encapsulate the legacy software (hardware) into an SA-aware component. The wrapper consists of one or more processes that link with the Availability Management Framework library and interact with the Availability Management Framework on the one hand and with the legacy software (hardware) on the other hand. The wrapper and the legacy software (hardware) together constitute a single component.
- By the use of a proxy to manage the legacy software (hardware). The legacy software (hardware) can be considered to be a separate component managed by the proxy component.

In general, the proxy/proxied solution is most appropriate when one of the following is true:

- The redundancy model of the proxied entity (the legacy software or hardware) is different from the redundancy model of the proxy entity. The proxy entity usually requires a very simple redundancy model such as 2N, whereas the legacy entity may need a more complex redundancy model such as N+M or N-way active.
- The failure semantics and fault zone of the proxied entities are different from the ones for proxy entities. For example, the proxied entity may be running outside the cluster, whereas the proxy entity has to be located in a node.
Passive Monitoring: the component is not involved in the monitoring, and mostly operating system features are used to assess the health of a component. These features include monitoring the death of processes that are part of the component (but it could also be extended to also monitor crossing some thresholds in resource usage such as memory usage). External Active Monitoring: the component does not include any special code to monitor its health, but some entity external to the component (usually called a monitor) assesses the health of the component by submitting some service requests to the component and checking that the service is provided in a timely fashion. Internal Active Monitoring: the component includes code (often called audits) to monitor its own health and to discover latent faults. Each of these health checks is triggered either by the component itself or by the Availability Management Framework.
These three types of monitoring are in fact complementary. Passive monitoring and external active monitoring do not require modification of the component itself and can be applied to non-SA-aware components. The Availability Management Framework supports these three types of monitoring. The passive monitoring of components is covered by the API functions saAmfPmStart_3() (refer to Section 7.7.1 on page 242) and saAmfPmStop() (refer to Section 7.7.2 on page 244). External active monitoring is supported with two command line interfaces (CLI), namely the commands AM_START (refer to Section 4.9 on page 183), which is used to start a monitoring process for a component, and AM_STOP (refer to Section 4.10 on page 183), which is used to stop a monitoring process for a component. Because running CLI commands puts extra load on the system (a process must be spawned each time), it is preferable to use long-running processes for external active monitors, as opposed to periodically running a monitoring command in the way audits are run. The internal active monitoring of components is accomplished through the healthcheck interfaces (refer to Section 7.1.2 on page 200).
Error detection is the responsibility of all entities in the system. Errors are reported to the Availability Management Framework by invoking the saAmfComponentErrorReport() API function, described in Section 7.12.1 on page 281. The invoker of this function specifies the recommended recovery action, which can assume the values defined in the SaAmfRecommendedRecoveryT enum, described in Section 7.4.7 on page 222. Components play an important part in error detection and should report their own errors or the errors of other components with which they interact. The Availability Management Framework itself also generates error reports on components when it detects errors while interacting with components. For the different cases, refer to Section 3.3.2.2 on page 59. It is assumed that a reported error does not refer explicitly to a specific component service instance currently assigned to the component. It rather applies to the component as a whole.
3.12.1.2 Restart
The restart of a component means any of the following sequences of life cycle operations:
terminate and instantiate
cleanup and instantiate
terminate, cleanup, and instantiate
The latter sequence applies if an error occurs during the terminate operation. Appendix A describes how these operations are implemented for the various types of components.
The Availability Management Framework terminates erroneous components abruptly by executing the appropriate cleanup operation for the component (see Table 34 in Appendix A). Non-erroneous components are terminated gracefully by first attempting to run the corresponding callback or the TERMINATE command (see also Table 34 in Appendix A).
During a restart because of a failure, a component remains enabled, and its readiness state may or may not change according to changes in its presence state (as described in Section 3.3.2.1), which in turn determines whether its component service instances must be removed (refer to Section 3.3.2.3). Restarting a service unit is achieved by the following actions:
First, all components in the service unit are terminated in the order dictated by their instantiation-levels.
In a second step, all components in the service unit are instantiated in the order dictated by their instantiation-levels.
During this restart procedure, the components follow their relevant state transition (see Section 3.3.2.1), which affects the presence state of the service unit (see Section 3.3.1.1) and, consequently, its readiness state (see Section 3.3.1.4), which in turn determines the service instance assignments. If a service unit contains only restartable components, that is, the saAmfCompDisableRestart configuration attribute of all the components is set to SA_FALSE (see the SaAmfComp object class in Section 8.13.2), the service unit remains in the in-service readiness state during the restart. As a consequence, its service instance assignments remain intact.
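The two-step service unit restart described above can be sketched as follows. This is only an illustrative model, not the Availability Management Framework implementation: the Component class and the function names are hypothetical, and the sketch assumes that components with lower instantiation-levels are instantiated first and that termination uses the reverse order, which this section does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    instantiation_level: int  # hypothetical stand-in for the configured level


def restart_plan(components):
    """Return the ordered (operation, component name) steps of a service unit restart."""
    # Assumption: lower levels are instantiated first; termination runs in reverse.
    ascending = sorted(components, key=lambda c: c.instantiation_level)
    steps = [("terminate", c.name) for c in reversed(ascending)]
    steps += [("instantiate", c.name) for c in ascending]
    return steps


def stays_in_service(disable_restart_flags):
    """True if every component is restartable (saAmfCompDisableRestart is SA_FALSE)."""
    return not any(disable_restart_flags)
```

For a service unit with a component "db" at level 1 and a component "app" at level 2, the plan terminates "app" and then "db", and afterwards instantiates "db" and then "app".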
3.12.1.3 Recovery
Recovery is an automatic action taken by the Availability Management Framework (no human intervention) after an error has occurred in a component, to ensure that all component service instances that were assigned to this component are reassigned to non-erroneous components. This applies to all component service instances regardless of the HA state of the component for these component service instances. The recovery actions are described in the following subsections. In this section and also throughout this document, the values defined in the SaAmfRecommendedRecoveryT enum, described in Section 7.4.7 on page 222, will be used to designate the corresponding recovery actions, without necessarily referring to this enum. Note that if a component fails, just removing component service instances from the component and reassigning the same component service instances to it (without restarting the component) is not considered a valid recovery action.
One of the recovery actions described in the following subsections is configured per component as the default recovery action. The corresponding configuration attribute is saAmfCompRecoveryOnError, defined in the SaAmfComp object class (see Section 8.13.2). The Availability Management Framework engages the default recovery action under the following circumstances:
The error report specifies the value SA_AMF_NO_RECOMMENDATION.
A component does not respond to a callback invoked by the Availability Management Framework within a reasonable period of time.
A component responds with an error to a callback invoked by the Availability Management Framework on the component.
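The three circumstances listed above can be sketched as a small selection function. This is a hedged illustration only: the Recovery enum reuses the names of SaAmfRecommendedRecoveryT, but its numeric values are generated and are not the values defined by the specification, and initial_recovery() with its parameters is a hypothetical helper. Validation and escalation of the recommendation are deliberately left out.

```python
from enum import Enum, auto


class Recovery(Enum):
    # Names follow SaAmfRecommendedRecoveryT; the values are NOT the specified ones.
    NO_RECOMMENDATION = auto()
    COMPONENT_RESTART = auto()
    COMPONENT_FAILOVER = auto()
    NODE_SWITCHOVER = auto()
    NODE_FAILOVER = auto()
    NODE_FAILFAST = auto()
    APPLICATION_RESTART = auto()
    CLUSTER_RESET = auto()
    CONTAINER_RESTART = auto()


def initial_recovery(default, recommended=Recovery.NO_RECOMMENDATION,
                     callback_timed_out=False, callback_error=False):
    """Pick the recovery action to start with, before any escalation.

    default stands for the component's saAmfCompRecoveryOnError setting.
    """
    # A callback that times out or returns an error engages the default action.
    if callback_timed_out or callback_error:
        return default
    # An error report without a recommendation also engages the default action.
    if recommended is Recovery.NO_RECOMMENDATION:
        return default
    return recommended
```

For example, a component configured with COMPONENT_FAILOVER as its default gets that action when an error report carries NO_RECOMMENDATION, while a report recommending NODE_FAILFAST keeps its recommendation.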
3.12.1.3.1 Restart Recovery Action
The objective here is to avoid reassigning service instances to different service units. The Availability Management Framework tries to fix the problem by restarting some components and reassigning to them, with the same HA state, all component service instances that were previously assigned to them. This may not always be possible, because other events may have happened during the recovery that prevent the Availability Management Framework from performing such assignments (for example, some dependencies may no longer be satisfied). The following levels of restart are provided:
Restart the erroneous component: the erroneous component is abruptly terminated and then instantiated again. The Availability Management Framework attempts to reassign component service instances previously assigned to the component with the same HA state. This action is performed as a consequence of an SA_AMF_COMPONENT_RESTART recommended recovery action provided in the error report.
Restart all components of the service unit: all components of the service unit that contains the erroneous component are abruptly terminated and then instantiated again (see Section 3.12.1.2). This action is performed as a consequence of an escalation of an SA_AMF_COMPONENT_RESTART recommended recovery action.
Restart the associated container and all collocated contained components: the Availability Management Framework performs the following actions in sequence:
it abruptly terminates the affected contained component and its collocated contained components;
it terminates the associated container component;
it instantiates the container component and attempts to reassign component service instances previously assigned to this container component (including the corresponding container CSIs) with the same HA state;
it requests the container component to instantiate the associated contained components and attempts to reassign component service instances previously assigned to these contained components with the same HA state.
This action is performed as a consequence of an SA_AMF_CONTAINER_RESTART recommended recovery action (requested for a contained component) or as a consequence of an SA_AMF_COMPONENT_RESTART recommended recovery action requested for a container component. The Availability Management Framework must provide the option to disable restart recovery actions for particular components. This option should be used when restarting a component takes too much time, and fail-over is a preferred recovery action. See Section 3.3.2.1 on page 55.
3.12.1.3.2 Fail-Over Recovery Action
Either because the restart recovery action has been disabled in the configuration of a particular component (its saAmfCompDisableRestart configuration attribute is set to SA_TRUE, see the SaAmfComp object class in Section 8.13.2), or because previous attempts to restart the component failed, or because the error report specified another recommended recovery action, the Availability Management Framework may decide to recover by reassigning service instances to service units other than the one to which they are currently assigned. The different levels of fail-over listed next differ by the scope of the service instances being failed over (some service instances assigned to a service unit, or all service instances assigned to service units of a node) and by how abruptly component service instances are removed from the components to which they are currently assigned (regular HA state management leading to the removal of the component service instance, graceful component termination, abrupt component termination, or abrupt node reboot).
Component or Service Unit Fail-Over
The Availability Management Framework provides the saAmfSUFailover configuration attribute at the service unit level (see the SaAmfSU object class in Section 8.10) to indicate whether a component fail-over should trigger a fail-over of the entire service unit or only of the erroneous component. By default, a service unit fail-over is performed. If the service unit is configured to fail over as a single entity (saAmfSUFailover set to SA_TRUE), all other components of the service unit are abruptly terminated, and all service instances assigned to that service unit are failed over; otherwise, only the erroneous component is abruptly terminated, and all component service instances that were assigned to it are failed over. Other components are not terminated, but all service instances that contained one of the failed over component service instances have their remaining component service instances switched over. Switch-over means that component service instances are not abruptly removed from components; the HA state of these components for these component service instances is rather transitioned to the quiesced HA state before being removed.
The following example helps in clarifying this. Assume a service group having some service units, each comprising 3 components. One of these service units, SU1, contains the C1, C2, and C3 components. Now, assume that SU1 is assigned the active HA state for two service instances, SI1 and SI2. SI1 contains 3 CSIs: CSI11, CSI12, and CSI13, which are assigned to C1, C2, and C3 respectively, and SI2 contains only 2 component service instances, CSI21 and CSI23, which are assigned to C1 and C3 respectively. Assume that C2 fails. C2 is abruptly terminated. As C2 was assigned CSI12, CSI12 is failed over and the other component service instances of SI1, namely CSI11 and CSI13, need to be switched over. Considering only the failed component, it is not necessary to switch over SI2, as it has no CSIs assigned to the failed component C2. In a 2N or N+M redundancy model, however, SI2 also needs to be switched over; otherwise, the number of active service units would be higher than what is allowed by the redundancy model. In an N-way redundancy model, by contrast, SI2 could be left assigned to SU1, and a repair of C2 should be attempted by reinstantiating it. If the attempt to instantiate C2 fails, the service unit becomes disabled, and SI2 must be switched over; however, if the attempt to instantiate C2 is successful, SI2 shall remain assigned to SU1, and based on other configuration parameters and N-way redundancy model semantics, even SI1 might get reassigned to SU1.
This action is performed as a consequence of an SA_AMF_COMPONENT_FAILOVER recommended recovery action or of an escalation to it.
Node Switch-Over
This implies an abrupt termination of the failed component and the fail-over of all component service instances that were assigned to it. All service instances assigned to service units on the node have their remaining component service instances switched over. Switch-over means that component service instances are not abruptly removed from components; the HA state of these components for these component service instances is rather transitioned to the quiesced HA state before being removed. This action is performed as a consequence of an SA_AMF_NODE_SWITCHOVER recommended recovery action.
Node Fail-Over
This implies an abrupt termination of all local components and failing over all service instances assigned to all service units on a node. This action is performed as a consequence of an SA_AMF_NODE_FAILOVER recommended recovery action, or as the result of a recovery escalation.
Node Failfast
The Availability Management Framework reboots the node by using a low level interface without trying to terminate the components individually. The reboot operation must be carried out in such a way that all local components of the node (including its hardware components) are placed into the uninstantiated presence state, which may require powering-down or resetting some hardware entities (potentially using the HPI). As part of the node failfast operation, a failover of the service instances assigned to service units on the node is performed. This action is performed as a consequence of an SA_AMF_NODE_FAILFAST recommended recovery action.
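The SU1 example given under Component or Service Unit Fail-Over can be replayed with a short sketch. It models only the case in which the erroneous component alone is failed over (saAmfSUFailover not set to SA_TRUE); the component_failover() function and the dictionary layout are hypothetical, and the sketch does not decide what happens to unaffected service instances, because that depends on the redundancy model as explained in the example.

```python
def component_failover(assignments, failed_component):
    """Classify the CSIs of one service unit after a single component fails.

    assignments maps each service instance name to a dict {CSI name: component name}.
    Returns (failed-over CSIs, switched-over CSIs, unaffected service instances).
    """
    failed_over, switched_over, unaffected = [], [], []
    for si, csis in assignments.items():
        hit = [csi for csi, comp in csis.items() if comp == failed_component]
        if hit:
            # CSIs of the failed component are failed over; the remaining CSIs
            # of the same service instance go through the quiesced HA state.
            failed_over += hit
            switched_over += [csi for csi in csis if csi not in hit]
        else:
            unaffected.append(si)
    return failed_over, switched_over, unaffected


su1 = {
    "SI1": {"CSI11": "C1", "CSI12": "C2", "CSI13": "C3"},
    "SI2": {"CSI21": "C1", "CSI23": "C3"},
}
result = component_failover(su1, "C2")
```

With C2 failing, CSI12 is failed over, CSI11 and CSI13 are switched over, and SI2 is reported as unaffected. In a 2N or N+M redundancy model, a caller would still have to switch over SI2.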
3.12.1.3.3 Application Restart Recovery Action
The application should be completely terminated and then started again by first terminating all of its service units and then starting them again, ensuring that during the termination phase of the restart procedure it is not required to reassign service instances (refer additionally to Section 9.4.7 on page 332). It is important to note that it is not required to preserve the pre-restart service instance assignments to various service units in the application upon re-starting an application. The instantiation phase of this recovery action should be carried out in accordance with the redundancy model configuration of the various service groups that belong to the application. This action is performed as a consequence of an SA_AMF_APPLICATION_RESTART recommended recovery action, which should be specified when the failure is deemed to be a global application failure.
3.12.1.3.4 Cluster Reset Recovery Action
The cluster should be reset. In order to execute this function, the Availability Management Framework reboots all nodes that are part of the cluster by using a low level interface without trying to terminate the components individually. To be effective, this operation must be performed so that all nodes are first halted before any of the nodes boots again. This recommendation should be used only in the rare case in which a component (most likely itself involved in error management) has enough knowledge to foresee a "cluster reset" as the only viable recovery action from a global failure. This action is performed as a consequence of an SA_AMF_CLUSTER_RESET recommended recovery action.
3.12.1.4 Repair
Repair is the action performed on erroneous entities (that is, entities with a disabled operational state) to bring them back into a healthy state (that is, to the enabled operational state). One Availability Management Framework configuration attribute at node level (saAmfNodeAutoRepair of the SaAmfNode object class, see Section 8.7) and another one at service group level (saAmfSGAutoRepair of the SaAmfSG object class, see Section 8.9) specify whether the Availability Management Framework engages in automatic repair or not. The saAmfSGAutoRepair attribute applies to any service unit of the particular service group, and the saAmfNodeAutoRepair attribute applies only to the particular node. If saAmfSGAutoRepair or saAmfNodeAutoRepair is turned on, the Availability Management Framework performs an automatic repair action after undertaking some recovery actions at the service unit or node level respectively. If an automatic repair configuration attribute is turned off (saAmfNodeAutoRepair or saAmfSGAutoRepair set to SA_FALSE), the Availability Management Framework performs no automatic repair action at the corresponding level, and it is the responsibility of system management applications or system administrators to perform repair actions (which are not under the Availability Management Framework's control), and then to reenable the appropriate operational states when the repair is successfully completed by executing the SA_AMF_ADMIN_REPAIRED administrative operation. It is expected that these repair actions bring the repaired service units into either the instantiated or uninstantiated presence state before reenabling the appropriate operational states. Note that combined recovery and repair actions like the node failfast are also disabled when saAmfSGAutoRepair is set to SA_FALSE.
The Availability Management Framework treats the component and service unit restart recovery actions, which are described in Section 3.12.1.3.1, as repair actions and does not require any additional repair action in this case. The Availability Management Framework reenables the operational state of the component or the service unit when the restart operation completes successfully. In the case of a component fail-over recovery action and regardless of any configuration attribute setting, the Availability Management Framework always tries to reinstantiate the erroneous component; if it is successful, it reenables the erroneous component. The Availability Management Framework performs these actions to avoid leaving a service unit partially disabled for an indefinite amount of time. If the instantiation of the erroneous component fails, the Availability Management Framework sets the operational state of the service unit to disabled.
If a node leaves the cluster membership while the Availability Management Framework is performing an automatic repair action on a service unit of that node, the fact that the node leaves the cluster membership supersedes the service unit repair action, and the Availability Management Framework considers the repair action completed when the node rejoins the cluster membership. However, if a node leaves the cluster membership while the Availability Management Framework is performing an automatic repair action on that node, the fact that the node leaves the cluster membership may not eliminate the need for the node repair action, and the Availability Management Framework may need to complete the repair action when the node rejoins the cluster membership, if the node has not been rebooted in the meantime.
3.12.1.4.1 Recovery and Associated Repair Policies
In this section, the recovery policies and the associated automatic repair policies are presented.
Service Unit Fail-Over Recovery
In the context of a service unit fail-over recovery action, the Availability Management Framework attempts to terminate all components of the service unit. If the service group containing the service unit has the automatic repair configuration attribute set (saAmfSGAutoRepair set to SA_TRUE), and all components have been successfully terminated, the Availability Management Framework reenables the operational states of the service unit and its disabled components and evaluates the various criteria used to determine if the service unit must be reinstantiated (such as the preferred number of in-service service units for the service group containing that service unit), and then reinstantiates service units, if deemed necessary.
Node Switch-Over, Node Fail-Over and Node Failfast Recovery
After a node switch-over or node fail-over recovery action, if the erroneous node has the automatic repair configuration attribute set (saAmfNodeAutoRepair set to SA_TRUE), the Availability Management Framework reboots the node. The Availability Management Framework treats a node failfast recovery action as a repair action and does not require any additional repair action in this case. When such a node rejoins the cluster, the Availability Management Framework reenables its operational state and the operational state of its disabled service units and components (except for components with the termination-failed presence state). It then evaluates the various criteria used to determine if service units of that node must be reinstantiated (such as the preferred number of in-service service units of service groups that have service units on that node) and reinstantiates service units if deemed necessary.
The following table describes the recovery policies and the associated automatic repair policies.
Table 16 Recovery and Associated Automatic Repair Policies
Recovery Action           Automatic Repair
service unit fail-over    Availability Management Framework attempts to instantiate the service unit
node switch-over          node reboot
node fail-over            node reboot
node failfast             none (already part of recovery)
3.12.1.4.2 Restrictions to Auto-Repair
It is imperative that under certain circumstances the Availability Management Framework must not engage auto-repair actions. One such instance is during a software upgrade campaign, as defined by the Software Management Framework specification ([6]). In this case, the Availability Management Framework must be explicitly prevented from undertaking an automatic repair action to enable the initiator of the upgrade campaign to take some corrective or alternate actions like suspending the campaign in case components fail when they are being upgraded. In order to disable the auto-repair behavior of the Availability Management Framework on a selective basis for components and containing service units, the service unit configuration in the Availability Management Framework Information Model supports a configuration attribute called saAmfSUMaintenanceCampaign (see the SaAmfSU object class in Section 8.10 on page 302), which can be modified to instruct the Availability Management Framework about disengaging auto-repair under various circumstances. Note that the Availability Management Framework treats the component and service unit restart actions, which are described in Section 3.12.1.3.1, as well as node failfast recovery actions as repair actions which are also disabled by setting the saAmfSUMaintenanceCampaign configuration attribute. The configuration attribute contains the name of the maintenance campaign that is being currently run. When this attribute holds a valid value for a particular service unit, the Availability Management Framework disables the service unit without attempting any sort of repair in case constituent components fail. Additionally, all operational state change notifications (see Section 11.2.2.2 on page 378) pertinent to that service unit contain an indication that the service unit is involved in a maintenance (or upgrade) campaign.
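The automatic repair policies of Table 16, combined with the auto-repair attributes and the maintenance campaign restriction described above, can be summarized in a small lookup sketch. The automatic_repair() function, its string arguments, and its return values are hypothetical. As a simplification, the sketch treats a non-empty saAmfSUMaintenanceCampaign value as suppressing every automatic repair for the affected service unit, and it does not model the restart actions that are also treated as repair.

```python
def automatic_repair(recovery, sg_auto_repair, node_auto_repair, maintenance_campaign=""):
    """Return the automatic repair that follows a recovery action, or None.

    sg_auto_repair and node_auto_repair stand for saAmfSGAutoRepair and
    saAmfNodeAutoRepair; maintenance_campaign stands for the value of
    saAmfSUMaintenanceCampaign (an empty string means no campaign is running).
    """
    if maintenance_campaign:
        # A running campaign disengages auto-repair for the service unit.
        return None
    if recovery == "service unit fail-over":
        return "instantiate service unit" if sg_auto_repair else None
    if recovery in ("node switch-over", "node fail-over"):
        return "node reboot" if node_auto_repair else None
    if recovery == "node failfast":
        # The reboot is already part of the recovery itself.
        return None
    raise ValueError(f"unknown recovery action: {recovery}")
```

A caller would treat a None result as "no automatic repair", which leaves the repair to system management applications or administrators.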
3.12.2 Recovery Escalation Policy of the Availability Management Framework
When an error is reported on a component, the error report also contains a recommended recovery action. The Availability Management Framework decides whether the recommended recovery action is executed, rejected, or escalated. The recovery escalation covers cases in which the recovery action is too weak to prevent further errors. The underlying principle of the escalation is to progressively extend the scope of the error from component to service unit, and from service unit to node (that is, considering more and more entities to be involved in the error that shows up in a component).
3.12.2.1 Recommended Recovery Action
The following recommended recovery actions are defined in the SaAmfRecommendedRecoveryT enum (see Section 7.4.7 on page 222) and can be specified in the saAmfComponentErrorReport() API (refer to Section 7.12.1 on page 281):
SA_AMF_NO_RECOMMENDATION: used when the scope of the error is unknown. In this case, the Availability Management Framework engages the configured recovery policy for the component, which is specified by the saAmfCompRecoveryOnError configuration attribute, defined in the SaAmfComp object class (see Section 8.13.2).
SA_AMF_COMPONENT_RESTART: used when the scope of the error is the component.
SA_AMF_CONTAINER_RESTART: used when the scope of the error is a container component and all collocated contained components. This recommended recovery action can only be requested for a contained component in order to restart the associated container.
SA_AMF_COMPONENT_FAILOVER: used when the error is related to the execution environment of the component on the current node.
SA_AMF_NODE_SWITCHOVER, SA_AMF_NODE_FAILOVER, and SA_AMF_NODE_FAILFAST: these three recommended recovery actions are used when the error has been identified as being at the node level and components should not be in service on the node. They indicate different levels of urgency to move the service instances out of the node.
SA_AMF_APPLICATION_RESTART: used when the error has been identified as a global application failure.
SA_AMF_CLUSTER_RESET: used when the error has been identified at the cluster level.
The Availability Management Framework validates the recommended recovery action in an implementation-dependent way. This could be done, for example, by putting in place security measures like access control and authentication schemes. If the validation succeeds, the Availability Management Framework will not implement a weaker recovery action than the recommended one; however, the Availability Management Framework may decide to implement a stronger recovery action based on its recovery escalation policy. If the validation fails, the Availability Management Framework rejects the error report with the return code SA_AIS_ERR_ACCESS unless the recommended recovery action is SA_AMF_NO_RECOMMENDATION.
The following levels of escalation are implemented by the Availability Management Framework:
Table 17 Levels of Escalation
Escalation Level    Recommendation                                           Escalated to
1                   SA_AMF_COMPONENT_RESTART                                 restart of the entire service unit
2                   SA_AMF_COMPONENT_RESTART                                 SA_AMF_COMPONENT_FAILOVER
3                   SA_AMF_COMPONENT_RESTART or SA_AMF_COMPONENT_FAILOVER    fail-over of the entire node
If some components of the same service unit fail and are restarted too many times within a given time period (the probation period), the Availability Management Framework escalates to a restart of the entire service unit. If, after this first level of escalation, the service unit is restarted too many times in a given time period because of failures of its components, the Availability Management Framework performs the SA_AMF_COMPONENT_FAILOVER recovery action, which is described as Component or Service Unit Fail-Over on page 164. The remainder of this section provides a detailed explanation of how the Availability Management Framework implements the first two escalation levels.
Each service group can be configured with the following attributes, which are defined in the SaAmfSG object class (see Section 8.9):
saAmfSGCompRestartProb (time value)
saAmfSGCompRestartMax (maximum count)
saAmfSGSuRestartProb (time value)
saAmfSGSuRestartMax (maximum count)
The escalation policy algorithm for escalations of levels 1 and 2 starts when an error with an SA_AMF_COMPONENT_RESTART recommended recovery action is received by the Availability Management Framework for a component of a particular service unit and the service unit is not already in the middle of a probation period (neither "component restart" nor "service unit restart" probation period, see below). At this time, the Availability Management Framework considers that it is at the beginning of a new "component restart" probation period for that service unit. The Availability Management Framework starts counting the number of components of that service unit it has to restart due to an error report with an SA_AMF_COMPONENT_RESTART recommended recovery action. Components restarted due to dependencies (see Section 3.9.2) should not be counted. If this count does not reach the saAmfSGCompRestartMax value before the end of the "component restart" probation period (the duration of the period is specified by saAmfSGCompRestartProb), the "component restart" probation period for the affected service unit expires. It will be reinitiated when the Availability Management Framework receives the next occurrence of an error with an SA_AMF_COMPONENT_RESTART recommended recovery action for a component of the particular service unit. If this count reaches the saAmfSGCompRestartMax value before the end of the "component restart" probation period, the Availability Management Framework performs the first level of recovery escalation for that service unit: the Availability Management Framework restarts the entire service unit. At this time, the Availability Management Framework considers that escalation of level 1 is active for this service unit and terminates the current "component restart" probation period for the service unit. At the same time, it starts the "service unit restart" probation period for the service unit. 
During the "service unit restart" probation period, each error report on the service unit with an SA_AMF_COMPONENT_RESTART recommended recovery action immediately escalates to an entire service unit restart (as level 1 escalation is active). When the "service unit restart" probation period starts, the Availability Management Framework also starts counting the number of times it has to perform a level 1 escalation. If this count does not reach the saAmfSGSuRestartMax value before the end of the "service unit restart" probation period (the duration of the period is specified by saAmfSGSuRestartProb), the "service unit restart" probation period for the affected service unit expires. If this count reaches the saAmfSGSuRestartMax value before the end of the "service unit restart" probation period, the Availability Management Framework performs the second level of recovery escalation for that service unit: the Availability Management Framework fails over the entire service unit and terminates the "service unit restart" probation period.
Container and contained components will follow the same recovery actions as described above with the following difference: when a container component is restarted, this recovery action triggers the restart of its associated contained components (see Section 3.12.1.3.1). The count of restarted components of the service unit during the saAmfSGCompRestartProb probation period is not increased when contained components are restarted as a consequence of the restart of the associated container component. Similarly, the count of restarts of the service unit during the saAmfSGSuRestartProb probation period is not increased when the service unit containing the contained components is restarted as a consequence of the restart of the associated container component.
Note: The first-level escalation of the SA_AMF_CONTAINER_RESTART recommended recovery action requested for a contained component is the restart of the service unit containing the associated container component, which triggers the restart of the service units containing contained components associated with the container component. The second-level escalation of the SA_AMF_CONTAINER_RESTART recommended recovery action requested for a contained component is the fail-over of the service unit containing the associated container component, which triggers the fail-over of the service units containing contained components associated with the container component.
Regarding the order of recovery operations of service instances protected by service groups containing container components and the order of recovery operations of service instances protected by service groups containing associated contained components when failing over a container component, refer to the recommendation in Section 6.3.
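The level 1 and level 2 escalation described above can be sketched as a small state machine kept per service unit. This is only an illustration under stated assumptions: the class and method names are hypothetical and not part of the AMF API, time is passed in explicitly as seconds, the error report on which a count reaches its maximum is itself the one that gets escalated, and the service unit restart that opens the "service unit restart" probation period counts as the first level 1 escalation. Restarts caused by dependencies or by a container restart are simply not reported to this helper, so they are not counted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SuEscalation:
    """Per-service-unit escalation state (hypothetical helper, not an AMF API)."""
    comp_restart_max: int      # saAmfSGCompRestartMax
    comp_restart_prob: float   # saAmfSGCompRestartProb, in seconds
    su_restart_max: int        # saAmfSGSuRestartMax
    su_restart_prob: float     # saAmfSGSuRestartProb, in seconds
    comp_deadline: Optional[float] = None  # end of "component restart" probation
    su_deadline: Optional[float] = None    # end of "service unit restart" probation
    comp_count: int = 0
    su_count: int = 0

    def on_component_restart_error(self, now: float) -> str:
        """Return the recovery chosen for one SA_AMF_COMPONENT_RESTART report."""
        # Expire probation periods whose duration has elapsed.
        if self.comp_deadline is not None and now >= self.comp_deadline:
            self.comp_deadline, self.comp_count = None, 0
        if self.su_deadline is not None and now >= self.su_deadline:
            self.su_deadline, self.su_count = None, 0

        if self.su_deadline is not None:
            # Level 1 is active: every report escalates to a service unit restart.
            return self._count_su_restart()

        if self.comp_deadline is None:
            # No probation period is running: start a "component restart" one.
            self.comp_deadline = now + self.comp_restart_prob
            self.comp_count = 0

        self.comp_count += 1
        if self.comp_count < self.comp_restart_max:
            return "component_restart"

        # Level 1 escalation: end the component probation, start the SU probation.
        self.comp_deadline, self.comp_count = None, 0
        self.su_deadline, self.su_count = now + self.su_restart_prob, 0
        return self._count_su_restart()

    def _count_su_restart(self) -> str:
        self.su_count += 1
        if self.su_count < self.su_restart_max:
            return "su_restart"
        # Level 2 escalation: fail over the service unit and end the probation.
        self.su_deadline, self.su_count = None, 0
        return "su_failover"
```

With comp_restart_max=3 and su_restart_max=2, four reports in quick succession yield two component restarts, one service unit restart, and one service unit fail-over.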
If the Availability Management Framework fails over too many service units out of the same node in a given time period as a consequence of error reports with either SA_AMF_COMPONENT_RESTART or SA_AMF_COMPONENT_FAILOVER recommended recovery actions, the Availability Management Framework escalates the recovery to an entire node fail-over. The Availability Management Framework maintains the following configuration parameters on a per-node basis, which are used to implement escalations of level 3.
- saAmfNodeSuFailOverProb
- saAmfNodeSuFailoverMax
These attributes are defined in the SaAmfNode object class (see Section 8.7). The escalation algorithm of level 3 is very similar to the algorithm applied for levels 1 and 2. The escalation policy algorithm for an escalation of level 3 starts when the Availability Management Framework performs a service unit fail-over as a consequence of an escalation of level 2 or of an error report with an SA_AMF_COMPONENT_FAILOVER recommended recovery action on a node that is not already in the middle of a service unit fail-over probation period. At this time, the Availability Management Framework considers that it is at the beginning of a new service unit fail-over probation period for that node. The Availability Management Framework starts counting the number of service unit fail-overs it has to perform on that node as a consequence of an escalation of level 2 or an error report with an SA_AMF_COMPONENT_FAILOVER recommended recovery action. If this count does not reach the saAmfNodeSuFailoverMax value before the end of the service unit fail-over probation period (the length of the period is specified by saAmfNodeSuFailOverProb), the service unit fail-over probation period is terminated for all service units of the affected node. If this count reaches the saAmfNodeSuFailoverMax value before the end of the service unit fail-over probation period, the Availability Management Framework performs the third level of recovery escalation for the node: the Availability Management Framework fails over the entire node.
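The level 3 algorithm above follows the same counting pattern, kept per node. The sketch below is a hedged illustration: the class name and the string return values are hypothetical, time is given in seconds, and the fail-over on which the count reaches saAmfNodeSuFailoverMax is assumed to be the one that is escalated to a node fail-over.

```python
from typing import Optional


class NodeEscalation:
    """Per-node level 3 escalation state (hypothetical helper, not an AMF API)."""

    def __init__(self, su_failover_max: int, su_failover_prob: float) -> None:
        self.su_failover_max = su_failover_max    # saAmfNodeSuFailoverMax
        self.su_failover_prob = su_failover_prob  # saAmfNodeSuFailOverProb, seconds
        self.deadline: Optional[float] = None     # end of the probation period
        self.count = 0

    def on_su_failover(self, now: float) -> str:
        """Record one service unit fail-over on this node and pick the recovery."""
        if self.deadline is not None and now >= self.deadline:
            # The probation period ended without reaching the maximum.
            self.deadline, self.count = None, 0
        if self.deadline is None:
            # Start a new service unit fail-over probation period for the node.
            self.deadline, self.count = now + self.su_failover_prob, 0
        self.count += 1
        if self.count < self.su_failover_max:
            return "su_failover"
        # Level 3 escalation: fail over the entire node.
        self.deadline, self.count = None, 0
        return "node_failover"
```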
the pathname of the CLC-CLI command (see Section 4.2), the list of environment variables (see Section 4.3) and arguments (see Section 4.4) to be provided to the CLC-CLI by the Availability Management Framework at runtime, and a timeout value used to control the execution of the CLC-CLI (refer to the sections describing each CLC-CLI command). The Availability Management Framework considers that the CLC-CLI failed if it did not complete in the time interval specified by this timeout.
Additional information on CLC-CLI configuration attributes is provided in Chapter 8 and [5]. CLC-CLI commands are idempotent.
1.
Using the saAmfCompType attribute of the component, it looks up the component's type described by an SaAmfCompType object. The component type object specifies
- the required software bundle in the saAmfCtSwBundle attribute and
- the relative paths to the CLC-CLI commands through the saAmfCtRelPath<commandName>Cmd attributes.
2.
For the selected AMF node, it looks up the pathname prefix for the software bundle (saAmfCtSwBundle), as given in the saAmfNodeSwBundlePathPrefix attribute of the appropriate SaAmfNodeSwBundle association object. It concatenates this pathname prefix of the software bundle (saAmfNodeSwBundlePathPrefix) with the per-command relative pathname of the component type (saAmfCtRelPath<commandName>Cmd) to compose the absolute pathname for the CLC-CLI command.
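The concatenation in this step can be illustrated with a short sketch. The function name and the example paths are hypothetical; the sketch assumes POSIX-style pathnames and that exactly one separator should appear between the prefix and the relative path, which the text above does not spell out.

```python
import posixpath


def clc_cli_pathname(sw_bundle_path_prefix: str, rel_path_cmd: str) -> str:
    """Join the node's bundle prefix with the component type's relative path.

    sw_bundle_path_prefix: value of saAmfNodeSwBundlePathPrefix
    rel_path_cmd: value of saAmfCtRelPath<commandName>Cmd
    """
    # Strip a leading "/" so that join() does not discard the prefix.
    return posixpath.join(sw_bundle_path_prefix, rel_path_cmd.lstrip("/"))
```

For example, a prefix of /opt/bundles/db-1.0 and a relative path of bin/instantiate.sh compose /opt/bundles/db-1.0/bin/instantiate.sh.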
3.
first, the Unicode characters are translated into UTF-8 encoding, as described in RFC 2253 ([9]), to obtain a character string; then, the quoted-printable encoding from RFC 2045 ([10]) is used to substitute non-printable characters in the string.
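The two-stage encoding above can be approximated as follows. This is a sketch, not the framework's implementation: the function name is hypothetical, and it relies on Python's standard quopri module as one implementation of RFC 2045 quoted-printable. Whether the exact set of characters that quopri substitutes matches what an AMF implementation substitutes is an assumption.

```python
import quopri


def encode_for_clc_cli(text: str) -> str:
    """Translate text to UTF-8, then apply quoted-printable encoding."""
    utf8_bytes = text.encode("utf-8")            # stage 1: Unicode to UTF-8
    encoded = quopri.encodestring(utf8_bytes)    # stage 2: quoted-printable
    return encoded.decode("ascii")
```

A plain ASCII name passes through unchanged, while a non-ASCII character such as "é" becomes the escaped byte sequence =C3=A9.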
The saAmfCompCmdEnv configuration attribute of the SaAmfComp object class (shown in Section 8.13.2) defines environment variables and their values for all CLC-CLI commands of the component. These environment variables are added to the environment variables specified for components of this type (see the saAmfCtDefCmdEnv configuration attribute defined in the SaAmfCompType object class, shown in Section 8.13.1). If the saAmfCompCmdEnv attribute is not specified, only the environment variables (if any) and their values in the saAmfCtDefCmdEnv attribute apply.
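A minimal sketch of how the two attribute values combine is shown below. The function name and the dictionary representation are hypothetical. The text above does not say which value wins when both attributes define the same variable; the sketch assumes the component-level value takes precedence.

```python
from typing import Dict, Optional


def clc_cli_environment(
    type_default_env: Optional[Dict[str, str]],
    comp_env: Optional[Dict[str, str]],
) -> Dict[str, str]:
    """Combine saAmfCtDefCmdEnv (type level) with saAmfCompCmdEnv (component level)."""
    env = dict(type_default_env or {})   # variables from saAmfCtDefCmdEnv
    # Assumption: on a name clash, the component-level value overrides the default.
    env.update(comp_env or {})           # variables from saAmfCompCmdEnv
    return env
```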
saAmfCompNumMaxInstantiateWithDelay, and saAmfCompDelayBetweenInstantiateAttempts respectively. An attempt to reinstantiate the component fails if the INSTANTIATE command returns an error or fails to complete in the configured timeout period (specified by the saAmfCompInstantiateTimeout configuration attribute of the SaAmfComp object class, shown in Section 8.13.2). If all these attempts fail, the Availability Management Framework may force a node failfast recovery action. This possibility is controlled by the saAmfNodeFailfastOnInstantiationFailure configuration attribute of the SaAmfNode object class, shown in Section 8.7. The node failfast includes an implicit node reboot, which places all local components of the node (including its hardware components) into the uninstantiated presence state. For more details, see Section 3.12.1.3. If node reboot is disabled, or if a single reboot did not solve the problem, the Availability Management Framework sets the operational state of the component to disabled and its presence state to instantiation-failed. The presence state of the enclosing service unit also becomes instantiation-failed (it may instead become termination-failed if other components of the service unit failed to terminate successfully; note that the termination-failed state overrides the instantiation-failed state in this case). The Availability Management Framework performs a service unit level recovery action if the error occurred when some service instances were already assigned or being assigned to the service unit; however, no further automatic repair for this service unit beyond the already attempted node reboot is provided, and an explicit administration action is required to repair it. The following error code is recognized by the Availability Management Framework:
SAF_CLC_NO_RETRY (200): the error that occurred when attempting to instantiate this component is persistent, and no retries or node reboot should be attempted.
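The retry behavior around the INSTANTIATE command can be sketched as a loop. This is an illustration under several assumptions: the function and parameter names are hypothetical, run_instantiate stands in for executing the command and returns its exit status (a timeout is expected to be mapped to a nonzero status by the caller), an exit status of 0 is taken to mean success, and attempts without delay are assumed to precede attempts with delay.

```python
import time
from typing import Callable

SAF_CLC_NO_RETRY = 200  # error code named in the text above


def instantiate_with_retries(
    run_instantiate: Callable[[], int],
    attempts_without_delay: int,
    attempts_with_delay: int,            # saAmfCompNumMaxInstantiateWithDelay
    delay_seconds: float,                # saAmfCompDelayBetweenInstantiateAttempts
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try INSTANTIATE repeatedly and report how the sequence ended."""
    total = attempts_without_delay + attempts_with_delay
    for attempt in range(total):
        if attempt >= attempts_without_delay:
            sleep(delay_seconds)         # delayed attempts wait first
        status = run_instantiate()
        if status == 0:
            return "instantiated"
        if status == SAF_CLC_NO_RETRY:
            # Persistent error: no further retries and no node reboot.
            return "instantiation-failed"
    # All attempts failed; a node failfast may follow if it is enabled.
    return "retries-exhausted"
```

Passing a custom sleep callable makes the loop easy to exercise without real delays.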
Framework for that service unit, and an explicit administration action is required to repair it. If the component was assigned the active HA state for some component service instances when the CLEANUP command was executed, and semantics of the redundancy model of its enclosing service group guarantee that at a point in time only one component can be in the active HA state for a given component service instance, the failure to terminate that component prevents the Availability Management Framework from assigning to another component the active HA state for these component service instances (and by the same token prevents the assignment of the active HA state to other service units for the service instances that contain the involved CSIs). In this case, the service instances will stay unassigned until an administrative action is performed to terminate the failed component.
The AM_STOP command is mandatory for components that have an AM_START command, and must not be used for components that do not have an AM_START command. If the AM_STOP command returns an error or fails to complete in the configured timeout period (specified by the saAmfCompAmStopTimeout configuration attribute of the SaAmfComp object class, shown in Section 8.13.2), the Availability Management Framework will retry a few times to stop the monitor. If AM_STOP is invoked in the context of a component termination, and if AM_STOP still fails after all retries (the maximum number of retries is given by the saAmfCompNumMaxAmStopAttempts configuration attribute of the SaAmfComp object class), the Availability Management Framework terminates the component and cleans it up to ensure that the monitor eventually gets stopped. If AM_STOP fails while the Availability Management Framework tries to terminate a component in the context of a recovery action, the Availability Management Framework may skip the retries and go ahead immediately by terminating the component.
A single proxy component can mediate between the Availability Management Framework and multiple proxied components. Although the proxied/proxy approach is recommended when the proxied components are not located on AMF nodes, it can also be applied when the proxied components are contained in local service units. Pre-instantiable proxied components cannot be located in the same service unit as proxy components. This restriction prevents potential cyclic dependencies when service units are instantiated. If the proxy and proxied local components are hosted in different service units, these service units may reside on different AMF nodes. The configuration of proxy/proxied components must include information about the association of a proxied component to the CSI through which the proxied component will be proxied (termed proxy CSI). In other words, a proxied component configuration has a configuration attribute (saAmfCompProxyCsi in the SaAmfComp object class, shown in Section 8.13.2), which contains the name of the CSI through which the proxied component will be proxied. A proxy CSI can be dedicated to proxy one or more proxied components. A proxy component can be configured to accept multiple CSIs: one or more for proxying proxied components and others for providing non-proxy services. Note that in terms of function there is no difference between proxy CSIs and other CSIs. The proxy CSI corresponds to the workload of proxying a proxied component. Only the proxy component with the active HA assignment for a proxy CSI may register the proxied components associated with the CSI. The redundancy model (for a discussion of this notion, refer to Section 3.7) of the proxy component can be different from the redundancy models of its proxied components.
ries in a service unit is to be able to restart the associated container component as a recovery action to handle failures in associated contained components. As any service unit of a service group can be assigned any service instance protected by the service group (see Section 3.2.6), and as it makes no sense to assign the container CSI configured for the contained components of a service unit to these contained components, service units containing contained components and service units containing container components must belong to different service groups. Section 6.1.6 discusses the redundancy models that are supported for these service groups. The readiness state of a service unit containing contained components is affected by the HA state of the associated container component for their container CSI, as explained in Section 3.3.1.4.
AMF nodes and node groups configured for service units containing contained components (or for the corresponding service groups) must not conflict with the AMF nodes and node groups configured for the service units containing container components that can potentially drive the life cycle of these contained components (or for the service groups containing these service units). This is because a container component and its associated contained components must reside on the same AMF node. A service instance containing a container CSI should not contain any other component service instance.
6.1.6 Redundancy Models
For a discussion of redundancy models, refer to Section 3.7. The redundancy model of service groups having service units containing a container component can be different from the redundancy model of service groups having service units containing the associated contained components. The only redundancy model supported for service groups having service units containing a container component is the N-Way active redundancy model. The reason for this decision is that only container components with the active HA assignment for a container CSI may handle the life cycle of contained components in cooperation with the Availability Management Framework (that is, may act as associated containers). A service group containing contained components can be associated with any of the redundancy models defined by the Availability Management Framework.
6.1.7 Administrative Operations and Container and Contained Components
The description of the Availability Management Framework administrative operations is presented in Chapter 9. The peculiarities of these operations when container components and container CSIs are affected are described in Section 9.4.3 for the lock operation, Section 9.4.6 for the shutdown operation, Section 9.4.7 for the restart operation, and Section 9.4.8 for the SI swap operation. Regarding the order of recovery operations for service instances protected by service groups containing container components and the order of recovery operations for service instances protected by service groups containing associated contained components, when certain administrative operations affect the container components, refer to the recommendation in Section 6.3.
6.1.8 Failure Handling
Refer to Section 6.3.
Configure all components of SU1 with the container CSI ContainerCSI1. Similarly, configure all components of SU2 with ContainerCSI2 and all components of SU3 with ContainerCSI3. Create three service groups ContainerSG1, ContainerSG2, and ContainerSG3, each of which has one service unit on this node, termed ContainerSU1, ContainerSU2, and ContainerSU3 respectively. These service units contain the container components ContainerC1, ContainerC2, and ContainerC3 respectively. Configure the following CSI assignments: ContainerCSI1 to ContainerC1, ContainerCSI2 to ContainerC2, and ContainerCSI3 to ContainerC3.
If there are multiple container components on a node which have the active HA state for a particular container CSI, and one or more service units on the same node whose contained components are configured with the same container CSI, it is implementation-defined how the Availability Management Framework selects container components to handle the life cycle of the contained components of these service units. However, all contained components of a service unit must have the same associated container component. Note that a container component can be configured to have multiple CSI assignments, one or more for handling contained components (container CSI) and others for providing other services. In terms of functionality and syntax, there is no difference between a container CSI used to determine the associated container component and CSI assignments corresponding to the workload of other services. Actions taken by the Availability Management Framework when it changes or removes the HA state of a container component for a container CSI are described on page 65.
6.2.3 Life Cycle Callbacks
Two callback functions are invoked by the Availability Management Framework to control the life cycle of contained components.
The Availability Management Framework invokes the SaAmfComponentTerminateCallbackT callback function (see Section 7.10.1) directly on the associated contained component.
Error Containment: if a container component fails, the Availability Management Framework assumes that the termination of the container component also forces the termination of all associated contained components. The termination of a proxy component does not imply the termination of its proxied components.
SA-Aware components: though the life cycle of a contained component is handled by the associated container in cooperation with the Availability Management Framework, a contained component is an SA-aware component, and it communicates directly with the Availability Management Framework. A proxied component is a non-SA-aware component, and it communicates with the Availability Management Framework only through its proxy component.
Recommended recovery: contained components can specify the SA_AMF_CONTAINER_RESTART as a recommended recovery, which is a convenience method for contained components to recommend the component restart of their container. A counterpart for proxy components does not exist.
Limitations in the service unit configuration: all contained components in a service unit need to be configured with the same container CSI. It is not permitted to configure contained components and non-contained components in the same service unit. A non-pre-instantiable proxied component and its proxy component can be located in the same service unit. A pre-instantiable proxied component and its proxy component must not be located in the same service unit.
Local Components: both container and contained components are local components. A proxied component can be either a local or an external component.
Registration: container and contained components register directly with the Availability Management Framework. Only the proxy component with the active HA assignment for a proxy CSI may register proxied components associated with the proxy CSI.
- Library life cycle
- Component registration and unregistration
- Passive monitoring of processes of a component
- Component health monitoring
- Component service instance management
- Component life cycle
- Protection group management
- Error reporting
- Component response to Availability Management Framework requests
A component exists in a single service unit, and it typically consists of one or more processes executing on a node. It is the responsibility of the component to monitor and isolate faults within its scope and to generate error reports accordingly. As a function of these error reports, cluster membership changes, health monitor reports, and administrative operations, the Availability Management Framework manages internally the readiness state of the affected components. The Availability Management Framework drives the HA state of components on behalf of component service instances to provide service availability. The function calls described in this chapter cover only the interactions between an SA-aware or a proxied component (through its proxy component) and the Availability Management Framework; they do not cover operational or administrative aspects. Consequently, the logical entities that are represented in the parameters of the calls are limited to:
- SA-aware components
- Proxy components
- Proxied components (local or external)
- Component service instances
- Protection groups
The other logical entities, such as service units, service groups (including their redundancy model), and service instances are used when configuring the relationships
among the components which must be maintained by the Availability Management Framework.
the unregistration is done for the component to perform diagnosis and repair on its own if the component is no longer able to provide service, possibly due to a fault, or it is explicitly instructed by the Availability Management Framework to be terminated or restarted. In this case, the Availability Management Framework does not disable the component.
What has been said in this section is only applicable to non-proxied components, because if a proxy component unregisters one of its proxied components, the component service instances for the proxied components are not removed but rather kept assigned. Moreover, the operational and HA states of the proxied components do not change. These decisions are motivated by the assumption that another proxy will soon take over the role of the previous proxy. A proxy component must first register itself and then register one or more proxied components on their behalf.
A process that is part of a proxy component and that registers several proxied components may issue several calls to the saAmfInitialize_3() function to provide different sets of callback functions and obtain different handles that can be used to register the various proxied components. The process of an SA-aware component COMP that registers a component (SA-aware or proxied) is called the registered process for this component, and any other process of the component COMP is termed an unregistered process for this component. Some of the callback functions are called by the Availability Management Framework only in the context of the registered process. Additionally, there are other API functions that may be called only by a registered process. If an API function may be called only by a registered process, or if a callback function may be invoked only for the registered process, the descriptions of the APIs of the Availability Management Framework state this fact explicitly. Additionally, Appendix B provides a table showing which API functions may be invoked by unregistered processes and which API callback functions may be invoked for such processes. When the Availability Management Framework issues a request to a particular component, it triggers the invocation of a callback function. Some of the callback calls require a response from the component. In these cases, the component invokes the saAmfResponse() function (described in Section 7.13.1 on page 285) when it has successfully completed the action or has failed to perform the action. More precisely, the following principles are applied in the Availability Management Framework/component interactions:
- The process is not required to complete the action requested by the Availability Management Framework within the invocation of the callback function. It may return from the callback function and complete the action later.
- The process is expected to notify the completion of the action (or any error that prevented it from performing the action) by invoking the saAmfResponse() function.
- The saAmfResponse() function must identify the callback action with which it is associated by providing the value of the invocation parameter that the Availability Management Framework supplied in the callback.
- Any function of the Availability Management Framework API, including saAmfResponse(), can be invoked from callback functions.
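The pairing of a callback with a later response through the invocation value can be illustrated with a small simulation. Everything here is a stand-in: FakeFramework and Process are hypothetical classes, not AMF types, and the response() method only plays the role that saAmfResponse() has in the text above.

```python
from typing import Callable, Dict, List, Tuple


class FakeFramework:
    """Stand-in for the framework side; tracks callbacks awaiting a response."""

    def __init__(self) -> None:
        self._next_invocation = 1
        self.pending: Dict[int, str] = {}
        self.completed: List[Tuple[str, bool]] = []

    def invoke(self, action: str, callback: Callable[[int], None]) -> int:
        """Issue a request; the callback may return before the action is done."""
        invocation = self._next_invocation
        self._next_invocation += 1
        self.pending[invocation] = action
        callback(invocation)
        return invocation

    def response(self, invocation: int, ok: bool) -> None:
        """Plays the role of saAmfResponse(): match the reply by invocation."""
        action = self.pending.pop(invocation)   # KeyError for an unknown value
        self.completed.append((action, ok))


class Process:
    """A process that defers the requested work and responds afterwards."""

    def __init__(self, framework: FakeFramework) -> None:
        self.framework = framework
        self.deferred: List[int] = []

    def on_request(self, invocation: int) -> None:
        self.deferred.append(invocation)        # return from the callback at once

    def finish_work(self) -> None:
        for invocation in self.deferred:
            self.framework.response(invocation, True)
        self.deferred.clear()
```

After invoke() returns, the request is still listed as pending; it is cleared only when finish_work() responds with the same invocation value.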
A component (or more specifically each of its processes) is allowed to dynamically start and stop a specific healthcheck. Each healthcheck has an identification (healthcheck key) that is associated with a set of configuration attributes. Healthchecks can be invoked by the Availability Management Framework or by the component. The issue of potential transient overload caused by healthcheck invocations is not considered in this proposal. Overload is a global issue and should be handled consistently at a global level; it will be considered in a future version of the AIS specification.
7.1.2.2 Variants of Healthchecks
There are two variants of healthcheck, depending on the invoker of the healthcheck:
- Framework-invoked healthcheck: for this variant of healthcheck, the Availability Management Framework invokes the saAmfHealthcheckCallback() callback periodically according to the healthcheck configuration attributes, described in Section 7.1.2.4. The Availability Management Framework expects the component to reply to an invoked healthcheck by calling saAmfResponse().
- Component-invoked healthcheck: this variant of healthcheck is invoked by the component itself (according to its configured parameters, see Section 7.1.2.4), and the component reports the result of the healthcheck to the Availability Management Framework by calling saAmfHealthcheckConfirm().
Healthchecks are started when a process invokes the saAmfHealthcheckStart() function; they are stopped when a process invokes the saAmfHealthcheckStop() function. There is no default healthcheck that is invoked by the Availability Management Framework without an explicit start request by the component. Multiple processes of a component can start healthchecks, and each one can decide which healthcheck should be performed. Moreover, when a process starts a healthcheck, it can also specify the recommended recovery action to be applied by the Availability Management Framework when it reports an error on the component if its
healthcheck reports to the Availability Management Framework are not made in a timely manner. The start of healthchecks is independent from the component registration, that is, it is possible to start healthchecks before the component is registered or after the component is unregistered.
7.1.2.4 Healthcheck Configuration Issues
The Availability Management Framework supports the notion of a healthcheck type. A number of healthcheck types can be defined for a component type, and each healthcheck type is identified by a healthcheck key. A healthcheck type configuration must be provided for each healthcheck key that the component uses to start a healthcheck. All components of the same type share the healthcheck attribute values defined in the healthcheck type configuration. The healthcheck attributes provided in the healthcheck type for a certain healthcheck key can be overridden in the healthcheck configuration for the component for the same healthcheck key. The healthcheck configuration for the component can only specify healthcheck keys for which there is a healthcheck type configuration for its component type. Further details on the healthcheck configuration are presented in Chapter 8 and [5]. The Availability Management Framework retrieves the healthcheck configuration based on the healthcheck key referred to by the healthcheckKey parameter of the healthcheck API calls. The scope of the healthcheck key is limited to the component and is not cluster-wide. It is assumed that the component configuration retrieved by the Availability Management Framework has passed a series of sanity checks and validations before the cluster startup. This rules out errors such as configuring a healthcheck that is too frequent. Thus, based on these validations, the Availability Management Framework may reject a healthcheck start request only if some of the given parameters, such as component name or healthcheckKey, are invalid. A healthcheck configuration comprises two attributes:
period: this attribute indicates the period at which the corresponding healthcheck should be initiated. This attribute is defined for both framework-invoked and component-invoked healthchecks; however, it has different meanings for these two variants of healthchecks, as will be explained next (the actual name of this configuration attribute is saAmfHealthcheckPeriod, and it is defined in the SaAmfHealthcheck configuration object class, shown in Section 8.14).
maximum-duration: this attribute indicates the time-limit after which the Availability Management Framework will report an error on the component if no response for a healthcheck is received by the Availability Management Framework in this time frame. This attribute applies only to the framework-invoked healthcheck variant (the actual name of this configuration attribute is saAmfHealthcheckMaxDuration, and it is defined in the SaAmfHealthcheck configuration object class, shown in Section 8.14).
The component developer is aware of the healthcheck variant supported by a component, and the component developer specifies this healthcheck variant in the corresponding healthcheck API calls.
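The lookup of a healthcheck configuration by key, with component-level values overriding those of the healthcheck type, can be sketched as follows. The function name, the dictionary layout, and the attribute keys "period" and "max_duration" (standing for saAmfHealthcheckPeriod and saAmfHealthcheckMaxDuration) are hypothetical choices made for this illustration.

```python
from typing import Dict

Attributes = Dict[str, float]  # e.g. {"period": 10.0, "max_duration": 30.0}


def resolve_healthcheck(
    key: str,
    type_configs: Dict[str, Attributes],
    comp_overrides: Dict[str, Attributes],
) -> Attributes:
    """Return the effective attributes for one healthcheck key of a component."""
    if key not in type_configs:
        # A healthcheck type configuration must exist for every key in use.
        raise KeyError(f"no healthcheck type configured for key {key!r}")
    resolved = dict(type_configs[key])            # values shared by the type
    resolved.update(comp_overrides.get(key, {}))  # per-component overrides
    return resolved
```

A component that overrides only the period for a key keeps the maximum-duration defined by its component type.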
7.1.2.4.1 Role of Period and Maximum-Duration in Framework-Invoked Healthchecks
period: for a given framework-invoked healthcheck started by a process and for every "period", the Availability Management Framework will invoke the corresponding healthcheck callback; however, if the process does not respond to a given healthcheck callback before the start of the next healthcheck period, the Availability Management Framework will not trigger the next invocation of the healthcheck callback until the response to the previous invocation is received. In other words, at any given time and for each healthcheck, there is at most one callback invocation for which the response is pending. Of course, as a process may have started several healthchecks in parallel, the Availability Management Framework will invoke callbacks for these different healthchecks independently. What happens when a process does not respond to a framework-invoked healthcheck in a timely manner is described in the next item.
maximum-duration: to correctly specify the value for the period of a healthcheck, the deployer has to make sure that the period is set larger than the average duration of the interval between the Availability Management Framework triggering a callback invocation and receiving the corresponding response. This setting guarantees that in normal conditions, with expected load, the response of a healthy process for the invoked healthcheck callbacks will arrive at the Availability Management Framework in a timely manner (and before the Availability Management Framework attempts to issue another callback for the same healthcheck). However, it may not be very easy for the deployer to estimate the expected normal condition and load on the cluster. Therefore, the Availability Management Framework should wait somewhat longer than this average time before concluding that the process is unable to respond to the healthcheck. The maximum-duration attribute is defined for such a purpose: the Availability Management Framework will wait for maximum-duration to receive a response from the process (component) for a given callback invocation. The deployer should allow enough slack in the maximum-duration attribute, so that the response of the healthy process (component) will definitely arrive at the
Availability Management Framework before maximum-duration expires, even in presence of situations such as high-load on the network and/or high-load on the processing resources of nodes in the cluster. In short, one has to consider the following trade-off in defining values for period and maximum-duration for the framework-invoked healthchecks:
• period: this value should be set as short as possible, but it should be larger than the average time needed for the corresponding reply to arrive at the Availability Management Framework. If period is set too short, the Availability Management Framework may consider the component of a healthy process that runs in a highly loaded environment as faulty. On the other hand, if period is set too large, the process may be checked too sparsely, and thus the latency in detecting process (component) failures (mostly latent fault detection) becomes larger.
• maximum-duration: as discussed earlier, maximum-duration should be larger than the average time needed for a process to respond to a callback invocation. The maximum-duration attribute should also include enough slack time, so that, even in the presence of anomalies other than component failures, the healthcheck response arrives at the Availability Management Framework before maximum-duration expires. If maximum-duration is set too short, it is possible that a healthy process (component) has not been given enough time to respond to the healthcheck. In this case, the Availability Management Framework will falsely assume that the component is faulty. On the other hand, if maximum-duration is set too large, the latency for the detection of a faulty component being healthchecked may be increased.
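The trade-off above can be turned into a simple sanity check on a candidate configuration. The following is a minimal sketch, not part of the specification: the structure, the function name, the millisecond unit, and the explicit slack parameter are all assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two healthcheck attributes, in milliseconds. */
typedef struct {
    uint64_t period_ms;
    uint64_t max_duration_ms;
} demo_healthcheck_cfg;

/* Returns true when period exceeds the measured average response time and
 * maximum-duration exceeds that average by more than slack_ms. */
static bool demo_healthcheck_cfg_ok(const demo_healthcheck_cfg *cfg,
                                    uint64_t avg_response_ms,
                                    uint64_t slack_ms)
{
    if (cfg->period_ms <= avg_response_ms)
        return false;
    return cfg->max_duration_ms > avg_response_ms + slack_ms;
}
```

A deployer could run such a check with a measured average response time and a slack value derived from the worst load expected on the cluster.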
As already explained, component-invoked healthchecks do not have the maximum-duration attribute (if it is provided, it will be ignored by the Availability Management Framework). When a process informs the Availability Management Framework of its intention to start a component-invoked healthcheck (by calling saAmfHealthcheckStart()), the Availability Management Framework expects the process to invoke saAmfHealthcheckConfirm() periodically, no later than at the end of every period. More specifically, the Availability Management Framework reports an error on the component if it does not receive a healthcheck confirmation from the component before the end of every period. The recommended recovery for this error is specified by the process when it invokes saAmfHealthcheckStart(). The deployer should add enough slack time to period, so that the healthcheck confirmation sent by a healthy process can reach the Availability Management Framework on time.
Modifications of the period and maximum-duration healthcheck attributes for framework-invoked healthchecks in the Availability Management Framework configuration take place immediately. In contrast, modifications of the period healthcheck attribute for component-invoked healthchecks in the Availability Management Framework configuration will be effective the next time the healthcheck is started by invoking the saAmfHealthcheckStart() function.
7.1.3 Component Service Instance Management
The basic concepts have been explained in Chapter 3. Administrative, operational, and presence states are managed by the Availability Management Framework, but they are not exposed to the components. The readiness state of a component is a private state managed by the Availability Management Framework. It is neither exposed to components nor to system management, and it is solely used to determine the eligibility of components to receive component service instance assignments. The APIs exposed by the availability management are limited to the management of the HA state for components. The Availability Management Framework uses callbacks to request components to:
• add or remove component service instances from components that are in the in-service state and
• change the HA state of a component on behalf of a component service instance (active, standby, quiescing, quiesced).
The Availability Management Framework enforces that there are no overlapping requests to set the state of a component at any specific time. Two state change requests are said to overlap if the Availability Management Framework requests a component to enter the new state, before the component acknowledges the first request, which is done when the component invokes the saAmfResponse() API function, as described in Section 7.13.1. The rationale for avoiding overlapping requests is that it is simpler to program a component when overlapping requests are prohibited than when the component must check and report such overlapping. Component service instances can be assigned to a component only if the component is in the in-service state. For details, refer to the readiness state in Section 3.3.2.3 and to the HA state in Section 3.3.2.4. The component service instance management comprises data structures and APIs. The API functions are described in Section 7.9 on page 255.
7.1.4 Component Life Cycle Management
The API functions of the component life cycle management are described in Section 7.10 on page 264. They comprise the callback function to request a component to terminate and the callback functions that proxy and container components export to enable the Availability Management Framework to manage proxied and contained components.
7.1.5 Protection Group Management
The basic concepts have been explained in Chapter 3. For the API functions, refer to Section 7.11 on page 272.
7.1.6 Error Reporting
For the API interfaces, refer to Section 7.12 on page 281.
7.1.7 Component Response to Framework Requests
For the API interfaces, refer to Section 7.13 on page 285.
7.1.8 API Usage Illustrations
This section illustrates the usage of the Availability Management Framework API by different categories of components. FIGURE 23 next shows an example of an SA-aware component consisting of a single process. The numbers in circles indicate the sequence of events in time.
FIGURE 23 SA-Aware Component Consisting of a Single Process
[Figure not reproduced. Recoverable labels: the application code of the registered process uses the AMF library (saAmfInitialize_3(), saAmfComponentNameGet(), saAmfComponentRegister(), further API calls, and state change callbacks); the process saves the local component name to pass it to API calls requiring it; the AMF manages the life cycle of the component.]
FIGURE 24 next shows an example of an SA-aware component consisting of multiple processes. The numbers in circles indicate the sequence of events in time.
FIGURE 24 SA-Aware Component Consisting of Multiple Processes
[Figure not reproduced. Recoverable labels: the SA-aware component contains application code in several processes, including an unregistered process C; each process uses the AMF library (saAmfInitialize_3(), saAmfComponentNameGet(), further API calls) and saves the local component name to pass it to API calls requiring it; the AMF manages the life cycle of the component.]
FIGURE 25 next shows an example of a single-process proxy component that registers itself and two proxied components with the Availability Management Framework. The numbers in circles indicate the sequence of events in time.
FIGURE 25 A Single-Process Proxy Component and Two Proxied Components
[Figure not reproduced. Recoverable labels: the proxy component registers all three components and saves the proxy component name to pass it to API calls requiring it.]
• An Availability Management Framework API function is called by a process at nearly the same time as the node exits the cluster, and the Availability Management Framework area server on the node has not yet terminated or cleaned up the component process.
• The Availability Management Framework encounters an error when attempting to terminate or clean up all of the processes associated with components, so there may still be component processes running.
• The cleanup operation of a component (see Table 34 in Appendix A) does not properly clean up all of the processes associated with the component.
• A process using the Availability Management Framework API, but which is not registered (or not yet registered) as a component, is running on the AMF node, and since the Availability Management Framework has no knowledge of the process, it will not attempt to clean it up.
In the few special situations described above, the Availability Management Framework behaves as follows towards processes residing on that node and using or attempting to use the service:
• Calls to saAmfInitialize_3() will fail with SA_AIS_ERR_UNAVAILABLE.
• All Availability Management Framework APIs that are invoked by the process and that operate on handles already acquired by the process will fail with SA_AIS_ERR_UNAVAILABLE with the exception of saAmfFinalize(), which is used to free the library handles and all resources associated with these handles.
• Any outstanding SaAmfProtectionGroupTrackCallbackT callback will return SA_AIS_ERR_UNAVAILABLE in the error parameter. No other callbacks will be called.
If the node rejoins the cluster membership, the Availability Management Framework instantiates service units on this node based on the configuration of the service groups that contain service units hosted by that node. Processes belonging to components of these service units can access the Availability Management Framework API functions without restrictions. However, the left-over processes of the few special situations above will still be denied service as explained. When the node leaves the membership, the Availability Management Framework executing on the remaining nodes of the cluster behaves as if all processes belonging to components residing on the leaving node had been terminated. As AMF engages procedures to terminate all components on the leaving node, AMF sets the presence state of all components and all service units on this node to uninstantiated. The readiness state of all service units and all components on this node is set to out-of-service.
7.2.2 Guidelines for Availability Management Framework Implementers
The implementation of the Availability Management Framework must leverage the SA Forum Cluster Membership Service (see [3]) to determine the membership status of a node. If the Cluster Membership Service considers a node as a member of the cluster but the Availability Management Framework experiences difficulty in providing service to its clients because of transport, communication, or other issues, it must respond to the API calls invoked by a process with SA_AIS_ERR_TRY_AGAIN.
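From the point of view of a calling process, SA_AIS_ERR_TRY_AGAIN signals a transient condition, so a bounded retry is a natural reaction. The sketch below is illustrative only: the error enum is a local stand-in for SaAisErrorT with arbitrary values, and demo_retry() and demo_flaky_call() are hypothetical helpers, not AIS functions.

```c
/* Local stand-ins; real code would use SaAisErrorT from the AIS headers. */
typedef enum { DEMO_OK, DEMO_ERR_TRY_AGAIN, DEMO_ERR_OTHER } demo_err;
typedef demo_err (*demo_call_fn)(void *ctx);

/* Calls fn until it returns something other than DEMO_ERR_TRY_AGAIN or
 * until max_attempts calls have been made; returns the last result. */
static demo_err demo_retry(demo_call_fn fn, void *ctx, unsigned max_attempts)
{
    demo_err rc = DEMO_ERR_TRY_AGAIN;
    for (unsigned i = 0; i < max_attempts && rc == DEMO_ERR_TRY_AGAIN; ++i)
        rc = fn(ctx);
    return rc;
}

/* Hypothetical call that reports TRY_AGAIN until the counter in ctx is zero. */
static demo_err demo_flaky_call(void *ctx)
{
    int *remaining_failures = ctx;
    if (*remaining_failures > 0) {
        --*remaining_failures;
        return DEMO_ERR_TRY_AGAIN;
    }
    return DEMO_OK;
}
```

A production wrapper would normally pause between attempts; the delay is omitted here to keep the sketch short.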
and
#include <saNtf.h>
The latter statement is needed for the functions saAmfComponentErrorReport() and saAmfComponentErrorClear(). To use the Availability Management Framework API, an application must be bound with the following library:
libSaAmf.so
A process acquires this handle to the Availability Management Framework by invoking the saAmfInitialize_3() function and uses it in subsequent invocations of the functions of the Availability Management Framework.
7.4.2 Component Process Monitoring
This section describes the data types that the Availability Management Framework requires for the passive monitoring of processes of a component.
7.4.2.1 SaAmfPmErrorsT Type
#define SA_AMF_PM_ZERO_EXIT 0x1
#define SA_AMF_PM_NON_ZERO_EXIT 0x2
#define SA_AMF_PM_ABNORMAL_END 0x4
typedef SaUint32T SaAmfPmErrorsT;
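Because the three values are distinct single bits, several of them can be combined into one SaAmfPmErrorsT value with a bitwise OR. The following sketch uses local copies of the constants (prefixed DEMO_) and a hypothetical helper so that it compiles without the AIS headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the values listed above; real code includes saAmf.h. */
#define DEMO_PM_ZERO_EXIT     0x1u
#define DEMO_PM_NON_ZERO_EXIT 0x2u
#define DEMO_PM_ABNORMAL_END  0x4u
typedef uint32_t demo_pm_errors;

/* Returns true if the given error kind is selected in the mask. */
static bool demo_pm_monitors(demo_pm_errors mask, demo_pm_errors flag)
{
    return (mask & flag) != 0;
}
```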
For the explanation of the enum values in SaAmfPmStopQualifierT, refer to Section 7.7.2 on page 244.
7.4.3 Component Healthcheck Monitoring
7.4.3.1 SaAmfHealthcheckInvocationT
typedef enum {
    SA_AMF_HEALTHCHECK_AMF_INVOKED = 1,
    SA_AMF_HEALTHCHECK_COMPONENT_INVOKED = 2
} SaAmfHealthcheckInvocationT;
SA_AMF_HEALTHCHECK_AMF_INVOKED - The healthchecks are invoked by the Availability Management Framework.
SA_AMF_HEALTHCHECK_COMPONENT_INVOKED - The healthchecks are invoked by the component.
7.4.3.2 SaAmfHealthcheckKeyT
#define SA_AMF_HEALTHCHECK_KEY_MAX 32
typedef struct {
    SaUint8T key[SA_AMF_HEALTHCHECK_KEY_MAX];
    SaUint16T keyLen;
} SaAmfHealthcheckKeyT;
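A key of this shape is a length-prefixed byte array rather than a C string. The sketch below shows one way to fill such a structure from a string; the structure is a local mirror of the definition above, demo_make_key() is a hypothetical helper, and it assumes that keyLen counts the key bytes without a terminating NUL.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the structure above; real code includes saAmf.h. */
#define DEMO_HEALTHCHECK_KEY_MAX 32
typedef struct {
    uint8_t  key[DEMO_HEALTHCHECK_KEY_MAX];
    uint16_t keyLen;
} demo_healthcheck_key;

/* Copies name into out (assumption: no terminating NUL is stored).
 * Fails without touching out if name does not fit into the key array. */
static bool demo_make_key(const char *name, demo_healthcheck_key *out)
{
    size_t len = strlen(name);
    if (len > DEMO_HEALTHCHECK_KEY_MAX)
        return false;
    memset(out, 0, sizeof *out);
    memcpy(out->key, name, len);
    out->keyLen = (uint16_t)len;
    return true;
}
```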
The presence state is uninstantiated, instantiating, instantiated, terminating, restarting, instantiation-failed, or termination-failed.
The proxy status of a component is proxied (SA_AMF_PROXY_STATUS_PROXIED) or unproxied (SA_AMF_PROXY_STATUS_UNPROXIED). If the proxy status is SA_AMF_PROXY_STATUS_PROXIED, a proxy component is currently proxying the component. If the proxy status is SA_AMF_PROXY_STATUS_UNPROXIED, no proxy
component is currently assigned to proxy the component, possibly because the previous proxy component failed, and the Availability Management Framework could not engage another component to assume the mediation responsibility for the component.
7.4.4.8 All Defined States
typedef enum {
    SA_AMF_READINESS_STATE = 1,
    SA_AMF_HA_STATE = 2,
    SA_AMF_PRESENCE_STATE = 3,
    SA_AMF_OP_STATE = 4,
    SA_AMF_ADMIN_STATE = 5,
    SA_AMF_ASSIGNMENT_STATE = 6,
    SA_AMF_PROXY_STATUS = 7
} SaAmfStateT;
This enum defines all states (readiness, HA state, presence, operational, administrative, and assignment) and the additional proxy status.
7.4.5 Component Service Types
7.4.5.1 SaAmfCSIFlagsT
#define SA_AMF_CSI_ADD_ONE 0X1
#define SA_AMF_CSI_TARGET_ONE 0X2
#define SA_AMF_CSI_TARGET_ALL 0X4
SA_AMF_CSI_ADD_ONE - A new component service instance is assigned to the component. The component is requested to assume a particular HA state for the new component service instance.
SA_AMF_CSI_TARGET_ONE - The request made to the component targets only one of its component service instances.
SA_AMF_CSI_TARGET_ALL - The request made to the component targets all of its component service instances. This flag is used for cases in which all component service instances are managed as a bundle: the component is assigned the same HA state for all component service instances at the same time, or all component service instances are removed at the same time. For assignments, this flag is set for components providing the 'x_active_or_y_standby' capability model. The Availability Management Framework can use this flag in other cases for removing all component service instances at once, if it makes sense.
These values are mutually exclusive. Only one value can be set in SaAmfCSIFlagsT.
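The mutual-exclusion rule means that a valid flags value has exactly one of the three bits set. A minimal sketch of such a check follows; the constants are local copies, the 32-bit flag type is an assumption (the typedef is not shown in this section), and demo_csi_flags_valid() is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the flag values above; real code includes saAmf.h. */
#define DEMO_CSI_ADD_ONE    0x1u
#define DEMO_CSI_TARGET_ONE 0x2u
#define DEMO_CSI_TARGET_ALL 0x4u
typedef uint32_t demo_csi_flags; /* assumed width */

/* Returns true if exactly one of the three known flags is set. */
static bool demo_csi_flags_valid(demo_csi_flags f)
{
    const demo_csi_flags known =
        DEMO_CSI_ADD_ONE | DEMO_CSI_TARGET_ONE | DEMO_CSI_TARGET_ALL;
    return f != 0 && (f & ~known) == 0 && (f & (f - 1)) == 0;
}
```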
7.4.5.2 SaAmfCSITransitionDescriptorT
typedef enum {
    SA_AMF_CSI_NEW_ASSIGN = 1,
    SA_AMF_CSI_QUIESCED = 2,
    SA_AMF_CSI_NOT_QUIESCED = 3,
    SA_AMF_CSI_STILL_ACTIVE = 4
} SaAmfCSITransitionDescriptorT;
This enumeration type provides information on the component that was or still is active for the specified component service instance. The values of the SaAmfCSITransitionDescriptorT enumeration type have the following interpretation:
• SA_AMF_CSI_NEW_ASSIGN - This assignment is not the result of a switch-over or fail-over of the specified component service instance from another component to this component. No component was previously active for this component service instance.
• SA_AMF_CSI_QUIESCED - This assignment is the result of a switch-over of the specified component service instance from another component to this component. The component that was previously active for this component service instance has been quiesced.
• SA_AMF_CSI_NOT_QUIESCED - This assignment is the result of a fail-over of the specified component service instance from another component to this component. The component that was previously active for this component service instance has not been quiesced.
• SA_AMF_CSI_STILL_ACTIVE - This assignment is not the result of a switch-over or fail-over of the specified component service instance from another component to this component. At least one other component is still active for this component service instance. This flag is used, for example, in the N-way active redundancy model when a new component is assigned active for a component service instance while other components are already assigned active for that component service instance.
transitionDescriptor - This descriptor provides information on the component that was or is still active for the one or all of the specified component service instances (see previous section).
activeCompName - The name of the component that was previously active for the specified component service instance.
When a component is requested to assume the active HA state for one or for all component service instances assigned to the component, SaAmfCSIActiveDescriptorT holds the following information:
• The Availability Management Framework uses the transitionDescriptor that is appropriate for the redundancy model of the service group to which this component belongs.
• If transitionDescriptor is set to SA_AMF_CSI_NOT_QUIESCED or SA_AMF_CSI_QUIESCED, activeCompName holds the name of the component that was previously assigned the active state for the component service instances and no longer has that assignment.
• If transitionDescriptor is set to SA_AMF_CSI_NEW_ASSIGN, activeCompName is not used.
• If transitionDescriptor is set to SA_AMF_CSI_STILL_ACTIVE, activeCompName holds the name of one of the components that are still assigned the active HA state for all targeted component service instances. Any of these components can be arbitrarily selected.
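A component that inspects the active descriptor therefore has to decide first whether activeCompName carries any meaning. The rules above can be sketched as a small mapping; the enum values mirror SaAmfCSITransitionDescriptorT, while demo_name_meaning and demo_active_name_meaning() are hypothetical names introduced only for this illustration.

```c
/* Local mirror of SaAmfCSITransitionDescriptorT. */
typedef enum {
    DEMO_CSI_NEW_ASSIGN   = 1,
    DEMO_CSI_QUIESCED     = 2,
    DEMO_CSI_NOT_QUIESCED = 3,
    DEMO_CSI_STILL_ACTIVE = 4
} demo_transition;

/* Hypothetical classification of what activeCompName refers to. */
typedef enum {
    DEMO_NAME_UNUSED,
    DEMO_NAME_PREVIOUS_ACTIVE,
    DEMO_NAME_ONE_OF_CURRENT_ACTIVE
} demo_name_meaning;

/* Maps a transition descriptor to the meaning of activeCompName. */
static demo_name_meaning demo_active_name_meaning(demo_transition t)
{
    switch (t) {
    case DEMO_CSI_QUIESCED:
    case DEMO_CSI_NOT_QUIESCED:
        return DEMO_NAME_PREVIOUS_ACTIVE;
    case DEMO_CSI_STILL_ACTIVE:
        return DEMO_NAME_ONE_OF_CURRENT_ACTIVE;
    case DEMO_CSI_NEW_ASSIGN:
    default:
        return DEMO_NAME_UNUSED;
    }
}
```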
activeCompName - Name of the component that is currently active for the one or all of the specified component service instances. This name is empty if no active component exists.
standbyRank - Rank of the component for assignments of the standby HA state to the component for the one or all of the specified component service instances.
When a component is requested to assume the standby HA state for one or for all component service instances assigned to the component, SaAmfCSIStandbyDescriptorT holds in activeCompName the name of the component that is currently assigned the active state for the one or all these component service instances. In redundancy models in which several components may assume the standby HA state for the same component service instance at the same time, standbyRank indicates to the component the rank it must assume. When the Availability Management Framework selects a component to assume the active HA state for a component service instance, the component assuming the standby state for that component service instance with the lowest standbyRank value is chosen.
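The selection rule in the last sentence, lowest standbyRank wins, can be sketched as a linear scan. The structure and function below are hypothetical and only model the rule; how ties are resolved is not stated here, so picking the first entry on a tie is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one standby assignment for a single CSI. */
typedef struct {
    const char *comp_name;
    uint32_t    standby_rank;
} demo_standby;

/* Returns the index of the entry with the lowest standby_rank, or -1 if
 * the array is empty. On equal ranks the first entry wins (assumption). */
static int demo_pick_next_active(const demo_standby *s, size_t n)
{
    if (n == 0)
        return -1;
    size_t best = 0;
    for (size_t i = 1; i < n; ++i)
        if (s[i].standby_rank < s[best].standby_rank)
            best = i;
    return (int)best;
}
```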
typedef union {
    SaAmfCSIActiveDescriptorT activeDescriptor;
    SaAmfCSIStandbyDescriptorT standbyDescriptor;
} SaAmfCSIStateDescriptorT;
The SaAmfCSIStateDescriptorT holds additional information about the assignment of a component service instance to a component when the component is requested to assume the active or standby HA state for this component service instance.
7.4.5.4 SaAmfCSIAttributeListT
typedef struct {
    SaUint8T *attrName;
    SaUint8T *attrValue;
} SaAmfCSIAttributeT;
SaAmfCSIAttributeT represents a single component service instance attribute by its name and value strings. Each string consists of UTF-8 encoded characters and is terminated by the NULL character.
SaAmfCSIAttributeListT represents the list of all attributes for a single component service instance. The attr pointer points to an array of number elements of SaAmfCSIAttributeT attribute descriptors.
7.4.5.5 SaAmfCSIDescriptorT
typedef struct {
    SaAmfCSIFlagsT csiFlags;
    SaNameT csiName;
    SaAmfCSIStateDescriptorT csiStateDescriptor;
    SaAmfCSIAttributeListT csiAttr;
} SaAmfCSIDescriptorT;
SaAmfCSIDescriptorT provides information about the component service instances targeted by the saAmfCSISetCallback() callback API. When SA_AMF_CSI_TARGET_ALL is set in csiFlags, csiName is not used; otherwise, csiName contains the name of the component service instance targeted by the callback. When SA_AMF_CSI_ADD_ONE is set in csiFlags, csiAttr refers to the attributes of the newly assigned component service instance; otherwise, no attributes are provided and csiAttr is not used. When the component is requested to assume the active or standby state for the targeted service instances, csiStateDescriptor holds additional information relative to that state transition; otherwise, csiStateDescriptor is not used.
compName - The name of the component that is a member of the protection group.
haState - The haState of the member component for the component service instance supported by the member component.
rank - The rank of the member component in the protection group if haState is standby.
The values of the SaAmfProtectionGroupChangesT enumeration type have the following interpretation:
• SA_AMF_PROTECTION_GROUP_NO_CHANGE - This value is used when the trackFlags parameter of the saAmfProtectionGroupTrack() function (as defined in Section 7.11.1) is either SA_TRACK_CURRENT or SA_TRACK_CHANGES, and all the following conditions hold:
    • The member component was already a member of the protection group in the previous saAmfProtectionGroupTrackCallback() callback call.
    • The component service instance has not been removed from the member component.
    • Neither haState nor rank of the SaAmfProtectionGroupMemberT structure of this member component has changed.
• SA_AMF_PROTECTION_GROUP_ADDED - The associated component service instance has been added to the member component.
• SA_AMF_PROTECTION_GROUP_REMOVED - The associated component service instance has been removed from the member component.
• SA_AMF_PROTECTION_GROUP_STATE_CHANGE - Any of the elements haState or rank of the SaAmfProtectionGroupMemberT structure for the member component have changed.
member - The information associated with the component member of the protection group.
change - The kind of change in the associated component member.
7.4.6.4 SaAmfProtectionGroupNotificationBufferT
typedef struct {
    SaUint32T numberOfItems;
    SaAmfProtectionGroupNotificationT *notification;
} SaAmfProtectionGroupNotificationBufferT;
numberOfItems - Number of elements of type SaAmfProtectionGroupNotificationT in the notification array.
notification - Pointer to the notification array.
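A consumer of such a buffer typically walks the notification array up to numberOfItems and reacts to the change field of each entry. The sketch below uses simplified local stand-ins (a name string instead of the full member structure, and an enum with arbitrary values), so it only illustrates the traversal pattern; demo_count_changes() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the protection group types; values are arbitrary. */
typedef enum {
    DEMO_PG_NO_CHANGE,
    DEMO_PG_ADDED,
    DEMO_PG_REMOVED,
    DEMO_PG_STATE_CHANGE
} demo_pg_change;

typedef struct {
    const char    *comp_name;
    demo_pg_change change;
} demo_pg_notification;

typedef struct {
    uint32_t              numberOfItems;
    demo_pg_notification *notification;
} demo_pg_buffer;

/* Counts the entries in the buffer whose change field equals kind. */
static uint32_t demo_count_changes(const demo_pg_buffer *buf, demo_pg_change kind)
{
    uint32_t count = 0;
    if (buf == NULL || buf->notification == NULL)
        return 0;
    for (uint32_t i = 0; i < buf->numberOfItems; ++i)
        if (buf->notification[i].change == kind)
            ++count;
    return count;
}
```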
7.4.7 SaAmfRecommendedRecoveryT
typedef enum {
    SA_AMF_NO_RECOMMENDATION = 1,
    SA_AMF_COMPONENT_RESTART = 2,
    SA_AMF_COMPONENT_FAILOVER = 3,
    SA_AMF_NODE_SWITCHOVER = 4,
    SA_AMF_NODE_FAILOVER = 5,
    SA_AMF_NODE_FAILFAST = 6,
    SA_AMF_CLUSTER_RESET = 7,
    SA_AMF_APPLICATION_RESTART = 8,
    SA_AMF_CONTAINER_RESTART = 9
} SaAmfRecommendedRecoveryT;
A short explanation of the values of this enumeration is given next. Additional details are provided in Section 3.12.1.3 and subsections:
• SA_AMF_NO_RECOMMENDATION - This report makes no recommendation for recovery. However, the Availability Management Framework should engage the configured per-component recovery policy (refer to Section 3.12.1.3) in such a scenario.
• SA_AMF_COMPONENT_RESTART - The erroneous component should be terminated and reinstantiated.
• SA_AMF_COMPONENT_FAILOVER - The error is related to the execution environment of the component on the current node. Depending on the redundancy model used, either the component or the service unit containing the component should fail over to another node.
• SA_AMF_NODE_SWITCHOVER - The error has been identified as being at the node level, and no service instance should be assigned to service units on that node. Service instances containing component service instances assigned to the failed component are failed over while other service instances are switched over to other nodes (component service instances are not abruptly removed; instead, they are brought to the quiesced state before being removed).
• SA_AMF_NODE_FAILOVER - The error has been identified as being at the node level, and no service instance should be assigned to service units on that node. All service instances assigned to service units contained in the node are failed over to other nodes (by an abrupt termination of all node-local components).
• SA_AMF_NODE_FAILFAST - The error has been identified as being at the node level, and components should not be in service on the node. The node should be rebooted using a low-level interface.
• SA_AMF_APPLICATION_RESTART - The application should be completely terminated and then started again by first terminating all of its service units and then starting them again, ensuring that during the termination phase of the restart procedure it is not required to reassign service instances (refer additionally to Section 9.4.7 on page 332).
• SA_AMF_CLUSTER_RESET - The cluster should be reset. In order to execute this function, the Availability Management Framework reboots all nodes that are part of the cluster by using a low-level interface without trying to terminate the components individually. To be effective, this operation must be performed so that all nodes are first halted before any of the nodes boots again.
• SA_AMF_CONTAINER_RESTART - Terminate all contained components and the container component abruptly and then instantiate them again.
7.4.8 saAmfCompCategoryT
#define SA_AMF_COMP_SA_AWARE 0x0001
#define SA_AMF_COMP_PROXY 0x0002
#define SA_AMF_COMP_PROXIED 0x0004
#define SA_AMF_COMP_LOCAL 0x0008
#define SA_AMF_COMP_CONTAINER 0x0010
#define SA_AMF_COMP_CONTAINED 0x0020
Based on Table 3 on page 39, all possible ORing of values are shown in the following table:
Table 19 Possible Combinations of Values in saAmfCompCategoryT Component regular SA-aware Mandatory Values SA_AMF_COMP_SA_AWARE. If the component is proxy, SA_AMF_COMP_PROXY must be additionally ORed. SA_AMF_COMP_CONTAINER. If the component is proxy, SA_AMF_COMP_PROXY must be additionally ORed. SA_AMF_COMP_CONTAINED SA_AMF_COMP_LOCAL and SA_AMF_COMP_PROXIED SA_AMF_COMP_LOCAL Optional Values SA_AMF_COMP_LOCAL
container
SA_AMF_COMP_LOCAL, SA_AMF_COMP_SA_AWARE
SA_AMF_COMP_LOCAL, SA_AMF_COMP_SA_AWARE -
7.4.9 saAmfRedundancyModelT
typedef enum {
    SA_AMF_2N_REDUNDANCY_MODEL = 1,
    SA_AMF_NPM_REDUNDANCY_MODEL = 2,
    SA_AMF_N_WAY_REDUNDANCY_MODEL = 3,
    SA_AMF_N_WAY_ACTIVE_REDUNDANCY_MODEL = 4,
    SA_AMF_NO_REDUNDANCY_MODEL = 5
} saAmfRedundancyModelT;
For a description of the various redundancy models enumerated in this type, refer to Section 3.7 on page 87.
7.4.10 saAmfCompCapabilityModelT
typedef enum {
    SA_AMF_COMP_X_ACTIVE_AND_Y_STANDBY = 1,
    SA_AMF_COMP_X_ACTIVE_OR_Y_STANDBY = 2,
    SA_AMF_COMP_ONE_ACTIVE_OR_Y_STANDBY = 3,
    SA_AMF_COMP_ONE_ACTIVE_OR_ONE_STANDBY = 4,
    SA_AMF_COMP_X_ACTIVE = 5,
    SA_AMF_COMP_1_ACTIVE = 6,
    SA_AMF_COMP_NON_PRE_INSTANTIABLE = 7
} saAmfCompCapabilityModelT;
For a description of the values shown in this enum, refer to Section 3.6 on page 85.
7.4.11 Notification Related Types
typedef enum {
    SA_AMF_NODE_NAME = 1,
    SA_AMF_SI_NAME = 2,
    SA_AMF_MAINTENANCE_CAMPAIGN_DN = 3
} SaAmfAdditionalInfoIdT;
The preceding types are used in Availability Management Framework alarms and notifications (refer to Chapter 11) to convey additional information elements in the Additional Information field associated with alarms and notifications.
7.4.12 SaAmfCallbacksT_3
typedef struct {
    SaAmfHealthcheckCallbackT saAmfHealthcheckCallback;
    SaAmfComponentTerminateCallbackT saAmfComponentTerminateCallback;
    SaAmfCSISetCallbackT saAmfCSISetCallback;
    SaAmfCSIRemoveCallbackT saAmfCSIRemoveCallback;
    SaAmfProtectionGroupTrackCallbackT saAmfProtectionGroupTrackCallback;
    SaAmfProxiedComponentInstantiateCallbackT saAmfProxiedComponentInstantiateCallback;
    SaAmfProxiedComponentCleanupCallbackT saAmfProxiedComponentCleanupCallback;
    SaAmfContainedComponentInstantiateCallbackT saAmfContainedComponentInstantiateCallback;
    SaAmfContainedComponentCleanupCallbackT saAmfContainedComponentCleanupCallback;
} SaAmfCallbacksT_3;
The SaAmfCallbacksT_3 structure defines the various callback functions that the Availability Management Framework may invoke on a component.
Parameters
amfHandle - [out] A pointer to the handle which identifies this particular initialization of the Availability Management Framework, and which is to be returned by the Availability Management Framework. The SaAmfHandleT type is defined in Section 7.4.1 on page 211.
amfCallbacks - [in] If amfCallbacks is set to NULL, no callbacks are registered; if amfCallbacks is not set to NULL, it is a pointer to an SaAmfCallbacksT_3 structure which contains the callback functions of the process that the Availability Management Framework may invoke. Only non-NULL callback functions in this structure will be registered. The SaAmfCallbacksT_3 is defined in Section 7.4.12 on page 226.
version - [in/out] As an input parameter, version is a pointer to a structure containing the required Availability Management Framework version. In this case, minorVersion is ignored and should be set to 0x00. As an output parameter, version is a pointer to a structure containing the version actually supported by the Availability Management Framework. The SaVersionT type is defined in [1].
Description
This function initializes the Availability Management Framework for the invoking process and registers the various callback functions. This function must be invoked prior to the invocation of any other Availability Management Framework API function. The handle pointed to by amfHandle is returned by the Availability Management Framework as the reference to this association between the process and the Availability Management Framework. The process uses this handle in subsequent communication with the Availability Management Framework.
The amfCallbacks parameter points to a structure that contains the callbacks that the Availability Management Framework can invoke. If the implementation supports the specified releaseCode and majorVersion, SA_AIS_OK is returned. In this case, the structure pointed to by the version parameter is set by this function to:
• releaseCode = required release code
• majorVersion = highest value of the major version that this implementation can support for the required releaseCode
• minorVersion = highest value of the minor version that this implementation can support for the required value of releaseCode and the returned value of majorVersion
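The version rules of this function, both the success case just listed and the SA_AIS_ERR_VERSION case described in the next paragraph, can be sketched over a table of supported versions. This is an illustration under stated assumptions, not a prescribed implementation: the table layout (one entry per supported releaseCode and majorVersion pair), the three-byte version structure, and the function demo_negotiate() are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for the version structure (assumed to be three bytes). */
typedef struct {
    unsigned char releaseCode, majorVersion, minorVersion;
} demo_version;

/* Hypothetical table entry: one per supported (release, major) pair. */
typedef struct {
    unsigned char release, major, max_minor;
} demo_supported;

/* Updates *v as described in the text and returns true for the SA_AIS_OK
 * case, false for the SA_AIS_ERR_VERSION case. The input minorVersion is
 * ignored. If the table is empty, *v is left unchanged. */
static bool demo_negotiate(const demo_supported *tab, size_t n, demo_version *v)
{
    bool exact = false, same_release = false;
    bool have_higher = false, have_lower = false;
    unsigned char higher = 0, lower = 0;

    for (size_t i = 0; i < n; ++i) {
        unsigned char r = tab[i].release;
        if (r == v->releaseCode) {
            same_release = true;
            if (tab[i].major == v->majorVersion)
                exact = true;
        } else if (r > v->releaseCode) {
            /* Track the lowest supported release above the required one. */
            if (!have_higher || r < higher) { higher = r; have_higher = true; }
        } else {
            /* Track the highest supported release below the required one. */
            if (!have_lower || r > lower) { lower = r; have_lower = true; }
        }
    }

    unsigned char rel = same_release ? v->releaseCode
                                     : (have_higher ? higher : lower);
    unsigned char major = 0, minor = 0;
    bool found = false;
    for (size_t i = 0; i < n; ++i) {
        if (tab[i].release != rel)
            continue;
        if (!found || tab[i].major > major) {
            major = tab[i].major;
            minor = tab[i].max_minor;
            found = true;
        }
    }
    if (found) {
        v->releaseCode = rel;
        v->majorVersion = major;
        v->minorVersion = minor;
    }
    return exact;
}
```

In both outcomes the returned majorVersion and minorVersion are the highest ones supported for the returned releaseCode, which is why a single second pass over the table is enough.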
If the preceding condition cannot be met, SA_AIS_ERR_VERSION is returned, and the version to which the version parameter points is set to:
if (implementation supports the required releaseCode)
    releaseCode = required releaseCode
else {
    if (implementation supports releaseCode higher than the required releaseCode)
        releaseCode = the lowest value of the supported release codes that is higher than the required releaseCode
    else
        releaseCode = the highest value of the supported release codes that is lower than the required releaseCode
}
majorVersion = highest value of the major versions that this implementation can support for the returned releaseCode
minorVersion = highest value of the minor versions that this implementation can support for the returned values of releaseCode and majorVersion
Return Values
SA_AIS_OK - The function completed successfully.
SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore.
SA_AIS_ERR_TIMEOUT - An implementation-dependent timeout occurred before the call could complete. It is unspecified whether the call succeeded or whether it did not.
SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The process may retry later.
SA_AIS_ERR_INVALID_PARAM - A parameter is not set correctly.
SA_AIS_ERR_NO_MEMORY - Either the Availability Management Framework library or a process that is providing the service is out of memory and cannot provide the service.
SA_AIS_ERR_NO_RESOURCES - The system is out of required resources (other than memory).
SA_AIS_ERR_VERSION - The version provided in the structure to which the version parameter points is not compatible with the version of the Availability Management Framework implementation.
SA_AIS_ERR_UNAVAILABLE - The operation requested in this call is unavailable on this cluster node because it is not a member node.
See Also
saAmfSelectionObjectGet(), saAmfDispatch(), saAmfFinalize()
7.5.2 saAmfSelectionObjectGet()
Prototype
SaAisErrorT saAmfSelectionObjectGet(
    SaAmfHandleT amfHandle,
    SaSelectionObjectT *selectionObject
);
Parameters
amfHandle - [in] The handle which was obtained by a previous invocation of the saAmfInitialize_3() function and which identifies this particular initialization of the Availability Management Framework. The SaAmfHandleT type is defined in Section 7.4.1 on page 211.
selectionObject - [out] A pointer to the operating system handle that the process can use to detect pending callbacks. The SaSelectionObjectT type is defined in [1].
Description
This function returns the operating system handle associated with the handle amfHandle. The invoking process can use the operating system handle to detect pending callbacks, instead of repeatedly invoking the saAmfDispatch() function for this purpose.
In a POSIX environment, the operating system handle is a file descriptor that is used with the poll() or select() system calls to detect incoming callbacks.
The operating system handle returned by saAmfSelectionObjectGet() is valid until saAmfFinalize() is invoked on the same handle amfHandle.

Return Values
SA_AIS_OK - The function completed successfully.
SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore.
SA_AIS_ERR_TIMEOUT - An implementation-dependent timeout occurred before the call could complete. It is unspecified whether the call succeeded or whether it did not.
SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The process may retry later.
SA_AIS_ERR_BAD_HANDLE - The handle amfHandle is invalid, since it is corrupted, uninitialized, or has already been finalized.
SA_AIS_ERR_INVALID_PARAM - A parameter is not set correctly.
SA_AIS_ERR_NO_MEMORY - Either the Availability Management Framework library or the provider of the service is out of memory and cannot provide the service.
SA_AIS_ERR_NO_RESOURCES - The system is out of required resources (other than memory).
SA_AIS_ERR_UNAVAILABLE - The operation requested in this call is unavailable on this cluster node due to one of the two reasons:
• the cluster node has left the cluster membership;
• the cluster node has rejoined the cluster membership, but the handle amfHandle was acquired before the cluster node left the cluster membership.
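The POSIX usage described for saAmfSelectionObjectGet() can be sketched as follows. This is only an illustration under stated assumptions: the selection object is treated as a plain file descriptor, and the helpers wait_for_pending_callbacks() and stand_in_dispatch() are hypothetical names, not part of the Availability Management Framework API. In a real process, the descriptor would come from saAmfSelectionObjectGet() and the dispatch step would be a call to saAmfDispatch().

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Waits up to timeout_ms for the descriptor to become readable.
 * Returns 1 if something is pending, 0 on timeout, -1 on error. */
int wait_for_pending_callbacks(int selection_fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(selection_fd >= 0);
    pfd.fd = selection_fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;              /* poll() itself failed */
    if (rc == 0)
        return 0;               /* timeout, nothing pending */
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Hypothetical stand-in for the dispatch step: consumes one pending
 * byte. A real process would call saAmfDispatch() here instead. */
int stand_in_dispatch(int selection_fd)
{
    char byte;

    return read(selection_fd, &byte, 1) == 1 ? 0 : -1;
}
```

A typical loop would call wait_for_pending_callbacks() and run the dispatch step only when it returns 1, and would stop using the descriptor once saAmfFinalize() has been invoked on the handle.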
Parameters
amfHandle - [in] The handle which was obtained by a previous invocation of the saAmfInitialize_3() function and which identifies this particular initialization of the Availability Management Framework. The SaAmfHandleT type is defined in Section 7.4.1 on page 211.
dispatchFlags - [in] Flags that specify the callback execution behavior of the saAmfDispatch() function, which have the values SA_DISPATCH_ONE, SA_DISPATCH_ALL, or SA_DISPATCH_BLOCKING. These flags are values of the SaDispatchFlagsT enumeration type, which is described in [1].

Description
In the context of the calling thread, this function invokes pending callbacks for the handle amfHandle in a way that is specified by the dispatchFlags parameter.

Return Values
SA_AIS_OK - The function completed successfully. This value is also returned if this function is being invoked with dispatchFlags set to SA_DISPATCH_ALL or SA_DISPATCH_BLOCKING, and the handle amfHandle has been finalized.
SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore.
SA_AIS_ERR_TIMEOUT - An implementation-dependent timeout occurred before the call could complete. It is unspecified whether the call succeeded or whether it did not.
SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The process may retry later.
SA_AIS_ERR_BAD_HANDLE - The handle amfHandle is invalid, since it is corrupted, uninitialized, or has already been finalized.
SA_AIS_ERR_INVALID_PARAM - The dispatchFlags parameter is invalid.
SA_AIS_ERR_UNAVAILABLE - The operation requested in this call is unavailable on this cluster node due to one of the two reasons:
• the cluster node has left the cluster membership;
• the cluster node has rejoined the cluster membership, but the handle amfHandle was acquired before the cluster node left the cluster membership.
Parameters
amfHandle - [in] The handle which was obtained by a previous invocation of the saAmfInitialize_3() function and which identifies this particular initialization of the Availability Management Framework. The SaAmfHandleT type is defined in Section 7.4.1 on page 211.

Description
The saAmfFinalize() function closes the association represented by the amfHandle parameter between the invoking process and the Availability Management Framework. The process must have invoked saAmfInitialize_3() before it invokes this function. A process must call this function once for each handle it acquired by invoking saAmfInitialize_3().
If the saAmfFinalize() function completes successfully, it releases all resources acquired when saAmfInitialize_3() was called. Moreover, it unregisters all components registered for the particular handle. Furthermore, it stops any tracking associated with the particular handle and cancels all pending callbacks related to the particular handle. Note that because the callback invocation is asynchronous, it is still possible that some callback calls are processed after this call returns successfully.
After saAmfFinalize() completes successfully, the handle amfHandle and the selection object associated with it are no longer valid.
Return Values
SA_AIS_OK - The function completed successfully.
SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore.
SA_AIS_ERR_TIMEOUT - An implementation-dependent timeout occurred before the call could complete. It is unspecified whether the call succeeded or whether it did not.
SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The process may retry later.
SA_AIS_ERR_BAD_HANDLE - The handle amfHandle is invalid, since it is corrupted, uninitialized, or has already been finalized.

See Also
saAmfInitialize_3()
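Because SA_AIS_ERR_TRY_AGAIN explicitly allows the process to retry later, callers often wrap library calls in a bounded retry loop. The following is a minimal sketch under stated assumptions: the ModelError enumeration is a local stand-in for SaAisErrorT (the real type and its values are defined in [1]), and call_with_retry() and flaky_operation() are hypothetical helpers. SA_AIS_ERR_TIMEOUT is deliberately not retried here, since it is unspecified whether the call succeeded.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for a few SaAisErrorT values; the real enumeration
 * and its numeric values come from the AIS headers described in [1]. */
typedef enum {
    MODEL_OK,             /* stands for SA_AIS_OK */
    MODEL_ERR_TRY_AGAIN,  /* stands for SA_AIS_ERR_TRY_AGAIN */
    MODEL_ERR_BAD_HANDLE  /* stands for SA_AIS_ERR_BAD_HANDLE */
} ModelError;

typedef ModelError (*ModelCall)(void *arg);

/* Invokes call(arg) until it returns something other than TRY_AGAIN or
 * max_attempts is reached. backoff may be NULL; if given, it is called
 * between attempts (for example, to sleep). */
ModelError call_with_retry(ModelCall call, void *arg, unsigned max_attempts,
                           void (*backoff)(unsigned attempt))
{
    ModelError rc = MODEL_ERR_TRY_AGAIN;
    unsigned attempt;

    assert(call != NULL && max_attempts > 0);
    for (attempt = 1; attempt <= max_attempts; attempt++) {
        rc = call(arg);
        if (rc != MODEL_ERR_TRY_AGAIN)
            break;
        if (backoff != NULL && attempt < max_attempts)
            backoff(attempt);
    }
    return rc;
}

/* Example operation: reports TRY_AGAIN until *remaining reaches zero. */
ModelError flaky_operation(void *arg)
{
    int *remaining = arg;

    if (*remaining > 0) {
        (*remaining)--;
        return MODEL_ERR_TRY_AGAIN;
    }
    return MODEL_OK;
}
```

Bounding the number of attempts keeps a persistently unavailable service from stalling the caller indefinitely; the bound and the backoff policy are application choices, not something the specification prescribes.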
Parameters
amfHandle - [in] The handle which was obtained by a previous invocation of the saAmfInitialize_3() function and which identifies this particular initialization of the Availability Management Framework. The Availability Management Framework must maintain the list of components registered with each such handle. The SaAmfHandleT type is defined in Section 7.4.1 on page 211.
compName - [in] A pointer to the name of the component to be registered. The SaNameT type is defined in [1].
proxyCompName - [in] A pointer to the name of the proxy component that is registering the proxied component which is identified by the name to which compName points. The proxyCompName parameter is used only when a proxied component is being registered by a proxy component; otherwise, it must be set to NULL. The SaNameT type is defined in [1].

Description
The saAmfComponentRegister() function can be used by an SA-aware component to register itself with the Availability Management Framework. It can also be used by a proxy component to register a proxied component. An SA-aware component calls saAmfComponentRegister() to inform the Availability Management Framework that it is ready to take component service instance assignments.
The process of an SA-aware component that registers a (possibly different) component is called the registered process for the registered component. The other processes of the SA-aware component are called unregistered processes. A registered process for an SA-aware or proxied component differs from the unregistered processes in that some of the API functions may be invoked only by a registered process and some of the callback functions may be invoked only for a registered process. For a detailed list of these APIs and callbacks, refer to Appendix B.
The registered process of a regular SA-aware component or of a contained component must have supplied in its saAmfInitialize_3() call the saAmfCSISetCallback(), saAmfCSIRemoveCallback(), and saAmfComponentTerminateCallback() callback functions.
The registered process of a container component must have supplied in its saAmfInitialize_3() call the saAmfCSISetCallback(), saAmfCSIRemoveCallback(), saAmfComponentTerminateCallback(), saAmfContainedComponentInstantiateCallback(), and saAmfContainedComponentCleanupCallback() callback functions.
The registered process of a proxy component must also have supplied in its saAmfInitialize_3() call the saAmfProxiedComponentInstantiateCallback() and saAmfProxiedComponentCleanupCallback() callback functions.
A component (SA-aware or proxied) must not register (or be registered) twice before having unregistered (or been unregistered), even if a different handle obtained by another invocation of the saAmfInitialize_3() call is used.
If an SA-aware component fails, it is implicitly unregistered by the Availability Management Framework. The same is true for a proxied component if its proxy fails, but the proxied component itself does not fail. If the proxied component fails, it is the task of the proxy to explicitly unregister the failed component, if this is desired.

Return Values
SA_AIS_OK - The function returned successfully.
SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore.
SA_AIS_ERR_TIMEOUT - An implementation-dependent timeout occurred before the call could complete. It is unspecified whether the call succeeded or whether it did not.
SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The process may retry later.
SA_AIS_ERR_BAD_HANDLE - The handle amfHandle is invalid, since it is corrupted, uninitialized, or has already been finalized.
SA_AIS_ERR_INIT - The previous invocation of saAmfInitialize_3() to initialize the Availability Management Framework was incomplete, since one or more of the callback functions that are listed next were not supplied:
• If a regular SA-aware or contained component registers itself: saAmfComponentTerminateCallback(), saAmfCSISetCallback(), and saAmfCSIRemoveCallback().
• If a container component registers itself: saAmfComponentTerminateCallback(), saAmfCSISetCallback(), saAmfCSIRemoveCallback(), saAmfContainedComponentInstantiateCallback(), and saAmfContainedComponentCleanupCallback().
• If a proxy component registers another component: saAmfProxiedComponentInstantiateCallback() and saAmfProxiedComponentCleanupCallback().
SA_AIS_ERR_INVALID_PARAM - A parameter is not set correctly. In particular, this value is returned if the value pointed to by compName is not the name of a configured component, or the names pointed to by compName or proxyCompName are not valid component DNs.
SA_AIS_ERR_NO_MEMORY - Either the Availability Management Framework library or the provider of the service is out of memory and cannot provide the service.
SA_AIS_ERR_NO_RESOURCES - The system is out of required resources (other than memory).
SA_AIS_ERR_NOT_EXIST - The proxy component identified by the name to which proxyCompName refers has not been registered previously.
SA_AIS_ERR_EXIST - The component identified by the name to which compName refers has been registered previously with either the amfHandle handle or another handle obtained by a previous invocation of the saAmfInitialize_3() call.
SA_AIS_ERR_BAD_OPERATION - The proxy component which is identified by the name referred to by proxyCompName and which is registering a proxied component has not been assigned the proxy CSI with the active HA state through which the proxied component being registered is supposed to be proxied.
SA_AIS_ERR_UNAVAILABLE - The operation requested in this call is unavailable on this cluster node due to one of the two reasons:
• the cluster node has left the cluster membership;
• the cluster node has rejoined the cluster membership, but the handle amfHandle was acquired before the cluster node left the cluster membership.
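The callback requirements behind SA_AIS_ERR_INIT can be summarized as a small check. The sketch below is not the framework's implementation; ComponentRole, SuppliedCallbacks, and callbacks_sufficient() are hypothetical names, and the flags simply record which callbacks a process passed to saAmfInitialize_3(). It reads the word "also" in the proxy rule as "in addition to the regular SA-aware set", which is an interpretation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of the registering component. */
typedef enum {
    ROLE_REGULAR_SA_AWARE,
    ROLE_CONTAINED,
    ROLE_CONTAINER,
    ROLE_PROXY
} ComponentRole;

/* Records which callbacks were supplied at initialization time. */
typedef struct {
    bool csi_set;
    bool csi_remove;
    bool terminate;
    bool proxied_instantiate;
    bool proxied_cleanup;
    bool contained_instantiate;
    bool contained_cleanup;
} SuppliedCallbacks;

/* Returns true if the supplied callbacks satisfy the registration rules
 * for the given role; false corresponds to the SA_AIS_ERR_INIT case. */
bool callbacks_sufficient(ComponentRole role, const SuppliedCallbacks *cb)
{
    bool base;

    assert(cb != NULL);
    base = cb->csi_set && cb->csi_remove && cb->terminate;

    switch (role) {
    case ROLE_REGULAR_SA_AWARE:
    case ROLE_CONTAINED:
        return base;
    case ROLE_CONTAINER:
        return base && cb->contained_instantiate && cb->contained_cleanup;
    case ROLE_PROXY:
        /* Assumption: proxied callbacks are required on top of the base set. */
        return base && cb->proxied_instantiate && cb->proxied_cleanup;
    }
    return false;
}
```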
See Also
saAmfComponentUnregister(), SaAmfCSISetCallbackT, SaAmfCSIRemoveCallbackT, SaAmfComponentTerminateCallbackT, SaAmfProxiedComponentInstantiateCallbackT, SaAmfProxiedComponentCleanupCallbackT, SaAmfContainedComponentInstantiateCallbackT, SaAmfContainedComponentCleanupCallbackT, saAmfInitialize_3()

7.6.2 saAmfComponentUnregister()

Prototype
SaAisErrorT saAmfComponentUnregister(
    SaAmfHandleT amfHandle,
    const SaNameT *compName,
    const SaNameT *proxyCompName
);
Parameters
amfHandle - [in] The handle which was obtained by a previous invocation of the saAmfInitialize_3() function and which identifies this particular initialization of the Availability Management Framework. The SaAmfHandleT type is defined in Section 7.4.1 on page 211.
compName - [in] A pointer to the name of the component to be unregistered. The SaNameT type is defined in [1].
proxyCompName - [in] A pointer to the name of the proxy component unregistering the proxied component identified by the name pointed to by compName. This parameter is used only for unregistering a proxied component by a proxy component; otherwise, it must be set to NULL. The SaNameT type is defined in [1].
Description
The saAmfComponentUnregister() function can be used for two purposes: a proxy component can unregister one of its proxied components, or an SA-aware component can unregister itself.
The former case usually applies when another proxy is to register the proxied component. Recall that at a given time at most one proxy can exist for a component.
The latter case is used by an SA-aware component to inform the Availability Management Framework that it is unable to continue providing the service, possibly because of a fault condition that is hindering its ability to provide service.
When an SA-aware component unregisters with the Availability Management Framework, the framework treats such an unregistration as an error condition (similar to one signaled by an saAmfComponentErrorReport()) and engages the configured default recovery action (for details, see Section 3.12.1.3) on the component. As a consequence, its operational state may become disabled (refer to Section 7.1.1); therefore, all of its component service instances are removed from it.
If a proxy component unregisters one of its proxied components, the operational state of the latter does not change because unregistration does not indicate a failure in this case. This decision is motivated by the assumption that another proxy will soon take over the role of the previous proxy.
During its life cycle, an SA-aware component can register or unregister multiple times. Also, a proxy component can register or unregister a proxied component multiple times. Before unregistering itself, a proxy component must unregister all of its proxied components. It is understood that a failed component is implicitly unregistered while it is cleaned up.
The amfHandle in the saAmfComponentUnregister() call must be the same as the one used in the corresponding saAmfComponentRegister() call.

Return Values
SA_AIS_OK - The function returned successfully.
SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore.
SA_AIS_ERR_TIMEOUT - An implementation-dependent timeout occurred before the call could complete. It is unspecified whether the call succeeded or whether it did not.
SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The process may retry later.
SA_AIS_ERR_BAD_HANDLE - The handle amfHandle is invalid, since it is corrupted, uninitialized, has already been finalized, or the component has not been registered using this handle.
SA_AIS_ERR_INVALID_PARAM - A parameter is not set correctly.
SA_AIS_ERR_NO_MEMORY - Either the Availability Management Framework library or the provider of the service is out of memory and cannot provide the service.
SA_AIS_ERR_NO_RESOURCES - The system is out of required resources (other than memory).
SA_AIS_ERR_NOT_EXIST - The proxy component identified by the name to which proxyCompName refers or the proxied component identified by the name to which compName refers has not been registered previously.
SA_AIS_ERR_BAD_OPERATION - The requested unregistration is not acceptable because:
• the component identified by the name to which proxyCompName refers is not the proxy of the proxied component identified by the name to which compName refers, or
• the component identified by the name referred to by compName has not unregistered its proxied components before unregistering itself.
SA_AIS_ERR_UNAVAILABLE - The operation requested in this call is unavailable on this cluster node due to one of the two reasons:
• the cluster node has left the cluster membership;
• the cluster node has rejoined the cluster membership, but the handle amfHandle was acquired before the cluster node left the cluster membership.
Parameters
amfHandle - [in] The handle which was obtained by a previous invocation of the saAmfInitialize_3() function and which identifies this particular initialization of the Availability Management Framework. The SaAmfHandleT type is defined in Section 7.4.1 on page 211.
compName - [out] A pointer to the name of the component to which the invoking process belongs. The SaNameT type is defined in [1].

Description
This function returns the name of the component to which the invoking process belongs. This function can be invoked by the process before its component has been registered with the Availability Management Framework by calling saAmfComponentRegister(). The component name provided by saAmfComponentNameGet() should be used by a process when it registers its local component.
As the Availability Management Framework does not control the creation of all processes that constitute a component, some conventions must be respected by the creators of these processes to allow the saAmfComponentNameGet() function to work properly in the different processes that constitute a component.
On operating systems supporting the concept of environment variables, the Availability Management Framework ensures that the SA_AMF_COMPONENT_NAME environment variable is properly set when it runs the INSTANTIATE command to create a component. It is the responsibility of the INSTANTIATE command, and more generally of any entity that creates processes for a component (also when the components are not instantiated by the Availability Management Framework), to ensure that the SA_AMF_COMPONENT_NAME environment variable is properly set to contain the component name when creating new processes. For more information about the environment variables supported by the Availability Management Framework, refer to Section 4.3 on page 177.
Note:
It is not guaranteed that saAmfComponentNameGet() works for contained components. If it is not supported by the Availability Management Framework implementation, SA_AIS_ERR_NOT_SUPPORTED is returned.
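The environment-variable convention described above can be illustrated with a short sketch. It is not how saAmfComponentNameGet() is implemented; it only shows a process reading SA_AMF_COMPONENT_NAME, which the creator of the process is expected to have set. The helper names component_name_from_value() and component_name_from_env() are hypothetical, and the name is handled as a plain C string rather than as an SaNameT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Copies a component name (as read from the environment) into buf.
 * Returns 0 on success, or -1 if the value is unset, empty, or does
 * not fit into buf including the terminating NUL. */
int component_name_from_value(const char *value, char *buf, size_t buf_len)
{
    size_t len;

    assert(buf != NULL && buf_len > 0);
    if (value == NULL || value[0] == '\0')
        return -1;
    len = strlen(value);
    if (len >= buf_len)
        return -1;
    memcpy(buf, value, len + 1);
    return 0;
}

/* Reads the name from SA_AMF_COMPONENT_NAME in the process environment. */
int component_name_from_env(char *buf, size_t buf_len)
{
    return component_name_from_value(getenv("SA_AMF_COMPONENT_NAME"),
                                     buf, buf_len);
}
```

An entity that creates additional processes for a component would set the same variable in the environment of each child before starting it, so that every process of the component sees the same name.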
Return Values
SA_AIS_OK - The function returned successfully.
SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore.
SA_AIS_ERR_TIMEOUT - An implementation-dependent timeout occurred before the call could complete. It is unspecified whether the call succeeded or whether it did not.
SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The process may retry later.
SA_AIS_ERR_BAD_HANDLE - The handle amfHandle is invalid, since it is corrupted, uninitialized, or has already been finalized.
SA_AIS_ERR_INVALID_PARAM - A parameter is not set correctly.
SA_AIS_ERR_NO_MEMORY - Either the Availability Management Framework library or the provider of the service is out of memory and cannot provide the service.
SA_AIS_ERR_NO_RESOURCES - The system is out of required resources (other than memory).
SA_AIS_ERR_NOT_EXIST - The Availability Management Framework is not aware of any component associated with the invoking process.
SA_AIS_ERR_NOT_SUPPORTED - The Availability Management Framework returns this value if the saAmfComponentNameGet() function is not supported for contained components.
SA_AIS_ERR_UNAVAILABLE - The operation requested in this call is unavailable on this cluster node due to one of the two reasons:
• the cluster node has left the cluster membership;
• the cluster node has rejoined the cluster membership, but the handle amfHandle was acquired before the cluster node left the cluster membership.
Parameters
amfHandle - [in] The handle which was obtained by a previous invocation of the saAmfInitialize_3() function and which identifies this particular initialization of the Availability Management Framework. The SaAmfHandleT type is defined in Section 7.4.1 on page 211.
compName - [in] A pointer to the name of the component to which the monitored processes belong. The SaNameT type is defined in [1].
processId - [in] Identifier of a process to be monitored. The SaInt64T type is defined in [1].
descendentsTreeDepth - [in] Depth of the tree of descendents of the process identified by processId and that are also to be monitored. This parameter is of the SaInt32T type (defined in [1]) and can have the following values:
• A value of 0 indicates that no descendents of the designated process will be monitored.
• A value of 1 indicates that direct children of the designated process will be monitored.
• A value of 2 indicates that direct children and grandchildren of the designated process will be monitored, and so on.
• A value of -1 indicates that descendents at any level in the descendents tree will be monitored.
pmErrors - [in] Specifies the type of process errors to monitor. Monitoring for several errors can be requested in a single call by ORing different SaAmfPmErrorsT values (this type is defined in Section 7.4.2.1 on page 211):
• SA_AMF_PM_NON_ZERO_EXIT requests the monitoring of processes exiting with a nonzero exit status.
• SA_AMF_PM_ZERO_EXIT requests the monitoring of processes exiting with a zero exit status.
recommendedRecovery - [in] Recommended recovery to be performed by the Availability Management Framework. For details, refer to Section 7.4.7 on page 222 on the SaAmfRecommendedRecoveryT type.

Description
The saAmfPmStart_3() function requests the Availability Management Framework to start passive monitoring of specific errors that may occur to a process and its descendents. Currently, only death of processes can be monitored. If one of the errors being monitored occurs for the process or one of its descendents, the Availability Management Framework will automatically report an error on the component identified by the name to which compName refers (for details regarding error reports, see saAmfComponentErrorReport()). The recommended recovery action will be set according to the recommendedRecovery parameter.

Return Values
SA_AIS_OK - The function returned successfully.
SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore.
SA_AIS_ERR_TIMEOUT - An implementation-dependent timeout occurred before the call could complete. It is unspecified whether the call succeeded or whether it did not.
SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The process may retry later.
SA_AIS_ERR_BAD_HANDLE - The handle amfHandle is invalid, since it is corrupted, uninitialized, or has already been finalized.
SA_AIS_ERR_INVALID_PARAM - A parameter is not set correctly.
SA_AIS_ERR_NO_MEMORY - Either the Availability Management Framework library or the provider of the service is out of memory and cannot provide the service.
SA_AIS_ERR_NO_RESOURCES - The system is out of required resources (other than memory).
SA_AIS_ERR_NOT_EXIST - Either one or both of the cases that follow apply:
• The component identified by the name to which compName refers is not configured in the Availability Management Framework to execute on the local node.
• The process identified by processId does not exist on the local node.
SA_AIS_ERR_ACCESS - The Availability Management Framework rejects the requested recommended recovery.
SA_AIS_ERR_UNAVAILABLE - The operation requested in this call is unavailable on this cluster node due to one of the two reasons:
• the cluster node has left the cluster membership;
• the cluster node has rejoined the cluster membership, but the handle amfHandle was acquired before the cluster node left the cluster membership.
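The descendentsTreeDepth and pmErrors parameters of saAmfPmStart_3() can be read as two simple predicates. The sketch below is a model for illustration only, taking -1 as the "any level" value of descendentsTreeDepth. The constants MODEL_PM_ZERO_EXIT and MODEL_PM_NON_ZERO_EXIT are local stand-ins for the SaAmfPmErrorsT values (their real definitions are in Section 7.4.2.1), and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for SaAmfPmErrorsT bit values; the real values are
 * defined in Section 7.4.2.1 and may differ. */
enum {
    MODEL_PM_ZERO_EXIT = 0x1,
    MODEL_PM_NON_ZERO_EXIT = 0x2
};

/* level 0 is the process identified by processId, level 1 its direct
 * children, level 2 its grandchildren, and so on. */
bool depth_covers_level(int descendents_tree_depth, int level)
{
    assert(level >= 0 && descendents_tree_depth >= -1);
    if (descendents_tree_depth == -1)
        return true;            /* any level in the descendents tree */
    return level <= descendents_tree_depth;
}

/* Returns true if a process exiting with exit_status matches one of the
 * errors requested in the ORed pm_errors mask. */
bool exit_is_monitored(unsigned pm_errors, int exit_status)
{
    if (exit_status == 0)
        return (pm_errors & MODEL_PM_ZERO_EXIT) != 0;
    return (pm_errors & MODEL_PM_NON_ZERO_EXIT) != 0;
}
```

For example, a caller that wants both kinds of exit monitored for a process and its direct children would pass the two flags ORed together with a depth of 1.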
Parameters
amfHandle - [in] The handle which was obtained by a previous invocation of the saAmfInitialize_3() function and which identifies this particular initialization of the Availability Management Framework. The SaAmfHandleT type is defined in Section 7.4.1 on page 211.
compName - [in] A pointer to the name of the component to which the monitored processes belong. The SaNameT type is defined in [1].
stopQualifier - [in] Qualifies which processes should stop being monitored. This parameter is of the SaAmfPmStopQualifierT type (defined in Section 7.4.2.2 on page 212) and can have the following values:
• SA_AMF_PM_PROC: the Availability Management Framework stops monitoring the process identified by processId.
• SA_AMF_PM_PROC_AND_DESCENDENTS: the Availability Management Framework stops monitoring the process identified by processId and all its descendents.
• SA_AMF_PM_ALL_PROCESSES: the Availability Management Framework stops monitoring all processes that belong to the component identified by the name to which compName refers.
processId - [in] Identifier of the process for which passive monitoring is to be stopped. The SaInt64T type is defined in [1].
pmErrors - [in] Specifies the type of process errors that the Availability Management Framework should stop monitoring for the designated processes. Stopping the monitoring for several errors can be requested in a single call by ORing different SaAmfPmErrorsT values (this type is defined in Section 7.4.2.1 on page 211).

Description
The saAmfPmStop() function requests the Availability Management Framework to stop passive monitoring of specific errors that may occur to a set of processes belonging to a component.

Return Values
SA_AIS_OK - The function returned successfully.
SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore.
SA_AIS_ERR_TIMEOUT - An implementation-dependent timeout occurred before the call could complete. It is unspecified whether the call succeeded or whether it did not.
SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The process may retry later.
SA_AIS_ERR_BAD_HANDLE - The handle amfHandle is invalid, since it is corrupted, uninitialized, or has already been finalized.
SA_AIS_ERR_INVALID_PARAM - A parameter is not set correctly.
SA_AIS_ERR_NO_MEMORY - Either the Availability Management Framework library or the provider of the service is out of memory and cannot provide the service.
SA_AIS_ERR_NO_RESOURCES - The system is out of required resources (other than memory).
SA_AIS_ERR_NOT_EXIST - One or more of the cases that follow apply:
• The component identified by the name to which compName refers is not configured in the Availability Management Framework to execute on the local node.
• The process identified by processId does not execute on the local node.
• The process identified by processId was not monitored by the Availability Management Framework for errors specified by pmErrors.
SA_AIS_ERR_UNAVAILABLE - The operation requested in this call is unavailable on this cluster node due to one of the two reasons:
• the cluster node has left the cluster membership;
• the cluster node has rejoined the cluster membership, but the handle amfHandle was acquired before the cluster node left the cluster membership.
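The effect of the three stopQualifier values of saAmfPmStop() can be modeled as a predicate over the monitored processes of one component. This is a sketch under stated assumptions: ModelPmStopQualifier is a local stand-in for SaAmfPmStopQualifierT (defined in Section 7.4.2.2), the MonitoredProcess record and the function name are hypothetical, and every record passed in is assumed to belong to the component named by compName.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for SaAmfPmStopQualifierT. */
typedef enum {
    MODEL_PM_PROC,                  /* SA_AMF_PM_PROC */
    MODEL_PM_PROC_AND_DESCENDENTS,  /* SA_AMF_PM_PROC_AND_DESCENDENTS */
    MODEL_PM_ALL_PROCESSES          /* SA_AMF_PM_ALL_PROCESSES */
} ModelPmStopQualifier;

/* One monitored process of the component (hypothetical bookkeeping). */
typedef struct {
    long pid;
    bool descends_from_target;  /* true if it is a descendent of processId */
} MonitoredProcess;

/* Returns true if monitoring of proc stops for the given qualifier,
 * where target_pid plays the role of the processId parameter. */
bool monitoring_stops_for(ModelPmStopQualifier qualifier, long target_pid,
                          const MonitoredProcess *proc)
{
    assert(proc != NULL);
    switch (qualifier) {
    case MODEL_PM_PROC:
        return proc->pid == target_pid;
    case MODEL_PM_PROC_AND_DESCENDENTS:
        return proc->pid == target_pid || proc->descends_from_target;
    case MODEL_PM_ALL_PROCESSES:
        return true;  /* all processes that belong to the component */
    }
    return false;
}
```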
Parameters
amfHandle - [in] The handle which was obtained by a previous invocation of the saAmfInitialize_3() function and which identifies this particular initialization of the Availability Management Framework. The SaAmfHandleT type is defined in Section 7.4.1 on page 211.
compName - [in] A pointer to the name of the component to be healthchecked. The SaNameT type is defined in [1].
healthcheckKey - [in] A pointer to the key of the healthcheck to be executed. Based on this key, the Availability Management Framework can retrieve the corresponding healthcheck parameters. The SaAmfHealthcheckKeyT type is defined in Section 7.4.3.2 on page 212.
invocationType - [in] This parameter indicates whether the Availability Management Framework or the process itself will invoke the healthcheck calls. The SaAmfHealthcheckInvocationT type is defined in Section 7.4.3.1 on page 212.
recommendedRecovery - [in] Recommended recovery to be performed by the Availability Management Framework if the component fails a healthcheck. For details, refer to Section 7.4.7 on page 222 where the SaAmfRecommendedRecoveryT type is defined.
Description
This function starts healthchecks for the component designated by the name pointed to by compName. The variant of the healthcheck (component-invoked or framework-invoked) is specified by invocationType. If invocationType is SA_HEALTHCHECK_AMF_INVOKED, the saAmfHealthcheckCallback() callback function must have been supplied when the process invoked the saAmfInitialize_3() call.
If a component wants to start more than one healthcheck, it should invoke this function once for each individual healthcheck. It is, however, not possible to have at a given time and on the same amfHandle two healthchecks started for the same component name and healthcheck key.

Return Values
SA_AIS_OK - The function returned successfully.
SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore.
SA_AIS_ERR_TIMEOUT - An implementation-dependent timeout occurred before the call could complete. It is unspecified whether the call succeeded or whether it did not.
SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The process may retry later.
SA_AIS_ERR_BAD_HANDLE - The handle amfHandle is invalid, since it is corrupted, uninitialized, or has already been finalized.
SA_AIS_ERR_INIT - The previous invocation of saAmfInitialize_3() to initialize the Availability Management Framework was incomplete, since the saAmfHealthcheckCallback() callback function is missing, and invocationType specifies SA_HEALTHCHECK_AMF_INVOKED.
SA_AIS_ERR_INVALID_PARAM - A parameter is not set correctly.
SA_AIS_ERR_NO_MEMORY - Either the Availability Management Framework library or the provider of the service is out of memory and cannot provide the service.
SA_AIS_ERR_NO_RESOURCES - The system is out of required resources (other than memory).
SA_AIS_ERR_NOT_EXIST - Either one or both of the cases that follow apply:
• The Availability Management Framework is not aware of a component designated by the name to which compName refers.
• The healthcheck identified by the key to which healthcheckKey points is not configured for the component designated by the name to which compName refers.
SA_AIS_ERR_ACCESS - The Availability Management Framework rejects the requested recommended recovery.
SA_AIS_ERR_EXIST - The healthcheck has already been started on the handle amfHandle for the component designated by the name to which compName refers and for the same value of the key to which healthcheckKey points.
SA_AIS_ERR_UNAVAILABLE - The operation requested in this call is unavailable on this cluster node due to one of the two reasons:
x x
10
15
the cluster node has left the cluster membership; the cluster node has rejoined the cluster membership, but the handle amfHandle was acquired before the cluster node left the cluster membership. 20
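
The SA_AIS_ERR_TRY_AGAIN entry above says the process may retry later. A minimal sketch of a bounded retry loop follows. The names DemoStatus, DemoCall, demo_retry() and demo_flaky_start() are hypothetical stand-ins and not part of the AIS API; in a real component the callable would wrap saAmfHealthcheckStart() and compare against the SaAisErrorT values from the AIS headers, and it would probably pause between attempts.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the SaAisErrorT codes discussed above.
 * The numeric values are arbitrary and are NOT those of the AIS headers. */
typedef enum {
    DEMO_OK,
    DEMO_ERR_TRY_AGAIN,
    DEMO_ERR_EXIST,
    DEMO_ERR_BAD_HANDLE
} DemoStatus;

/* Hypothetical callable performing one attempt, for example a wrapper
 * around saAmfHealthcheckStart() with fixed arguments. */
typedef DemoStatus (*DemoCall)(void *ctx);

/* Invoke `call` at most `max_attempts` times, retrying only while it
 * reports DEMO_ERR_TRY_AGAIN. Returns the last status observed and,
 * if `attempts_made` is not NULL, stores the number of attempts there.
 * A real caller would sleep or back off between attempts. */
DemoStatus demo_retry(DemoCall call, void *ctx, unsigned max_attempts,
                      unsigned *attempts_made)
{
    DemoStatus st = DEMO_ERR_TRY_AGAIN;
    unsigned n = 0;

    while (n < max_attempts) {
        st = call(ctx);
        n++;
        if (st != DEMO_ERR_TRY_AGAIN)
            break;              /* success or a non-retryable error */
    }
    if (attempts_made != NULL)
        *attempts_made = n;
    return st;
}

/* Demonstration stub: reports TRY_AGAIN until the counter that ctx
 * points to reaches zero, then reports success. */
DemoStatus demo_flaky_start(void *ctx)
{
    unsigned *remaining_failures = ctx;

    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return DEMO_ERR_TRY_AGAIN;
    }
    return DEMO_OK;
}
```

Only the TRY_AGAIN code is retried in this sketch; every other code, including the stand-in for SA_AIS_ERR_EXIST, is handed back to the caller unchanged so that it can decide what to do.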
Parameters
invocation - [in] This parameter identifies a particular invocation of the callback function. The invoked process returns invocation when it responds to the Availability Management Framework by calling the saAmfResponse() function. The SaInvocationT type is defined in [1].
compName - [in] A pointer to the name of the component that must undergo the particular healthcheck. The SaNameT type is defined in [1].
healthcheckKey - [in] A pointer to the key of the healthcheck to be executed. The SaAmfHealthcheckKeyT type is defined in Section 7.4.3.2 on page 212.

Description
The Availability Management Framework requests the component identified by the name pointed to by compName to perform a healthcheck specified by the key pointed to by healthcheckKey. The Availability Management Framework may ask a proxy component to execute a healthcheck on one of its proxied components.

This callback is invoked in the context of a thread calling saAmfDispatch() on the handle amfHandle that was specified when the healthcheck operation was started by invoking saAmfHealthcheckStart().

The Availability Management Framework sets invocation, and the component returns invocation as an in parameter when it responds to the Availability Management Framework about the completion of the healthcheck by invoking the saAmfResponse() function. The error parameter in the invocation of the saAmfResponse() function should be set to:
  • SA_AIS_OK - The healthcheck completed successfully.
  • SA_AIS_ERR_FAILED_OPERATION - The component failed to successfully execute the given healthcheck and has reported an error on the faulty component by invoking saAmfComponentErrorReport().

Any other error code set in the error parameter in the response will be treated by the Availability Management Framework as if the caller had set the error parameter to SA_AIS_ERR_FAILED_OPERATION.

If the invoked process does not respond by invoking the saAmfResponse() function within a configured time interval or returns an error, the Availability Management Framework must engage the configured recovery policy (see Section 3.12.1.3) for the component to which the process belongs. This time interval is configured by setting the saAmfHealthcheckMaxDuration configuration attribute of the SaAmfHealthcheck configuration object class (see Section 8.14).

See Also
saAmfResponse(), saAmfHealthcheckStart(), saAmfComponentErrorReport(), saAmfDispatch()
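
The way the framework is described as interpreting a healthcheck response can be sketched as a small decision function. This is an illustration only: DemoStatus, demo_effective_response() and demo_recovery_engaged() are hypothetical names, the enum values are arbitrary, and treating an elapsed time equal to the limit as still "within" the interval is an assumption made for the sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for SaAisErrorT codes; the values are arbitrary
 * and are NOT those of the AIS headers. */
typedef enum {
    DEMO_OK,
    DEMO_ERR_FAILED_OPERATION,
    DEMO_ERR_TIMEOUT,
    DEMO_ERR_NO_MEMORY
} DemoStatus;

/* Any code other than OK is treated as FAILED_OPERATION. */
DemoStatus demo_effective_response(DemoStatus reported)
{
    return reported == DEMO_OK ? DEMO_OK : DEMO_ERR_FAILED_OPERATION;
}

/* Returns true when the configured recovery policy would be engaged:
 * either no response arrived within max_duration_ms (the role played by
 * saAmfHealthcheckMaxDuration), or the response carried an error.
 * Assumption: elapsed_ms == max_duration_ms still counts as in time. */
bool demo_recovery_engaged(bool responded, unsigned elapsed_ms,
                           unsigned max_duration_ms, DemoStatus reported)
{
    if (!responded || elapsed_ms > max_duration_ms)
        return true;
    return demo_effective_response(reported) != DEMO_OK;
}
```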
Parameters
amfHandle - [in] The handle which was obtained by a previous invocation of the saAmfInitialize_3() function and which identifies this particular initialization of the Availability Management Framework. The SaAmfHandleT type is defined in Section 7.4.1 on page 211.
compName - [in] A pointer to the name of the component for which the healthcheck result is being reported. The SaNameT type is defined in [1].
healthcheckKey - [in] A pointer to the key of the healthcheck whose result is being reported. Based on this key, the Availability Management Framework can retrieve the corresponding healthcheck parameters. The SaAmfHealthcheckKeyT type is defined in Section 7.4.3.2 on page 212.
healthcheckResult - [in] This parameter of SaAisErrorT type (defined in [1]) indicates the result of the healthcheck performed by the component. This parameter can take one of the following values:
  • SA_AIS_OK - The healthcheck completed successfully.
  • SA_AIS_ERR_FAILED_OPERATION - The component failed to successfully execute the given healthcheck and has reported an error on itself by invoking saAmfComponentErrorReport().

Description
This function allows a process to inform the Availability Management Framework that it has performed the healthcheck identified by the key pointed to by healthcheckKey for the component designated by the name to which compName points, and whether the healthcheck was successful or not.

Return Values
SA_AIS_OK - The function returned successfully.
SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore.
SA_AIS_ERR_TIMEOUT - An implementation-dependent timeout occurred before the call could complete. It is unspecified whether the call succeeded or whether it did not.
SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The process may retry later.
SA_AIS_ERR_BAD_HANDLE - The handle amfHandle is invalid, since it is corrupted, uninitialized, or has already been finalized.
SA_AIS_ERR_INVALID_PARAM - A parameter is not set correctly. In particular, this value is returned if the calling process is not the process that started the healthcheck by invoking saAmfHealthcheckStart().
SA_AIS_ERR_NO_MEMORY - Either the Availability Management Framework library or the provider of the service is out of memory and cannot provide the service.
SA_AIS_ERR_NO_RESOURCES - The system is out of required resources (other than memory).
SA_AIS_ERR_NOT_EXIST - Either one or both of the cases that follow apply:
  • The Availability Management Framework is not aware of a component designated by the name to which compName points.
  • No component-invoked healthcheck has been started for the component designated by the name to which compName points and for the key referred to by healthcheckKey.
SA_AIS_ERR_UNAVAILABLE - The operation requested in this call is unavailable on this cluster node due to one of the two reasons:
  • the cluster node has left the cluster membership;
  • the cluster node has rejoined the cluster membership, but the handle amfHandle was acquired before the cluster node left the cluster membership.
Parameters
amfHandle - [in] The handle which was obtained by a previous invocation of the saAmfInitialize_3() function and which identifies this particular initialization of the Availability Management Framework. The SaAmfHandleT type is defined in Section 7.4.1 on page 211.
compName - [in] A pointer to the name of the component for which healthchecks are to be stopped. The SaNameT type is defined in [1].
healthcheckKey - [in] A pointer to the key of the healthcheck to be stopped. Based on this key, the Availability Management Framework can retrieve the corresponding healthcheck parameters. The SaAmfHealthcheckKeyT type is defined in Section 7.4.3.2 on page 212.

Description
This function is used to stop the healthcheck referred to by the key pointed to by healthcheckKey for the component designated by the name to which compName points.

Return Values
SA_AIS_OK - The function returned successfully.
SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore.
SA_AIS_ERR_TIMEOUT - An implementation-dependent timeout occurred before the call could complete. It is unspecified whether the call succeeded or whether it did not.
SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The process may retry later.
SA_AIS_ERR_BAD_HANDLE - The handle amfHandle is invalid, since it is corrupted, uninitialized, or has already been finalized.
SA_AIS_ERR_INVALID_PARAM - A parameter is not set correctly. A specific example is when the calling process is not the process that has started the associated healthcheck.
SA_AIS_ERR_NO_MEMORY - Either the Availability Management Framework library or the provider of the service is out of memory and cannot provide the service.
SA_AIS_ERR_NO_RESOURCES - The system is out of required resources (other than memory).
SA_AIS_ERR_NOT_EXIST - Either one or both of the cases that follow apply:
  • The Availability Management Framework is not aware of a component designated by the name to which compName points.
  • No healthcheck has been started for the component designated by the name to which compName points and for the key to which healthcheckKey refers.
SA_AIS_ERR_UNAVAILABLE - The operation requested in this call is unavailable on this cluster node due to one of the two reasons:
  • the cluster node has left the cluster membership;
  • the cluster node has rejoined the cluster membership, but the handle amfHandle was acquired before the cluster node left the cluster membership.
Parameters
amfHandle - [in] The handle which was obtained by a previous invocation of the saAmfInitialize_3() function and which identifies this particular initialization of the Availability Management Framework. The SaAmfHandleT type is defined in Section 7.4.1 on page 211.
compName - [in] A pointer to the name of the component for which the information is requested. The SaNameT type is defined in [1].
csiName - [in] A pointer to the name of the component service instance for which the information is requested. The SaNameT type is defined in [1].
haState - [out] A pointer to the HA state that the Availability Management Framework has currently assigned to the component identified by the name to which compName points for the component service instance identified by the name to which csiName refers. The HA state is active, standby, quiescing, or quiesced, as defined by the SaAmfHAStateT enumeration type (see Section 7.4.4.1 on page 213).

Description
The Availability Management Framework returns the HA state of a component identified by the name to which compName refers for the component service instance identified by the name to which csiName refers.

Return Values
SA_AIS_OK - The function returned successfully.
SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore.
SA_AIS_ERR_TIMEOUT - An implementation-dependent timeout occurred before the call could complete. It is unspecified whether the call succeeded or whether it did not.
SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The process may retry later.
SA_AIS_ERR_BAD_HANDLE - The handle amfHandle is invalid, since it is corrupted, uninitialized, or has already been finalized.
SA_AIS_ERR_INVALID_PARAM - A parameter is not set correctly.
SA_AIS_ERR_NO_MEMORY - Either the Availability Management Framework library or the provider of the service is out of memory and cannot provide the service.
SA_AIS_ERR_NO_RESOURCES - The system is out of required resources (other than memory).
SA_AIS_ERR_NOT_EXIST - The component identified by the name to which compName points has not registered with the Availability Management Framework, or the component has not been assigned the component service instance identified by the name to which csiName refers.
SA_AIS_ERR_UNAVAILABLE - The operation requested in this call is unavailable on this cluster node due to one of the two reasons:
  • the cluster node has left the cluster membership;
  • the cluster node has rejoined the cluster membership, but the handle amfHandle was acquired before the cluster node left the cluster membership.
Parameters
invocation - [in] This parameter identifies a particular invocation of the callback function. The invoked process returns invocation when it responds to the Availability Management Framework by calling the saAmfResponse() or saAmfQuiescingComplete() functions. The SaInvocationT type is defined in [1].
compName - [in] A pointer to the name of the component to which a new component service instance is assigned or for which the HA state of one or all supported component service instances is changed. The SaNameT type is defined in [1].
haState - [in] The new HA state to be assumed by the component for the component service instance identified by csiDescriptor, or for all component service instances already supported by the component (if SA_AMF_CSI_TARGET_ALL is set in csiFlags of the csiDescriptor parameter). The SaAmfHAStateT type is defined in Section 7.4.4.1 on page 213.
csiDescriptor - [in] The descriptor with information about the component service instances targeted by this callback invocation. The SaAmfCSIDescriptorT type is defined in Section 7.4.5.5 on page 219.

Description
The Availability Management Framework invokes this callback to request the component identified by the name to which compName points to assume the HA state specified by haState for one or all component service instances. The component service instances targeted by this call along with additional information about them are provided by the csiDescriptor parameter.

If the haState parameter indicates the new HA state for the CSI(s) is quiescing, the process must notify the Availability Management Framework when the CSI(s) have been quiesced by invoking the saAmfQuiescingComplete() function. When the process invokes the saAmfQuiescingComplete() function, the process returns invocation as an in parameter.

This callback is invoked in the context of a thread of a registered process calling saAmfDispatch() on the handle amfHandle that was specified when the component identified by the name referred to by compName was registered by invoking saAmfComponentRegister().

The Availability Management Framework sets invocation, and the process returns invocation as an in parameter when it responds to the Availability Management Framework by invoking the saAmfResponse() function. The error parameter in the invocation of the saAmfResponse() function should be set to:
  • SA_AIS_OK - The component executed the saAmfCSISetCallback() function successfully.
  • SA_AIS_ERR_FAILED_OPERATION - The component failed to assume the HA state specified by haState for the given component service instance.

Any other error code set in the error parameter in the response will be treated by the Availability Management Framework as if the caller had set the error parameter to SA_AIS_ERR_FAILED_OPERATION.

If the invoked process does not respond by invoking saAmfResponse() within a configured time interval or invokes saAmfResponse() within this time interval with the error parameter set to SA_AIS_ERR_FAILED_OPERATION, the Availability Management Framework must engage the configured recovery policy (see Section 3.12.1.3) for the component to which the process belongs. This time interval is configured by setting the saAmfCompCSISetCallbackTimeout configuration attribute of the SaAmfComp configuration object class (see Section 8.13.2).

If the process responds to an saAmfCSISetCallback() callback that specifies the quiescing haState by invoking the saAmfResponse() function with the error parameter set to SA_AIS_OK within the aforementioned time interval, and the process does not subsequently respond that it has successfully quiesced within another configured time interval (by invoking the saAmfQuiescingComplete() function), the Availability Management Framework must engage the configured recovery policy (see Section 3.12.1.3) for the component to which the process belongs. This time interval is configured by setting the saAmfCompQuiescingCompleteTimeout configuration attribute of the SaAmfComp configuration object class (see Section 8.13.2).
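
The two timeouts described above form a two-step check when the requested HA state is quiescing. The following sketch expresses that check as a pure function. All names (DemoStatus, DemoOutcome, demo_csi_set_outcome()) are hypothetical, and the boolean inputs are assumed to have been computed elsewhere against saAmfCompCSISetCallbackTimeout and saAmfCompQuiescingCompleteTimeout; the sketch deliberately does not model how the framework measures those intervals.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for SaAisErrorT codes; the values are arbitrary
 * and are NOT those of the AIS headers. */
typedef enum {
    DEMO_OK,
    DEMO_ERR_FAILED_OPERATION,
    DEMO_ERR_NO_RESOURCES
} DemoStatus;

typedef enum {
    DEMO_NO_RECOVERY,       /* the assignment proceeded as requested */
    DEMO_ENGAGE_RECOVERY    /* configured recovery policy is engaged */
} DemoOutcome;

/* responded_in_time:   saAmfResponse() arrived within the CSI set timeout.
 * response:            error code passed to saAmfResponse().
 * quiescing_requested: the callback asked for the quiescing HA state.
 * quiesced_ok_in_time: saAmfQuiescingComplete() reported success within
 *                      the quiescing-complete timeout (only consulted
 *                      when quiescing_requested is true). */
DemoOutcome demo_csi_set_outcome(bool responded_in_time, DemoStatus response,
                                 bool quiescing_requested,
                                 bool quiesced_ok_in_time)
{
    /* Step 1: the response to the callback itself. Any code other than
     * OK is treated like FAILED_OPERATION. */
    if (!responded_in_time || response != DEMO_OK)
        return DEMO_ENGAGE_RECOVERY;

    /* Step 2: for quiescing only, completion must also be reported. */
    if (quiescing_requested && !quiesced_ok_in_time)
        return DEMO_ENGAGE_RECOVERY;

    return DEMO_NO_RECOVERY;
}
```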
Parameters
invocation - [in] This parameter identifies a particular invocation of the callback function. The invoked process returns invocation when it responds to the Availability Management Framework by calling the saAmfResponse() function. The SaInvocationT type is defined in [1].
compName - [in] A pointer to the name of the component from which all component service instances or the component service instance identified by the name referred to by csiName will be removed. The SaNameT type is defined in [1].
csiName - [in] A pointer to the name of the component service instance that must be removed from the component identified by the name to which compName points. The SaNameT type is defined in [1].
csiFlags - [in] This flag specifies whether one or all component service instances are affected. It can contain one of the values SA_AMF_CSI_TARGET_ONE or SA_AMF_CSI_TARGET_ALL. The SaAmfCSIFlagsT type is defined in Section 7.4.5.1 on page 215.

Description
The Availability Management Framework requests the component identified by the name referred to by compName to remove one or all component service instances from the set of component service instances being supported.

If the value of csiFlags is SA_AMF_CSI_TARGET_ONE, csiName points to the name of the component service instance that must be removed. If the value of csiFlags is SA_AMF_CSI_TARGET_ALL, csiName is NULL and the component must remove all component service instances. SA_AMF_CSI_TARGET_ALL is always set for components that support only the x active or x standby capability model.

This callback is invoked in the context of a thread of a registered process calling saAmfDispatch() on the handle amfHandle that was specified when the component identified by the name referred to by compName was registered by invoking saAmfComponentRegister().

The Availability Management Framework sets invocation, and the component returns invocation as an in parameter when it responds to the Availability Management Framework by invoking the saAmfResponse() function. The error parameter in the invocation of the saAmfResponse() function should be set to:
  • SA_AIS_OK - The component executed the saAmfCSIRemoveCallback() function successfully.
  • SA_AIS_ERR_FAILED_OPERATION - The component failed to remove the given component service instance.

Any other error code set in the error parameter in the response will be treated by the Availability Management Framework as if the caller had set the error parameter to SA_AIS_ERR_FAILED_OPERATION.

If the invoked process does not respond by calling saAmfResponse() within a configured time interval or returns an error, the Availability Management Framework must engage the configured recovery policy (see Section 3.12.1.3) for the component to which the process belongs. This time interval is configured by setting the saAmfCompCSIRmvCallbackTimeout configuration attribute of the SaAmfComp configuration object class (see Section 8.13.2).

See Also
saAmfResponse(), saAmfComponentRegister(), saAmfDispatch()
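
How a component might act on csiFlags inside its remove callback can be sketched with a small in-memory table of assigned CSI names. Everything here is hypothetical: DemoCsiSet, DemoCsiFlags, demo_csi_add() and demo_csi_remove() are illustration-only names, plain C strings replace SaNameT, and reporting a failure when the named CSI is not in the table is an assumption of the sketch, not something the text above prescribes. The returned status is what the component would then pass as the error parameter of saAmfResponse().

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_CSIS 8
#define DEMO_NAME_LEN 64

/* Hypothetical stand-ins for the two csiFlags values and for the two
 * response codes; the values are arbitrary, not those of the AIS headers. */
typedef enum { DEMO_TARGET_ONE, DEMO_TARGET_ALL } DemoCsiFlags;
typedef enum { DEMO_OK, DEMO_ERR_FAILED_OPERATION } DemoStatus;

/* The set of CSIs this component currently supports. */
typedef struct {
    char names[DEMO_MAX_CSIS][DEMO_NAME_LEN];
    size_t count;
} DemoCsiSet;

/* Record a newly assigned CSI. Returns 0 on success, -1 if the table is
 * full or the name does not fit. */
int demo_csi_add(DemoCsiSet *set, const char *name)
{
    if (set->count == DEMO_MAX_CSIS || strlen(name) >= DEMO_NAME_LEN)
        return -1;
    strcpy(set->names[set->count++], name);
    return 0;
}

/* Remove one CSI (flags == DEMO_TARGET_ONE, csi_name must be non-NULL)
 * or all CSIs (flags == DEMO_TARGET_ALL, csi_name is ignored). */
DemoStatus demo_csi_remove(DemoCsiSet *set, const char *csi_name,
                           DemoCsiFlags flags)
{
    if (flags == DEMO_TARGET_ALL) {
        set->count = 0;
        return DEMO_OK;
    }
    if (csi_name == NULL)
        return DEMO_ERR_FAILED_OPERATION;

    for (size_t i = 0; i < set->count; i++) {
        if (strcmp(set->names[i], csi_name) == 0) {
            /* Fill the freed slot with the last entry, then shrink. */
            if (i != set->count - 1)
                strcpy(set->names[i], set->names[set->count - 1]);
            set->count--;
            return DEMO_OK;
        }
    }
    return DEMO_ERR_FAILED_OPERATION;   /* assumption: unknown CSI fails */
}
```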
Parameters
amfHandle - [in] The handle which was obtained by a previous invocation of the saAmfInitialize_3() function and which identifies this particular initialization of the Availability Management Framework. The SaAmfHandleT type is defined in Section 7.4.1 on page 211.
invocation - [in] The invocation parameter that the Availability Management Framework assigned when it invoked the saAmfCSISetCallback() callback function to request the component to enter the SA_AMF_HA_QUIESCING HA state for a particular component service instance or for all component service instances assigned to it. The SaInvocationT type is defined in [1].
error - [in] The component returns the status of the completion of the quiescing operation in this parameter (of SaAisErrorT type, defined in [1]), which has one of the following values:
  • SA_AIS_OK - The component successfully stopped its activity related to a particular component service instance or to all component service instances assigned to it.
  • SA_AIS_ERR_FAILED_OPERATION - The component failed to stop its activity related to a particular component service instance or to all component service instances assigned to it. Some of the actions required during quiescing might not have been performed.
If any other error code is returned in this parameter, it will be treated by the Availability Management Framework as if the caller had returned SA_AIS_ERR_FAILED_OPERATION.

Description
Using this call, a component can notify the Availability Management Framework whether it has successfully stopped its activity related to a particular component service instance or to all component service instances assigned to it, following a previous invocation of the SaAmfCSISetCallbackT callback function of the component by the Availability Management Framework to request the component to enter the SA_AMF_HA_QUIESCING state for that particular component service instance or for all component service instances. The invocation of this API indicates that the component has now completed quiescing the particular component service instance or all component service instances and has transitioned to the quiesced HA state for that particular component service instance or for all component service instances.

It is possible that the component is unable to successfully complete the ongoing work due to, for example, a failure in the component. If possible, the component should notify the Availability Management Framework of this fact by invoking this function. The error parameter specifies whether or not the component has stopped cleanly as requested.

This function may only be called by the registered process for a component, and the amfHandle must be the same that was used when the registered process registered this component by invoking saAmfComponentRegister().

Return Values
SA_AIS_OK - The function returned successfully.
SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore.
SA_AIS_ERR_TIMEOUT - An implementation-dependent timeout occurred before the call could complete. It is unspecified whether the call succeeded or whether it did not.
SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The process may retry later.
SA_AIS_ERR_BAD_HANDLE - The handle amfHandle is invalid, since it is corrupted, uninitialized, or has already been finalized.
SA_AIS_ERR_INVALID_PARAM - A parameter is not set correctly.
SA_AIS_ERR_NO_MEMORY - Either the Availability Management Framework library or the provider of the service is out of memory and cannot provide the service.
SA_AIS_ERR_NO_RESOURCES - The system is out of required resources (other than memory).
SA_AIS_ERR_UNAVAILABLE - The operation requested in this call is unavailable on this cluster node due to one of the two reasons:
  • the cluster node has left the cluster membership;
  • the cluster node has rejoined the cluster membership, but the handle amfHandle was acquired before the cluster node left the cluster membership.
Parameters
invocation - [in] This parameter identifies a particular invocation of this callback. The invoked process returns invocation when it responds to the Availability Management Framework by calling the saAmfResponse() function. The SaInvocationT type is defined in [1].
compName - [in] A pointer to the name of the component to be terminated. The SaNameT type is defined in [1].

Description
The Availability Management Framework requests the component identified by the name referred to by compName to terminate. To terminate a proxied component, the Availability Management Framework invokes this function on the proxy component that is proxying the component identified by the name to which compName points. The component identified by the name referred to by compName is expected to release all acquired resources and to terminate itself. The invoked process responds by invoking the saAmfResponse() function. On return from the saAmfResponse() function, the Availability Management Framework removes all service instances associated with the component and the component terminates.

This callback is invoked in the context of a thread of a registered process calling saAmfDispatch() on the handle amfHandle that was specified when the component identified by the name referred to by compName was registered by invoking saAmfComponentRegister().

The Availability Management Framework sets invocation, and the component returns invocation as an in parameter when it responds to the Availability Management Framework by invoking the saAmfResponse() function. The error parameter in the invocation of the saAmfResponse() function should be set to:
  • SA_AIS_OK - The function completed successfully.
  • SA_AIS_ERR_FAILED_OPERATION - The component identified by the name to which compName points failed to terminate.

Any other error code set in the error parameter in the response will be treated by the Availability Management Framework as if the caller had set the error parameter to SA_AIS_ERR_FAILED_OPERATION.

If the invoked process does not respond by invoking saAmfResponse() within a configured time interval or returns an error, the Availability Management Framework must engage the configured recovery policy (see Section 3.12.1.3) for the component to which the process belongs. This time interval is configured by setting the saAmfCompTerminateCallbackTimeout configuration attribute of the SaAmfComp configuration object class (see Section 8.13.2).

See Also
saAmfResponse(), saAmfComponentRegister(), saAmfDispatch()

7.10.2 SaAmfProxiedComponentInstantiateCallbackT

Prototype
    typedef void (*SaAmfProxiedComponentInstantiateCallbackT)(
        SaInvocationT invocation,
        const SaNameT *proxiedCompName
    );

Parameters
invocation - [in] This parameter identifies a particular invocation of this callback function. The invoked process returns invocation when it responds to the Availability Management Framework by calling the saAmfResponse() function. The SaInvocationT type is defined in [1].
proxiedCompName - [in] A pointer to the name of the proxied component to be instantiated. The SaNameT type is defined in [1].

Description
The Availability Management Framework requests a proxy component to instantiate a proxied component identified by the name to which proxiedCompName points. The proxy component to which this request is addressed must have registered the proxied component with the Availability Management Framework before the Availability Management Framework invokes this function.

This callback is invoked in the context of a thread of a registered process for a proxy component which calls saAmfDispatch() on the handle amfHandle that was specified when the component identified by the name to which proxiedCompName points was registered by calling saAmfComponentRegister().

The Availability Management Framework sets invocation, and the component returns invocation as an in parameter when it responds to the Availability Management Framework by invoking the saAmfResponse() function. The error parameter in the invocation of the saAmfResponse() function should be set to:
  • SA_AIS_OK - The function completed successfully.
  • SA_AIS_ERR_FAILED_OPERATION - The proxy component failed to instantiate the proxied component. It is useless for the Availability Management Framework to attempt to instantiate the proxied component again.
  • SA_AIS_ERR_TRY_AGAIN - The proxy component failed to instantiate the proxied component. The Availability Management Framework might retry to instantiate the proxied component.

Any other error code set in the error parameter in the response will be treated by the Availability Management Framework as if the caller had set the error parameter to SA_AIS_ERR_FAILED_OPERATION.

If the invoked process does not respond by invoking saAmfResponse() within a configured time interval or returns SA_AIS_ERR_FAILED_OPERATION, the Availability Management Framework must engage the configured recovery policy (see Section 3.12.1.3) for the component to which the process belongs. This time interval is configured by setting the saAmfCompInstantiateTimeout configuration attribute of the SaAmfComp configuration object class (see Section 8.13.2).

See Also
saAmfResponse(), saAmfComponentRegister(), saAmfDispatch(), SaAmfProxiedComponentCleanupCallbackT
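
Unlike the other callbacks in this section, the instantiate response distinguishes three outcomes rather than two. A minimal sketch of that three-way classification follows. DemoStatus, DemoInstantiateAction and demo_classify_instantiate() are hypothetical names with arbitrary enum values; DEMO_MAY_RETRY reflects only that the framework might retry, not that it is required to.

```c
/* Hypothetical stand-ins for SaAisErrorT codes; the values are arbitrary
 * and are NOT those of the AIS headers. */
typedef enum {
    DEMO_OK,
    DEMO_ERR_FAILED_OPERATION,
    DEMO_ERR_TRY_AGAIN,
    DEMO_ERR_NO_RESOURCES
} DemoStatus;

typedef enum {
    DEMO_INSTANTIATED,  /* the proxied component was instantiated */
    DEMO_MAY_RETRY,     /* failed; a further attempt might be made */
    DEMO_GIVE_UP        /* failed; a further attempt is pointless */
} DemoInstantiateAction;

/* Classify the error code that a proxy passed to saAmfResponse() after an
 * instantiate request. Codes other than OK and TRY_AGAIN are handled
 * like FAILED_OPERATION. */
DemoInstantiateAction demo_classify_instantiate(DemoStatus reported)
{
    switch (reported) {
    case DEMO_OK:
        return DEMO_INSTANTIATED;
    case DEMO_ERR_TRY_AGAIN:
        return DEMO_MAY_RETRY;
    default:
        return DEMO_GIVE_UP;
    }
}
```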
Parameters
invocation - [in] This parameter identifies a particular invocation of this callback function. The invoked process returns invocation when it responds to the Availability Management Framework by calling the saAmfResponse() function. The SaInvocationT type is defined in [1].
proxiedCompName - [in] A pointer to the name of the proxied component to be abruptly terminated. The SaNameT type is defined in [1].

Description
The Availability Management Framework requests a proxy component to abruptly terminate a proxied component identified by the name to which proxiedCompName points. The proxy component to which this request is addressed must have registered with the Availability Management Framework before the Availability Management Framework invokes this function.

This callback is invoked in the context of a thread of a registered process for a proxy component calling saAmfDispatch() on the handle amfHandle that was specified when the component identified by the name referred to by proxiedCompName was registered by calling saAmfComponentRegister().

The Availability Management Framework sets invocation, and the component returns invocation as an in parameter when it responds to the Availability Management Framework by invoking the saAmfResponse() function. The error parameter in the invocation of the saAmfResponse() function should be set to:
  • SA_AIS_OK - The function completed successfully.
  • SA_AIS_ERR_FAILED_OPERATION - The proxy component failed to abruptly terminate the proxied component. The Availability Management Framework might issue a further attempt to abruptly terminate the proxied component.

Any other error code set in the error parameter in the response will be treated by the Availability Management Framework as if the caller had set the error parameter to SA_AIS_ERR_FAILED_OPERATION.

If the invoked process does not respond by invoking saAmfResponse() within a configured time interval or returns SA_AIS_ERR_FAILED_OPERATION, the Availability Management Framework must engage the configured recovery policy (see Section 3.12.1.3) for the component to which the process belongs. This time interval is configured by setting the saAmfCompCleanupTimeout configuration attribute of the SaAmfComp configuration object class (see Section 8.13.2).

See Also
saAmfResponse(), saAmfComponentRegister(), saAmfDispatch(), SaAmfProxiedComponentInstantiateCallbackT
Parameters
invocation - [in] This parameter identifies a particular invocation of this callback function. The invoked process returns invocation when it responds to the Availability Management Framework by calling the saAmfResponse() function. The SaInvocationT type is defined in [1].
containedCompName - [in] A pointer to the name of the contained component to be instantiated. The SaNameT type is defined in [1].

Description
The Availability Management Framework requests a container component to instantiate a contained component identified by the name to which containedCompName points. This callback is invoked by the Availability Management Framework only if the container component is assigned active for the container CSI that is configured to handle the life cycle of the contained component.
This callback is invoked in the context of a thread of a registered process for a container component which calls saAmfDispatch() on the handle amfHandle that was specified when the container component was registered by calling saAmfComponentRegister(). The Availability Management Framework sets invocation, and the component returns invocation as an in parameter when it responds to the Availability Management Framework by invoking the saAmfResponse() function. The error parameter in the invocation of the saAmfResponse() function should be set to:
• SA_AIS_OK - The function completed successfully.
• SA_AIS_ERR_FAILED_OPERATION - The container component failed to instantiate the contained component. It is useless for the Availability Management Framework to attempt to instantiate the contained component again.
• SA_AIS_ERR_TRY_AGAIN - The container component failed to instantiate the contained component. The Availability Management Framework might retry the instantiation of the contained component.
Any other error code set in the error parameter in the response will be treated by the Availability Management Framework as if the caller had set the error parameter to SA_AIS_ERR_FAILED_OPERATION. If the invoked process does not respond by invoking saAmfResponse() within a configured time interval or returns SA_AIS_ERR_FAILED_OPERATION, the Availability Management Framework must engage the configured recovery policy (see Section 3.12.1.3) for the component to which the process belongs. This time interval is configured by setting the saAmfCompInstantiateTimeout configuration attribute of the SaAmfComp configuration object class (see Section 8.13.2). See Also saAmfResponse(), saAmfComponentRegister(), saAmfDispatch(), SaAmfContainedComponentCleanupCallbackT
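The handling of the response to this callback can be summarized as a small decision function. The following is a minimal sketch and not part of the specification: all demo_ names are hypothetical stand-ins (for the SaAisErrorT values named above and for the possible reactions of the Availability Management Framework), and the responded flag models whether saAmfResponse() was invoked within the time interval configured by saAmfCompInstantiateTimeout.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the SaAisErrorT values named in the text;
 * real code would take them from the AIS header files. */
typedef enum {
    DEMO_OK,
    DEMO_ERR_FAILED_OPERATION,
    DEMO_ERR_TRY_AGAIN,
    DEMO_ERR_NO_MEMORY          /* stands for "any other error code" */
} demo_error_t;

/* What happens next, as described in the text above. */
typedef enum {
    DEMO_DONE,          /* instantiation succeeded */
    DEMO_MAY_RETRY,     /* the framework might retry the instantiation */
    DEMO_RECOVERY       /* the configured recovery policy is engaged */
} demo_action_t;

/* responded == false models a missing saAmfResponse() call within the
 * configured time interval. */
demo_action_t demo_after_instantiate(bool responded, demo_error_t error)
{
    if (!responded)
        return DEMO_RECOVERY;
    switch (error) {
    case DEMO_OK:
        return DEMO_DONE;
    case DEMO_ERR_TRY_AGAIN:
        return DEMO_MAY_RETRY;
    case DEMO_ERR_FAILED_OPERATION:
    default:            /* any other code is treated as FAILED_OPERATION */
        return DEMO_RECOVERY;
    }
}
```

In this sketch, a response carrying an unexpected code such as DEMO_ERR_NO_MEMORY leads to the same outcome as an explicit failure or a missing response.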
Parameters invocation - [in] This parameter identifies a particular invocation of this callback function. The invoked process returns invocation when it responds to the Availability Management Framework by calling the saAmfResponse() function. The SaInvocationT type is defined in [1]. containedCompName - [in] A pointer to the name of the contained component to be abruptly terminated. The SaNameT type is defined in [1]. Description The Availability Management Framework requests a container component to abruptly terminate a contained component identified by the name to which containedCompName points. This callback is invoked by the Availability Management Framework only if the container component is assigned active for the container CSI that is configured to handle the life cycle of the contained component. This callback is invoked in the context of a thread of a registered process for a container component calling saAmfDispatch() on the handle amfHandle that was specified when the container component was registered by calling saAmfComponentRegister(). The Availability Management Framework sets invocation, and the component returns invocation as an in parameter when it responds to the Availability Management Framework by invoking the saAmfResponse() function. The error parameter in the invocation of the saAmfResponse() function should be set to:
• SA_AIS_OK - The function completed successfully.
• SA_AIS_ERR_FAILED_OPERATION - The container component failed to abruptly terminate the contained component. The Availability Management Framework might issue a further attempt to abruptly terminate the contained component.
Any other error code set in the error parameter in the response will be treated by the Availability Management Framework as if the caller had set the error parameter to SA_AIS_ERR_FAILED_OPERATION. If the invoked process does not respond by calling saAmfResponse() within a configured time interval or returns SA_AIS_ERR_FAILED_OPERATION, the Availability Management Framework must engage the configured recovery policy (see Section 3.12.1.3) for the component to which the process belongs. This time interval is configured by setting the saAmfCompCleanupTimeout configuration attribute of the SaAmfComp configuration object class (see Section 8.13.2). See Also saAmfResponse(), saAmfComponentRegister(), saAmfDispatch(), SaAmfContainedComponentInstantiateCallbackT
Parameters amfHandle - [in] The handle which was obtained by a previous invocation of the saAmfInitialize_3() function and which identifies this particular initialization of the Availability Management Framework. The SaAmfHandleT type is defined in Section 7.4.1 on page 211. csiName - [in] A pointer to the name of the component service instance for which tracking is to start. This name is also the name of the protection group. The SaNameT type is defined in [1]. trackFlags - [in] The kind of tracking that is requested, which is the bitwise OR of one or more of the following flags (as defined in [1]), which have the following interpretation here:
• SA_TRACK_CURRENT - If notificationBuffer is NULL, information about all components in the protection group is returned by a single subsequent invocation of the saAmfProtectionGroupTrackCallback() notification callback; otherwise, this information is returned in the structure to which notificationBuffer points when the saAmfProtectionGroupTrack() call completes successfully.
• SA_TRACK_CHANGES - The notification callback is invoked each time at least one change occurs in the protection group membership, or one attribute (HA state or rank) of at least one component in the protection group changes. Information about all of the components is passed to the callback.
• SA_TRACK_CHANGES_ONLY - The notification callback is invoked each time at least one change occurs in the protection group membership, or one attribute (HA state or rank) of at least one component in the protection group changes. Only information about components in the protection group that have changed is passed to this callback function.
It is not permitted to set both SA_TRACK_CHANGES and SA_TRACK_CHANGES_ONLY in an invocation of this function. The SaUint8T type is defined in [1]. notificationBuffer - [in/out] A pointer to a buffer of type SaAmfProtectionGroupNotificationBufferT (defined in Section 7.4.6.4 on page 221). This parameter is ignored if SA_TRACK_CURRENT is not set in trackFlags; otherwise, if notificationBuffer is not NULL, the buffer will contain information about all components in the protection group when saAmfProtectionGroupTrack() returns. The meaning of the fields of the SaAmfProtectionGroupNotificationBufferT buffer is:
• numberOfItems - [in/out] If notification is NULL, numberOfItems is ignored as input parameter; otherwise, it specifies that the buffer pointed to by notification provides memory for information about numberOfItems components in the protection group. When saAmfProtectionGroupTrack() returns with SA_AIS_OK or with SA_AIS_ERR_NO_SPACE, numberOfItems contains the number of components in the protection group.
• notification - [in/out] If notification is NULL, memory for the protection group information is allocated by the Availability Management Framework. The caller is responsible for freeing the allocated memory by calling the saAmfProtectionGroupNotificationFree() function.
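The restriction on trackFlags stated above lends itself to a simple check before the call is made. The following is a minimal sketch under stated assumptions: the DEMO_TRACK_* bit values are stand-ins for the SA_TRACK_* constants defined in [1], and rejecting unknown bits is an assumption of this sketch rather than something the text above prescribes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag bits for illustration; real code would use the
 * SA_TRACK_* constants from the AIS header files. */
#define DEMO_TRACK_CURRENT       0x01u
#define DEMO_TRACK_CHANGES       0x02u
#define DEMO_TRACK_CHANGES_ONLY  0x04u

/* Returns true if 'flags' is an acceptable trackFlags value in this sketch. */
bool demo_track_flags_valid(uint8_t flags)
{
    const uint8_t known = DEMO_TRACK_CURRENT | DEMO_TRACK_CHANGES |
                          DEMO_TRACK_CHANGES_ONLY;

    /* at least one flag must be set; unknown bits are rejected (assumption) */
    if (flags == 0 || (flags & ~known) != 0)
        return false;
    /* the two change-tracking modes must not be combined */
    if ((flags & DEMO_TRACK_CHANGES) && (flags & DEMO_TRACK_CHANGES_ONLY))
        return false;
    return true;
}
```

A caller could run such a check before invoking saAmfProtectionGroupTrack(), so that an invalid combination is caught locally instead of being reported as SA_AIS_ERR_BAD_FLAGS.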
Description The Availability Management Framework is requested to start tracking changes in the protection group associated with the component service instance identified by the name to which csiName points or changes of attributes of any component in the protection group. These changes are notified by the invocation of the saAmfProtectionGroupTrackCallback() callback function, which must have been supplied when the process invoked the saAmfInitialize_3() call. An application may call saAmfProtectionGroupTrack() repeatedly for the same values of amfHandle and of the component service instance designated by the name referred to by csiName, regardless of whether the call initiates a one-time status request or a series of callback notifications.
If saAmfProtectionGroupTrack() is called with trackFlags containing SA_TRACK_CHANGES_ONLY while changes in the protection group are currently being tracked with SA_TRACK_CHANGES for the same combination of amfHandle and the component service instance designated by the name referred to by csiName, the Availability Management Framework will invoke further notification callbacks according to the new trackFlags. The same is true vice versa. Once saAmfProtectionGroupTrack() has been called with trackFlags containing either SA_TRACK_CHANGES or SA_TRACK_CHANGES_ONLY, notification callbacks can only be stopped by an invocation of saAmfProtectionGroupTrackStop(). Return Values SA_AIS_OK - The function completed successfully. SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore. SA_AIS_ERR_TIMEOUT - An implementation-dependent timeout occurred before the call could complete. It is unspecified whether the call succeeded or whether it did not. SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The process may retry later. SA_AIS_ERR_BAD_HANDLE - The handle amfHandle is invalid, since it is corrupted, uninitialized, or has already been finalized. SA_AIS_ERR_INIT - The previous invocation of saAmfInitialize_3() to initialize the Availability Management Framework was incomplete, since the saAmfProtectionGroupTrackCallback() callback function is missing. This value is not returned if only the SA_TRACK_CURRENT flag is set in trackFlags and the notificationBuffer parameter is not NULL. SA_AIS_ERR_INVALID_PARAM - A parameter is not set correctly. SA_AIS_ERR_NO_MEMORY - Either the Availability Management Framework library or the provider of the service is out of memory and cannot provide the service. SA_AIS_ERR_NO_RESOURCES - The system is out of required resources (other than memory). 
SA_AIS_ERR_NO_SPACE - The SA_TRACK_CURRENT flag is set, and the notification field in the structure pointed to by notificationBuffer is not NULL, but the value of numberOfItems in this structure is too small to hold information about all components in the protection group. SA_AIS_ERR_NOT_EXIST - The component service instance designated by the name referred to by csiName cannot be found.
SA_AIS_ERR_BAD_FLAGS - The trackFlags parameter is invalid. SA_AIS_ERR_UNAVAILABLE - The operation requested in this call is unavailable on this cluster node due to one of the two reasons:
• the cluster node has left the cluster membership;
• the cluster node has rejoined the cluster membership, but the handle amfHandle was acquired before the cluster node left the cluster membership.
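The buffer contract of saAmfProtectionGroupTrack() with SA_TRACK_CURRENT (library-side allocation when notification is NULL, SA_AIS_ERR_NO_SPACE when the caller-supplied memory is too small) can be modeled in a few lines. The following is a sketch only: every demo_ name is hypothetical, the structures are simplified stand-ins for the real AIS types, and the function models the described behavior rather than showing how any library implements it.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef enum { DEMO_OK, DEMO_ERR_NO_SPACE, DEMO_ERR_NO_MEMORY } demo_error_t;

/* Simplified stand-in for one notification entry. */
typedef struct { int rank; } demo_item_t;

/* Simplified stand-in for SaAmfProtectionGroupNotificationBufferT. */
typedef struct {
    size_t       numberOfItems;
    demo_item_t *notification;
} demo_buffer_t;

/* Models the SA_TRACK_CURRENT path with a non-NULL notificationBuffer;
 * 'group' and 'count' play the role of the current protection group. */
demo_error_t demo_track_current(demo_buffer_t *buf,
                                const demo_item_t *group, size_t count)
{
    if (buf->notification == NULL) {
        /* library-side allocation; the caller must release it later */
        buf->notification = malloc(count * sizeof *group);
        if (buf->notification == NULL && count > 0)
            return DEMO_ERR_NO_MEMORY;
    } else if (buf->numberOfItems < count) {
        buf->numberOfItems = count;   /* report how many items are needed */
        return DEMO_ERR_NO_SPACE;
    }
    if (count > 0)
        memcpy(buf->notification, group, count * sizeof *group);
    buf->numberOfItems = count;
    return DEMO_OK;
}

/* Counterpart of saAmfProtectionGroupNotificationFree() in this model. */
void demo_notification_free(demo_item_t *notification)
{
    free(notification);
}
```

In this model, a caller that receives DEMO_ERR_NO_SPACE can read the required size from numberOfItems and either retry with a larger buffer or pass a NULL notification pointer and release the returned memory afterwards.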
Parameters csiName - [in] A pointer to the name of the component service instance. The SaNameT type is defined in [1]. notificationBuffer - [in] A pointer to a notification buffer that contains the requested information about components in the protection group. The SaAmfProtectionGroupNotificationBufferT is defined in Section 7.4.6.4 on page 221. numberOfMembers - [in] The number of the components that belong to the protection group associated with the component service instance designated by the name to which csiName refers. The SaUint32T type is defined in [1].
error - [in] This parameter indicates whether the Availability Management Framework was able to perform the operation. Possible values for the error parameter (whose type is defined in [1]) are:
SA_AIS_OK - The function completed successfully. SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore. SA_AIS_ERR_TIMEOUT - An implementation-dependent timeout occurred before the call could complete. It is unspecified whether the call succeeded or whether it did not. SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The process may retry the saAmfProtectionGroupTrack() call later. SA_AIS_ERR_BAD_HANDLE - The handle amfHandle that was passed to the corresponding saAmfProtectionGroupTrack() call is invalid, since it is corrupted, uninitialized, or has already been finalized. SA_AIS_ERR_INVALID_PARAM - A parameter is not set correctly in the corresponding saAmfProtectionGroupTrack() call. SA_AIS_ERR_NO_MEMORY - Either the Availability Management Framework library or the provider of the service is out of memory and cannot provide the service. The process that invoked saAmfProtectionGroupTrack() might have missed one or more notifications. SA_AIS_ERR_NO_RESOURCES - Either the Availability Management Framework library or the provider of the service is out of required resources (other than memory), and cannot provide the service. The process that invoked saAmfProtectionGroupTrack() might have missed one or more notifications. SA_AIS_ERR_NOT_EXIST - The component service instance designated by the name referred to by csiName has been administratively deleted. SA_AIS_ERR_BAD_FLAGS - The trackFlags parameter is invalid in the corresponding saAmfProtectionGroupTrack() call. SA_AIS_ERR_UNAVAILABLE - The operation requested in the corresponding saAmfProtectionGroupTrack() call is unavailable on this cluster node due to one of the two reasons:
• the cluster node has left the cluster membership;
• the cluster node has rejoined the cluster membership, but the handle amfHandle was acquired before the cluster node left the cluster membership.
If the error returned is SA_AIS_ERR_NO_MEMORY or SA_AIS_ERR_NO_RESOURCES, the process that invoked saAmfProtectionGroupTrack() should invoke saAmfProtectionGroupTrackStop() and then invoke saAmfProtectionGroupTrack() again to resynchronize with the current state. Description This callback is invoked in the context of a thread calling saAmfDispatch() on the handle amfHandle that was specified when the process called saAmfProtectionGroupTrack() to request tracking of changes in the protection group associated with the component service instance identified by the name to which csiName refers or in an attribute of any component in this protection group. If successful, the saAmfProtectionGroupTrackCallback() function returns the requested information in the structure pointed to by the notificationBuffer parameter. The kind of information returned depends on the setting of the trackFlags parameter of the saAmfProtectionGroupTrack() function. The value of the numberOfItems attribute in the buffer to which the notificationBuffer parameter points might be greater than the value of the numberOfMembers parameter, because some components may no longer be members of the protection group: if the SA_TRACK_CHANGES flag or the SA_TRACK_CHANGES_ONLY flag is set, the structure to which notificationBuffer points might contain information about the current members of the protection group and also about components that have recently left the protection group. If an error occurs, it is returned in the error parameter. Return Values None See Also saAmfProtectionGroupTrack(), saAmfProtectionGroupTrackStop(), saAmfDispatch()
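The remark that numberOfItems might exceed numberOfMembers can be illustrated by walking the notification entries and skipping those that describe components that have left the protection group. This is a minimal sketch for the case in which information about all components is delivered (SA_TRACK_CHANGES): the demo_ types are hypothetical simplifications, and the assumption that each entry carries a per-component change indication is not taken from the text above.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-entry change indication. */
typedef enum {
    DEMO_NO_CHANGE,
    DEMO_ADDED,
    DEMO_REMOVED,
    DEMO_STATE_CHANGE
} demo_change_t;

/* Hypothetical, simplified notification entry. */
typedef struct {
    const char   *compName;
    demo_change_t change;
} demo_notification_t;

/* Counts the entries that describe components still in the group. */
size_t demo_count_current(const demo_notification_t *items,
                          size_t numberOfItems)
{
    size_t current = 0;

    for (size_t i = 0; i < numberOfItems; i++) {
        if (items[i].change != DEMO_REMOVED)
            current++;
    }
    return current;
}
```

With three entries of which one is marked as removed, the function returns 2, which corresponds to a numberOfMembers value smaller than numberOfItems.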
Parameters amfHandle - [in] The handle which was obtained by a previous invocation of the saAmfInitialize_3() function and which identifies this particular initialization of the Availability Management Framework. The SaAmfHandleT type is defined in Section 7.4.1 on page 211. csiName - [in] A pointer to the name of the component service instance. The SaNameT type is defined in [1]. Description The invoking process requests the Availability Management Framework to stop tracking protection group changes for the component service instance identified by the name to which csiName points.
Return Values SA_AIS_OK - The function completed successfully. SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore. SA_AIS_ERR_TIMEOUT - An implementation-dependent timeout occurred before the call could complete. It is unspecified whether the call succeeded or whether it did not. SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The process may retry later. SA_AIS_ERR_BAD_HANDLE - The handle amfHandle is invalid, since it is corrupted, uninitialized, or has already been finalized. SA_AIS_ERR_INVALID_PARAM - A parameter is not set correctly. SA_AIS_ERR_NO_MEMORY - Either the Availability Management Framework library or the provider of the service is out of memory and cannot provide the service.
SA_AIS_ERR_NO_RESOURCES - The system is out of required resources (other than memory). SA_AIS_ERR_NOT_EXIST - This value is returned if one or both cases below occurred.
• The component service instance designated by the name referred to by csiName cannot be found.
• No track of protection group changes for the component service instance designated by the name referred to by csiName was previously started by invoking saAmfProtectionGroupTrack() with track flags SA_TRACK_CHANGES or SA_TRACK_CHANGES_ONLY that would still be in effect.
SA_AIS_ERR_UNAVAILABLE - The operation requested in this call is unavailable on this cluster node due to one of the two reasons:
• the cluster node has left the cluster membership;
• the cluster node has rejoined the cluster membership, but the handle amfHandle was acquired before the cluster node left the cluster membership.
Prototype
SaAisErrorT saAmfProtectionGroupNotificationFree(
    SaAmfHandleT amfHandle,
    SaAmfProtectionGroupNotificationT *notification
);
Parameters amfHandle - [in] The handle which was obtained by a previous invocation of the saAmfInitialize_3() function and which identifies this particular initialization of the Availability Management Framework. The SaAmfHandleT type is defined in Section 7.4.1 on page 211. notification - [in] A pointer to the notification buffer that was allocated by the Availability Management Framework library in the saAmfProtectionGroupTrack() function and is to be released. The
SaAmfProtectionGroupNotificationT type is defined in Section 7.4.6.3 on page 221. Description This function frees the memory which is pointed to by notification and which was allocated by the Availability Management Framework library in a previous call to the saAmfProtectionGroupTrack() function. For details, refer to the notificationBuffer parameter in the corresponding invocation of the saAmfProtectionGroupTrack() function. Return Values SA_AIS_OK - The function completed successfully. SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore. SA_AIS_ERR_BAD_HANDLE - The handle amfHandle is invalid, since it is corrupted, uninitialized, or has already been finalized. SA_AIS_ERR_INVALID_PARAM - A parameter is not set correctly. SA_AIS_ERR_UNAVAILABLE - The operation requested in this call is unavailable on this cluster node due to one of the two reasons:
• the cluster node has left the cluster membership;
• the cluster node has rejoined the cluster membership, but the handle amfHandle was acquired before the cluster node left the cluster membership.
Parameters amfHandle - [in] The handle which was obtained by a previous invocation of the saAmfInitialize_3() function and which identifies this particular initialization of the Availability Management Framework. The SaAmfHandleT type is defined in Section 7.4.1 on page 211. erroneousComponent - [in] A pointer to the name of the erroneous component. The SaNameT type is defined in [1]. errorDetectionTime - [in] The absolute time when the reporting component detected the error. If this value is 0, it is assumed that the time at which the library received the error is the error detection time. The SaTimeT type is defined in [1]. recommendedRecovery - [in] Recommended recovery action. The SaAmfRecommendedRecoveryT type is defined in Section 7.4.7 on page 222. ntfIdentifier - [in] Identifier of the notification sent by the component to the Notification Service (for the corresponding type, see [2]) prior to reporting the error to the Availability Management Framework. Description The saAmfComponentErrorReport() function reports an error and provides a recovery recommendation to the Availability Management Framework. The Availability Management Framework validates the recommended recovery action and reacts to it as described in Section 3.12.2.1 on page 170.
Prior to reporting the error to the Availability Management Framework, the component should send a notification to the Notification Service providing adequate information for cause analysis. The notification identifier returned by the Notification Service must be provided in the ntfIdentifier parameter for correlation purposes. If no notification was produced prior to this call, the special value SA_NTF_IDENTIFIER_UNUSED (see [2]) is passed in ntfIdentifier. Return Values SA_AIS_OK - The function returned successfully, and the Availability Management Framework has been notified of the error report. SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore. SA_AIS_ERR_TIMEOUT - An implementation-dependent timeout occurred before the call could complete. It is unspecified whether the call succeeded or whether it did not. SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The process may retry later. SA_AIS_ERR_BAD_HANDLE - The handle amfHandle is invalid, since it is corrupted, uninitialized, or has already been finalized. SA_AIS_ERR_INVALID_PARAM - A parameter is not set correctly. In particular, this value is returned if the SA_AMF_CONTAINER_RESTART recommended recovery is set in recommendedRecovery, and erroneousComponent does not point to the name of a contained component. SA_AIS_ERR_NO_MEMORY - Either the Availability Management Framework library or the provider of the service is out of memory and cannot provide the service. SA_AIS_ERR_NO_RESOURCES - The system is out of required resources (other than memory). SA_AIS_ERR_NOT_EXIST - The component specified by the name to which erroneousComponent refers is not contained in the Availability Management Framework's configuration. SA_AIS_ERR_ACCESS - The Availability Management Framework rejects the requested recommended recovery.
SA_AIS_ERR_UNAVAILABLE - The operation requested in this call is unavailable on this cluster node due to one of the two reasons:
• the cluster node has left the cluster membership;
• the cluster node has rejoined the cluster membership, but the handle amfHandle was acquired before the cluster node left the cluster membership.
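Two of the parameter rules of saAmfComponentErrorReport() described above are easy to express in code: the treatment of an errorDetectionTime of 0, and the restriction of the SA_AMF_CONTAINER_RESTART recommendation to contained components. The following is a minimal sketch; the demo_ names are hypothetical stand-ins (demo_time_t for SaTimeT, demo_recovery_t for two of the SaAmfRecommendedRecoveryT values), and the is_contained flag stands for knowledge that a real implementation would obtain from its configuration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t demo_time_t;   /* stand-in for SaTimeT */

/* Hypothetical stand-ins for two SaAmfRecommendedRecoveryT values. */
typedef enum {
    DEMO_COMPONENT_RESTART,
    DEMO_CONTAINER_RESTART
} demo_recovery_t;

/* A detection time of 0 means that the time at which the library
 * received the report is used as the error detection time. */
demo_time_t demo_detection_time(demo_time_t reported, demo_time_t received)
{
    return reported == 0 ? received : reported;
}

/* Container restart may only be recommended for a contained component;
 * a false result corresponds to the SA_AIS_ERR_INVALID_PARAM case. */
bool demo_recovery_param_ok(demo_recovery_t recovery, bool is_contained)
{
    return recovery != DEMO_CONTAINER_RESTART || is_contained;
}
```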
Parameters amfHandle - [in] The handle which was obtained by a previous invocation of the saAmfInitialize_3() function and which identifies this particular initialization of the Availability Management Framework. The SaAmfHandleT type is defined in Section 7.4.1 on page 211. compName - [in] A pointer to the name of the component to be cleared of any error. The SaNameT type is defined in [1]. ntfIdentifier - [in] Identifier of the notification sent by the component to the Notification Service (for the corresponding type, see [2]). Description The function cancels the previous errors reported about the component identified by the name to which compName refers. The Availability Management Framework may now change the operational state of the component to "enabled", assuming that nothing else prevents it. The Availability Management Framework may then perform additional assignments of component service instances to the component. Before clearing all errors reported about the component, a notification should be sent by the component to the Notification Service providing adequate information to properly clear active alarms. The notification identifier returned by the Notification Service must be provided in the ntfIdentifier parameter for correlation purposes. If no notification was produced prior to this call, the special value SA_NTF_IDENTIFIER_UNUSED (see [2]) is passed in ntfIdentifier.
Return Values SA_AIS_OK - The function returned successfully, and the Availability Management Framework has been reliably notified about clearing the error. Upon return, it is guaranteed that the Availability Management Framework will not lose the error clear instruction, as long as the cluster is not reset. SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore. SA_AIS_ERR_TIMEOUT - An implementation-dependent timeout occurred before the call could complete. It is unspecified whether the call succeeded or whether it did not. SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The process may retry later. SA_AIS_ERR_BAD_HANDLE - The handle amfHandle is invalid, since it is corrupted, uninitialized, or has already been finalized. SA_AIS_ERR_INVALID_PARAM - A parameter is not set correctly. SA_AIS_ERR_NO_MEMORY - Either the Availability Management Framework library or the provider of the service is out of memory and cannot provide the service. SA_AIS_ERR_NO_RESOURCES - The system is out of required resources (other than memory). SA_AIS_ERR_NOT_EXIST - The component specified by the name to which compName points is not contained in the Availability Management Framework's configuration. SA_AIS_ERR_UNAVAILABLE - The operation requested in this call is unavailable on this cluster node due to one of the two reasons:
• the cluster node has left the cluster membership;
• the cluster node has rejoined the cluster membership, but the handle amfHandle was acquired before the cluster node left the cluster membership.
Parameters amfHandle - [in] The handle which was obtained by a previous invocation of the saAmfInitialize_3() function and which identifies this particular initialization of the Availability Management Framework. The SaAmfHandleT type is defined in Section 7.4.1 on page 211. invocation - [in] This parameter associates an invocation of this response function with a particular invocation of a callback function by the Availability Management Framework. The SaInvocationT type is defined in [1]. error - [in] The response of the process to the associated callback. It returns SA_AIS_OK if the associated callback was successfully executed by the process; otherwise, it returns an appropriate error as described in the corresponding callback. The SaAisErrorT type is defined in [1]. Description The component responds to the Availability Management Framework with the result of the execution of an operation that was requested by the Availability Management Framework when it invoked a callback specifying invocation to identify the requested operation. In the saAmfResponse() call, the component gives that value of invocation back to the Availability Management Framework, so that the Availability Management Framework can associate this response with the callback request. The request can be one of the following types.
• Request for terminating a component. See SaAmfComponentTerminateCallbackT.
• Request for adding/assigning a given HA state to a component on behalf of a component service instance. See SaAmfCSISetCallbackT.
• Request for removing a component service instance from a component. See SaAmfCSIRemoveCallbackT.
• Request for instantiating a proxied component. See SaAmfProxiedComponentInstantiateCallbackT.
• Request for cleaning up a proxied component. See SaAmfProxiedComponentCleanupCallbackT.
• Request for instantiating a contained component. See SaAmfContainedComponentInstantiateCallbackT.
• Request for cleaning up a contained component. See SaAmfContainedComponentCleanupCallbackT.
The component replies to the Availability Management Framework when either (i) it cannot carry out the request, or (ii) it has failed to successfully complete the execution of the request, or (iii) it has successfully completed the request. With the exception of the response to an saAmfHealthcheckCallback() call, this function may be called only by a registered process for a component, that is, the amfHandle must be the same that was used when the registered process registered the component by invoking saAmfComponentRegister(). The response to an saAmfHealthcheckCallback() call may only be issued by the process that started this healthcheck. Return Values SA_AIS_OK - The function returned successfully. SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore. SA_AIS_ERR_TIMEOUT - An implementation-dependent timeout occurred before the call could complete. It is unspecified whether the call succeeded or whether it did not. SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The process may retry later. SA_AIS_ERR_BAD_HANDLE - The handle amfHandle is invalid, since it is corrupted, uninitialized, or has already been finalized. SA_AIS_ERR_INVALID_PARAM - A parameter is not set correctly.
SA_AIS_ERR_NO_MEMORY - Either the Availability Management Framework library or the provider of the service is out of memory and cannot provide the service. SA_AIS_ERR_NO_RESOURCES - The system is out of required resources (other than memory). SA_AIS_ERR_UNAVAILABLE - The operation requested in this call is unavailable on this cluster node due to one of the two reasons:
• the cluster node has left the cluster membership;
• the cluster node has rejoined the cluster membership, but the handle amfHandle was acquired before the cluster node left the cluster membership.
See Also SaAmfHealthcheckCallbackT, SaAmfComponentTerminateCallbackT, SaAmfCSISetCallbackT, SaAmfCSIRemoveCallbackT, SaAmfProxiedComponentInstantiateCallbackT, SaAmfProxiedComponentCleanupCallbackT, saAmfComponentRegister(), saAmfInitialize_3(), SaAmfContainedComponentInstantiateCallbackT, SaAmfContainedComponentCleanupCallbackT
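The role of invocation in associating a response with its callback request can be pictured as a table of outstanding requests. The following is a minimal sketch with hypothetical names throughout: demo_invocation_t stands in for SaInvocationT, the fixed-size table is a simplification, and the rule that each request is answered exactly once is an assumption of this sketch, not a statement about how an implementation stores its requests.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PENDING 8

typedef uint64_t demo_invocation_t;   /* stand-in for SaInvocationT */

/* Table of callback requests that still await a response. */
typedef struct {
    demo_invocation_t id[DEMO_MAX_PENDING];
    bool              used[DEMO_MAX_PENDING];
} demo_pending_t;

/* Record a callback request when the callback is invoked. */
bool demo_pending_add(demo_pending_t *p, demo_invocation_t invocation)
{
    for (size_t i = 0; i < DEMO_MAX_PENDING; i++) {
        if (!p->used[i]) {
            p->id[i] = invocation;
            p->used[i] = true;
            return true;
        }
    }
    return false;   /* table full */
}

/* Match a response to its request and retire the request. */
bool demo_pending_complete(demo_pending_t *p, demo_invocation_t invocation)
{
    for (size_t i = 0; i < DEMO_MAX_PENDING; i++) {
        if (p->used[i] && p->id[i] == invocation) {
            p->used[i] = false;
            return true;
        }
    }
    return false;   /* no outstanding request carries this value */
}
```

A response whose invocation value matches no outstanding request cannot be associated with any callback, which is why the component must return exactly the value it received.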
The Availability Management Framework UML model is implemented by the SA Forum IMM Service ([4]). For further details on this implementation, refer to the SA Forum Overview document ([1]). The classes in the Availability Management Framework UML class diagrams show the contained attributes and their type, multiplicity, default values, and constraints. The description of each attribute is provided in the SA Forum XMI document (see [5]). The class diagrams additionally show the administrative operations (if any) applicable on these classes. To simplify references, this description uses for the UML diagrams the same names used in [5]. The UML diagrams defined for the Availability Management Framework are:
• 1- Cluster View
• 3.1- AMF Instances and Types View
• 3.2- AMF Instances View
• 3.3- AMF Cluster, Node, and Node Group Classes
• 3.4- AMF Application Classes
• 3.5- AMF SG Classes
• 3.6- AMF SU Classes
• 3.7- AMF SI Classes
• 3.8- AMF CSI Classes
• 3.9a AMF Component Classes
• 3.9b- AMF Component Type Classes
• 3.9c- AMF Global Component Attributes and Healthcheck Classes
SAI-AIS-AMF-B.03.01 Section 8
A value specified for an attribute saAmf<entity><attribute> overrides the value of the associated saAmf<entity type>Def<attribute name> attribute if the saAmf<entity type>Def<attribute name> attribute is specified as a default value for saAmf<entity><attribute> attribute in the entity class. If the <attribute name> contains the "Min" or "Max" tag, the saAmf<entity type>Def<attribute name> value sets the lower/upper boundary for the overriding value. In any other case, the saAmf<entity><attribute> value complements the value specified by saAmf<entity type>Def<attribute name>.
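The override rule above can be sketched as follows. This is an illustration under stated assumptions only: the demo_ names are hypothetical, attribute values are reduced to unsigned integers, and for "Min" and "Max" attributes the sketch merely checks the entity value against the boundary set at the type level, since the text above does not say whether an out-of-range value is rejected or adjusted.

```c
#include <stdbool.h>
#include <stdint.h>

/* An optional attribute value: 'set' is false when nothing is configured
 * on the entity itself. */
typedef struct {
    bool     set;
    uint32_t value;
} demo_attr_t;

/* The entity value overrides the type-level default when it is configured. */
uint32_t demo_effective(demo_attr_t entity, uint32_t type_default)
{
    return entity.set ? entity.value : type_default;
}

/* For a "Min" attribute, the type-level value is the lower boundary. */
bool demo_respects_min(demo_attr_t entity, uint32_t type_min)
{
    return !entity.set || entity.value >= type_min;
}

/* For a "Max" attribute, the type-level value is the upper boundary. */
bool demo_respects_max(demo_attr_t entity, uint32_t type_max)
{
    return !entity.set || entity.value <= type_max;
}
```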
FIGURE 26 1- Cluster View
[Figure: UML class diagram showing the SaClmNode and SaAmfNode classes, connected by a "Maps On" association with multiplicity 0..1 at each end.]
FIGURE 27 3.1- AMF Instances and Types View
[Figure: UML class diagram. Classes shown: SaAmfApplication, SaAmfAppType, SaAmfAppBaseType, SaAmfSG, SaAmfSGType, SaAmfSGBaseType, SaAmfSU, SaAmfSUType, SaAmfSUBaseType, SaAmfSutCompType, SaAmfSI, SaAmfSIDependency, SaAmfSIRankedSU, SaAmfSIAssignment, SaAmfSvcType, SaAmfSvcBaseType, SaAmfSvcTypeCSTypes, SaAmfCSIAssignment, SaAmfCSIAttribute, SaAmfCSType, SaAmfCSBaseType, SaAmfCtCsType, SaAmfCompCsType, SaAmfComp, SaAmfCompType, SaAmfCompBaseType, SaAmfCompGlobalAttributes, SaAmfHealthcheck, SaAmfHealthcheckType, SaAmfNode, SaAmfNodeGroup, SaAmfNodeSwBundle, SaSmfSwBundle. Association labels shown: Protected by, Configured on, Ranked by, Hosted on, Realizes, Assigned to.]
FIGURE 28 3.2- AMF Instances View
[Class diagram. Classes shown: SaAmfApplication, SaAmfSG, SaAmfSU, SaAmfNode, SaAmfNodeGroup, SaAmfSIDependency, SaAmfSIAssignment, SaAmfCSIAssignment, SaAmfCSIAttribute, SaAmfComp, SaAmfCompGlobalAttributes, SaAmfHealthcheck. Association labels: Protected by, Configured on, Hosted on, Assigned to. The diagram includes an {xor} constraint.]
• SaAmfCluster: This configuration object class defines the configuration and runtime attributes of an AMF cluster and the operations that can be applied on the AMF cluster. An object of this class must be configured for each AMF cluster. For details, refer to Section 3.2.1 on page 29, Section 3.3.8 on page 74, and Chapter 9.
• SaAmfNode: This configuration object class defines the configuration and runtime attributes of an AMF node and the operations that can be applied on the AMF node. An object of this class must be configured for each AMF node. For details, refer to Section 3.2.1 on page 29, Section 3.3.6 on page 71, and Chapter 9.
• SaAmfNodeGroup: This configuration object class defines the configuration attributes of a node group, which is used in the configuration of service groups and service units to specify AMF nodes that can host these entities. An object of this class can be configured for a local service unit or a service group that has local service units. For further details, refer to Section 3.7.1.2 on page 91, Section 8.9 on page 300, and Section 8.10 on page 302. No administrative operations are defined for a node group.
• SaAmfNodeSwBundle: This is a configuration association class between the SaAmfNode and SaSmfSwBundle object classes. The SaAmfNodeSwBundle class defines the root installation directory of a particular software bundle on the AMF node in question. It is used to determine the absolute CLC-CLI command path for components of component types delivered by the software bundle when such a component is mapped onto the AMF node.
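The path resolution described for SaAmfNodeSwBundle can be sketched as a simple join of the bundle's root installation directory on the node with a relative command path of the component type (for instance, the value of saAmfCtRelPathInstantiateCmd). This is an illustrative sketch only: the function name `clc_cli_abs_path` and its parameters are hypothetical, and the exact joining rules are an assumption not stated in this section.

```c
#include <stdio.h>
#include <string.h>

/* Joins a bundle root directory and a relative CLC-CLI command path.
 * Returns 0 on success, -1 if an argument is missing or the result
 * does not fit into 'out' (including the terminating NUL). */
int clc_cli_abs_path(const char *bundle_root, const char *rel_cmd,
                     char *out, size_t out_size)
{
    if (bundle_root == NULL || rel_cmd == NULL || out == NULL || out_size == 0)
        return -1;

    /* Skip leading separators so the relative part never starts with '/'. */
    while (*rel_cmd == '/')
        rel_cmd++;

    size_t root_len = strlen(bundle_root);
    /* Avoid a doubled separator when the root already ends in '/'. */
    const char *sep =
        (root_len > 0 && bundle_root[root_len - 1] == '/') ? "" : "/";

    int n = snprintf(out, out_size, "%s%s%s", bundle_root, sep, rel_cmd);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With the made-up root "/opt/bundleA" and relative path "bin/start.sh", the function writes "/opt/bundleA/bin/start.sh" into the output buffer.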
FIGURE 29 3.3- AMF Cluster, Node, and Node-Related Classes

<<CONFIG>> SaAmfCluster
    safAmfCluster : SaStringT [1]{RDN, CONFIG}
    saAmfClusterClmCluster : SaNameT [0..1] = Empty{CONFIG}
    saAmfClusterStartupTimeout : SaTimeT [1]{CONFIG, WRITABLE}
    saAmfClusterAdminState : SaAmfAdminStateT [1] = SA_AMF_ADMIN_UNLOCKED{RUNTIME, CACHED, PERSISTENT, SAUINT32T}
    Administrative operations:
    SA_AMF_ADMIN_LOCK()
    SA_AMF_ADMIN_SHUTDOWN()
    SA_AMF_ADMIN_UNLOCK()
    SA_AMF_ADMIN_LOCK_INSTANTIATION()
    SA_AMF_ADMIN_UNLOCK_INSTANTIATION()
    SA_AMF_ADMIN_RESTART()

<<CONFIG>> SaAmfNode
    safAmfNode : SaStringT [1]{RDN, CONFIG}
    saAmfNodeClmNode : SaNameT [0..1] = Empty{CONFIG, WRITABLE}
    saAmfNodeSuFailOverProb : SaTimeT [1]{CONFIG, WRITABLE}
    saAmfNodeSuFailoverMax : SaUint32T [1]{CONFIG, WRITABLE}
    saAmfNodeAutoRepair : SaBoolT [1] = 1 (SA_TRUE){CONFIG, WRITABLE, SAUINT32T}
    saAmfNodeFailfastOnTerminationFailure : SaBoolT [0..1] = 0 (SA_FALSE){CONFIG, WRITABLE, SAUINT32T}
    saAmfNodeFailfastOnInstantiationFailure : SaBoolT [0..1] = 0 (SA_FALSE){CONFIG, WRITABLE, SAUINT32T}
    saAmfNodeAdminState : SaAmfAdminStateT [1] = SA_AMF_ADMIN_UNLOCKED{RUNTIME, CACHED, PERSISTENT, SAUINT32T}
    saAmfNodeOperState : SaAmfOperationalStateT [1]{RUNTIME, CACHED, SAUINT32T}
    Administrative operations:
    SA_AMF_ADMIN_LOCK()
    SA_AMF_ADMIN_SHUTDOWN()
    SA_AMF_ADMIN_UNLOCK()
    SA_AMF_ADMIN_LOCK_INSTANTIATION()
    SA_AMF_ADMIN_UNLOCK_INSTANTIATION()
    SA_AMF_ADMIN_RESTART()
    SA_AMF_ADMIN_REPAIRED()

[The diagram also shows the <<CONFIG>> class SaSmfSwBundle, associated with SaAmfNode (multiplicities 0..* and 0..*).]
• SaAmfApplication: This configuration object class defines configuration and runtime attributes of an application and the operations that can be applied on the application. For each application, an object of this class must be configured, and its saAmfAppType attribute must contain the DN of a valid object of the SaAmfAppType object class. Additional configuration attributes of an application are defined in the SaAmfAppType class.
• SaAmfAppType: This configuration object class defines configuration attributes of an application type. An application type defines a list of service group types, which implies that an application of the given type must be composed of service groups of types from that list. All applications of the same type share the attribute values defined in the application type configuration. Some of the attribute values of the application type may be overridden in the configuration of any application by setting the corresponding attribute of the configuration object of the application to the required value.
• SaAmfAppBaseType: This configuration object class defines the configuration attributes common to different application types. In particular, a base application type defines the common name of versioned application types. An application type x belongs to a base application type y based on the DN of x, which is the concatenation of the RDN of x (representing its version) with the DN of y.
FIGURE 30 3.4- AMF Application Classes

<<CONFIG>> SaAmfApplication
    safApp : SaStringT [1]{RDN, CONFIG}
    saAmfAppType : SaNameT [1]{CONFIG}
    saAmfApplicationAdminState : SaAmfAdminStateT [1] = SA_AMF_ADMIN_UNLOCKED{RUNTIME, CACHED, PERSISTENT, SAUINT32T}
    saAmfApplicationCurrNumSGs : SaUint32T [1]{RUNTIME}
    Administrative operations:
    SA_AMF_ADMIN_LOCK()
    SA_AMF_ADMIN_SHUTDOWN()
    SA_AMF_ADMIN_UNLOCK()
    SA_AMF_ADMIN_LOCK_INSTANTIATION()
    SA_AMF_ADMIN_UNLOCK_INSTANTIATION()
    SA_AMF_ADMIN_RESTART()
• SaAmfSG: This configuration object class defines configuration and runtime attributes of a service group and the operations that can be applied on the service group. For each service group, an object of this class must be configured, and its saAmfSGType attribute must contain the DN of a valid object of the SaAmfSGType object class. Additional configuration attributes of a service group are defined in the SaAmfSGType and the SaAmfNodeGroup (see Section 8.7) object classes. For configuring a node group on which local service units can be instantiated, refer to Section 3.7.1.2 on page 91.
• SaAmfSGType: This configuration object class defines configuration attributes of a service group type. The service group type is a generalization of similar service groups that follow the same redundancy model, provide similar availability, and are composed of units of the same service unit types. A service unit type defined in the service group type must be such that any service unit of this service unit type belonging to a service group of a service group type must be capable of supporting a common set of service types. All service groups of the same type share the attribute values defined in the service group type configuration. Some of the attribute values of the service group type may be overridden in the configuration of any service group by setting the corresponding attribute of the configuration object of the service group to the required value.
• SaAmfSGBaseType: This configuration object class defines the configuration attributes common to different service group types. In particular, a base service group type defines the common name of versioned service group types. A service group type x belongs to a base service group type y based on the DN of x, which is the concatenation of the RDN of x (representing its version) with the DN of y.
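The relation between a versioned type and its base type (DN of x = RDN of x, followed by the DN of y) can be sketched as below. The RDN attribute names safVersion and safSgType are taken from the SG class diagram; the function names and the example values are hypothetical. The membership check assumes that RDN values contain no escaped commas, which is a simplification of this sketch.

```c
#include <stdio.h>
#include <string.h>

/* Builds the DN of a versioned type as "<RDN of x>,<DN of y>".
 * Returns 0 on success, -1 if the result does not fit into 'out'. */
int versioned_type_dn(const char *version_rdn, const char *base_type_dn,
                      char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s,%s", version_rdn, base_type_dn);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Returns 1 if type_dn consists of exactly one leading RDN followed by
 * ',' and base_type_dn, and 0 otherwise. Simplification: commas inside
 * RDN values (escaped commas) are not handled. */
int belongs_to_base_type(const char *type_dn, const char *base_type_dn)
{
    const char *comma = strchr(type_dn, ',');
    if (comma == NULL || comma == type_dn)
        return 0;
    return strcmp(comma + 1, base_type_dn) == 0;
}
```

For the made-up base type DN "safSgType=MySgBase" and version RDN "safVersion=1.0", the resulting service group type DN is "safVersion=1.0,safSgType=MySgBase". The same construction applies to the other versioned types described in this chapter.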
FIGURE 31 3.5- AMF SG Classes

<<CONFIG>> SaAmfSG
    safSg : SaStringT [1]{RDN, CONFIG}
    saAmfSGType : SaNameT [1]{CONFIG}
    saAmfSGSuHostNodeGroup : SaNameT [0..1] = Empty{CONFIG}
    saAmfSGAutoRepair : SaBoolT [0..1] = saAmfSgtDefAutoRepair{CONFIG, WRITABLE, SAUINT32T}
    saAmfSGAutoAdjust : SaBoolT [0..1] = saAmfSgtDefAutoAdjust{CONFIG, WRITABLE, SAUINT32T}
    saAmfSGNumPrefActiveSUs : SaUint32T [0..1] = 1{CONFIG, WRITABLE}
    saAmfSGNumPrefStandbySUs : SaUint32T [0..1] = 1{CONFIG, WRITABLE}
    saAmfSGNumPrefInserviceSUs : SaUint32T [0..1] = Number of SUs{CONFIG, WRITABLE}
    saAmfSGNumPrefAssignedSUs : SaUint32T [0..1] = saAmfSGNumPrefInserviceSUs{CONFIG, WRITABLE}
    saAmfSGMaxActiveSIsperSU : SaUint32T [0..1] = No limit{CONFIG, WRITABLE}
    saAmfSGMaxStandbySIsperSU : SaUint32T [0..1] = No limit{CONFIG, WRITABLE}
    saAmfSGAutoAdjustProb : SaTimeT [0..1] = saAmfSgtDefAutoAdjustProb{CONFIG, WRITABLE}
    saAmfSGCompRestartProb : SaTimeT [0..1] = saAmfSgtDefCompRestartProb{CONFIG, WRITABLE}
    saAmfSGCompRestartMax : SaUint32T [0..1] = saAmfSgtDefCompRestartMax{CONFIG, WRITABLE}
    saAmfSGSuRestartProb : SaTimeT [0..1] = saAmfSgtDefSuRestartProb{CONFIG, WRITABLE}
    saAmfSGSuRestartMax : SaUint32T [0..1] = saAmfSgtDefSuRestartMax{CONFIG, WRITABLE}
    saAmfSGAdminState : SaAmfAdminStateT [1] = SA_AMF_ADMIN_UNLOCKED{RUNTIME, CACHED, PERSISTENT, SAUINT32T}
    saAmfSGNumCurrAssignedSUs : SaUint32T [1]{RUNTIME}
    saAmfSGNumCurrNonInstantiatedSpareSUs : SaUint32T [1]{RUNTIME}
    saAmfSGNumCurrInstantiatedSpareSUs : SaUint32T [1]{RUNTIME}
    Administrative operations:
    SA_AMF_ADMIN_LOCK()
    SA_AMF_ADMIN_SHUTDOWN()
    SA_AMF_ADMIN_UNLOCK()
    SA_AMF_ADMIN_LOCK_INSTANTIATION()
    SA_AMF_ADMIN_UNLOCK_INSTANTIATION()
    SA_AMF_ADMIN_SG_ADJUST()

<<CONFIG>> SaAmfSGType
    safVersion : SaStringT [1]{RDN, CONFIG}
    saAmfSgtRedundancyModel : SaAmfRedundancyModelT [1]{CONFIG, SAUINT32T}
    saAmfSgtValidSuTypes : SaNameT [1..*]{CONFIG}
    saAmfSgtDefAutoRepair : SaBoolT [0..1] = 1 (SA_TRUE){CONFIG, WRITABLE, SAUINT32T}
    saAmfSgtDefAutoAdjust : SaBoolT [0..1] = 0 (SA_FALSE){CONFIG, WRITABLE, SAUINT32T}
    saAmfSgtDefAutoAdjustProb : SaTimeT [1]{CONFIG, WRITABLE}
    saAmfSgtDefCompRestartProb : SaTimeT [1]{CONFIG, WRITABLE}
    saAmfSgtDefCompRestartMax : SaUint32T [1]{CONFIG, WRITABLE}
    saAmfSgtDefSuRestartProb : SaTimeT [1]{CONFIG, WRITABLE}
    saAmfSgtDefSuRestartMax : SaUint32T [1]{CONFIG, WRITABLE}

<<CONFIG>> SaAmfSGBaseType
    safSgType : SaStringT [1]{RDN, CONFIG}
• SaAmfSU: This configuration object class defines configuration and runtime attributes of a service unit and the operations that can be applied on the service unit. For each service unit, an object of this class must be configured, and its saAmfSUType attribute must contain the DN of a valid object of the SaAmfSUType object class. Additional configuration attributes of a service unit are defined in the SaAmfSUType object class, in either the SaAmfNodeGroup or SaAmfNode (see Section 8.7) object classes, and in the SaAmfSutCompType association class. For configuring a node or a node group on which a local service unit is instantiated, refer to Section 3.7.1.2 on page 91.
• SaAmfSUType: This configuration object class defines configuration attributes of a service unit type. The service unit type defines a list of component types and, for each component type, the number of components that a service unit of this type may accommodate. Each element in this list is expressed by the SaAmfSutCompType association class, which is described below. A service unit of a given type may only consist of components of the component types from that list, and the number of these components must be within the range specified for the component type. All service units of the same type share the attribute values defined in the service unit type configuration. Some of the attribute values of the service unit type may be overridden in the configuration of any service unit by setting the corresponding attribute of the configuration object of the service unit to the required value. All service units of the same type can be assigned service instances derived from the same set of service types.
• SaAmfSUBaseType: This configuration object class defines the configuration attributes common to different service unit types. In particular, a base service unit type defines the common name of versioned service unit types. A service unit type x belongs to a base service unit type y based on the DN of x, which is the concatenation of the RDN of x (representing its version) with the DN of y.
• SaAmfSutCompType: This is a configuration association class between the SaAmfSUType and SaAmfCompType object classes. The SaAmfSutCompType class defines configuration attributes of a component type that can be contained in a service unit of the service unit type. An object of this class must be configured for each component type that can be contained in a service unit of the type defined by the SaAmfSUType object class. The number of member component types in a service unit type can be determined by the number of SaAmfSutCompType objects configured for the service unit type.
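The composition constraint described for SaAmfSUType and SaAmfSutCompType (only listed component types, each within its configured range) can be sketched as a validation routine. The struct, the `NO_LIMIT` stand-in for the "No limit" default, and the function name are hypothetical; the attribute names in the comments come from the SU class diagram.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in used by this sketch for the "No limit" default value. */
#define NO_LIMIT UINT32_MAX

/* One SaAmfSutCompType object of a service unit type (illustrative). */
typedef struct {
    const char *comp_type_dn;  /* safMemberCompType */
    uint32_t min_components;   /* saAmfSutMinNumComponents, default 1 */
    uint32_t max_components;   /* saAmfSutMaxNumComponents, default NO_LIMIT */
} sut_comp_type;

/* comp_counts[i] is the number of components of members[i] configured in
 * the service unit; n_unlisted counts components whose type is not in
 * the list. Returns true if the composition satisfies the constraint. */
bool su_composition_valid(const sut_comp_type *members,
                          const uint32_t *comp_counts, size_t n_members,
                          uint32_t n_unlisted)
{
    if (n_unlisted > 0)
        return false; /* only component types from the list are allowed */

    for (size_t i = 0; i < n_members; i++) {
        if (comp_counts[i] < members[i].min_components)
            return false;
        if (comp_counts[i] > members[i].max_components)
            return false;
    }
    return true;
}
```

A service unit with two components of a type limited to the range 1..2 passes the check, while a third component of that type makes it fail.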
FIGURE 32 3.6- AMF SU Classes

<<CONFIG>> SaAmfSU
    safSu : SaStringT [1]{RDN, CONFIG}
    saAmfSUType : SaNameT [1]{CONFIG}
    saAmfSURank : SaUint32T [0..1] = 0{CONFIG, WRITABLE}
    saAmfSUHostNodeOrNodeGroup : SaNameT [0..1] = Empty{CONFIG}
    saAmfSUFailover : SaBoolT [0..1] = saAmfSutDefSUFailover{CONFIG, WRITABLE, SAUINT32T}
    saAmfSUMaintenanceCampaign : SaNameT [0..1] = Empty{CONFIG}
    saAmfSUPreInstantiable : SaBoolT [1]{RUNTIME, CACHED, SAUINT32T}
    saAmfSUOperState : SaAmfOperationalStateT [1]{RUNTIME, CACHED, SAUINT32T}
    saAmfSUAdminState : SaAmfAdminStateT [1] = SA_AMF_ADMIN_UNLOCKED{RUNTIME, CACHED, PERSISTENT, SAUINT32T}
    saAmfSUReadinessState : SaAmfReadinessStateT [1]{RUNTIME, CACHED, SAUINT32T}
    saAmfSUPresenceState : SaAmfPresenceStateT [1]{RUNTIME, CACHED, SAUINT32T}
    saAmfSUAssignedSIs : SaNameT [0..*] = Empty{RUNTIME}
    saAmfSUHostedByNode : SaNameT [0..1] = Empty{RUNTIME, CACHED}
    saAmfSUNumCurrActiveSIs : SaUint32T [1]{RUNTIME}
    saAmfSUNumCurrStandbySIs : SaUint32T [1]{RUNTIME}
    saAmfSURestartCount : SaUint32T [1]{RUNTIME}
    Administrative operations:
    SA_AMF_ADMIN_LOCK()
    SA_AMF_ADMIN_SHUTDOWN()
    SA_AMF_ADMIN_UNLOCK()
    SA_AMF_ADMIN_LOCK_INSTANTIATION()
    SA_AMF_ADMIN_UNLOCK_INSTANTIATION()
    SA_AMF_ADMIN_RESTART()
    SA_AMF_ADMIN_REPAIRED()
    SA_AMF_ADMIN_EAM_START()
    SA_AMF_ADMIN_EAM_STOP()

<<CONFIG>> SaAmfSUType
    safVersion : SaStringT [1]{RDN, CONFIG}
    saAmfSutIsExternal : SaBoolT [1]{CONFIG, SAUINT32T}
    saAmfSutDefSUFailover : SaBoolT [1]{CONFIG, WRITABLE, SAUINT32T}
    saAmfSutProvidesSvcTypes : SaNameT [1..*]{CONFIG}

<<CONFIG>> SaAmfSUBaseType
    safSuType : SaStringT [1]{RDN, CONFIG}

<<CONFIG>> SaAmfSutCompType
    safMemberCompType : SaNameT [1]{RDN, CONFIG}
    saAmfSutMaxNumComponents : SaUint32T [0..1] = No limit{CONFIG}
    saAmfSutMinNumComponents : SaUint32T [0..1] = 1{CONFIG}

[The diagram also shows the <<CONFIG>> class SaAmfCompType, associated with SaAmfSUType (multiplicities 0..* and 0..*).]
• SaAmfSI: This configuration object class defines configuration and runtime attributes of a service instance and the operations that can be applied on the service instance. For each service instance, an object of this class must be configured, and its saAmfSvcType attribute must contain the DN of a valid object of the SaAmfSvcType object class. Additional configuration attributes of a service instance are defined in the SaAmfSvcType object class and in the association classes SaAmfSIDependency, SaAmfSIRankedSU, and SaAmfSvcTypeCSTypes. The runtime attributes of the assignment of a service instance to a service unit are defined in the SaAmfSIAssignment association class.
• SaAmfSvcType: This configuration object class together with the associated SaAmfSvcTypeCSTypes class defines configuration attributes of a service type. The service type defines a list of component service types of which a service instance may be composed. The service type also defines for each component service type the number of component service instances that a service instance of the given type may aggregate. All service instances of the same type share the attribute values defined in the service type configuration.
• SaAmfSvcBaseType: This configuration object class defines the configuration attributes common to different service types. In particular, a base service type defines the common name of versioned service types. A service type x belongs to a base service type y based on the DN of x, which is the concatenation of the RDN of x (representing its version) with the DN of y.
• SaAmfSIDependency: This is a configuration association class between SaAmfSI object classes. The SaAmfSIDependency class defines configuration attributes for a dependency of a service instance on another service instance, as explained in Section 3.9.1 on page 155. This object class must be configured for each dependency of a service instance on another service instance.
• SaAmfSIRankedSU: This is a configuration association class between the SaAmfSI and the SaAmfSU object classes. The SaAmfSIRankedSU class is used to define the ranked list of service units per service instance, which is required in the N-way (see Section 3.7.4) and N-way active redundancy models (see Section 3.7.5). If an object of this class is not configured, the ranked list of service units for a service instance for the N-way and N-way active redundancy models is given by the ordered list of service units in the service group (this order is configured by setting the saAmfSURank attribute of the SaAmfSU object class, see Section 8.10).
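The fallback described for SaAmfSIRankedSU can be sketched as follows: use the per-service-instance ranking when SaAmfSIRankedSU objects are configured, otherwise fall back to the service group ordering given by saAmfSURank. The types and function names are hypothetical. This sketch also assumes that a smaller rank value sorts first and breaks ties by DN; neither is stated in this section, so both are assumptions made only to keep the example deterministic.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A service unit together with its rank (illustrative type). The rank is
 * saAmfRank for SaAmfSIRankedSU entries and saAmfSURank for the
 * service-group-wide ordering. */
typedef struct {
    const char *su_dn;
    uint32_t rank;
} ranked_su;

/* Assumption of this sketch: smaller rank values sort first; ties are
 * broken by DN so that the result is deterministic. */
static int cmp_rank(const void *a, const void *b)
{
    const ranked_su *x = a;
    const ranked_su *y = b;
    if (x->rank != y->rank)
        return x->rank < y->rank ? -1 : 1;
    return strcmp(x->su_dn, y->su_dn);
}

/* Chooses the SI-specific ranking if any SaAmfSIRankedSU entries are
 * configured, otherwise the service group ordering. The chosen array is
 * sorted in place and returned; its length is stored in *n_out. */
ranked_su *ranked_su_list(ranked_su *si_ranked, size_t n_si_ranked,
                          ranked_su *sg_sus, size_t n_sg_sus, size_t *n_out)
{
    ranked_su *chosen = n_si_ranked > 0 ? si_ranked : sg_sus;
    *n_out = n_si_ranked > 0 ? n_si_ranked : n_sg_sus;
    if (*n_out > 0)
        qsort(chosen, *n_out, sizeof *chosen, cmp_rank);
    return chosen;
}
```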
• SaAmfSIAssignment: This is a runtime association class between the SaAmfSI and the SaAmfSU object classes. The SaAmfSIAssignment class defines the attributes of an assignment of a service instance to a service unit, as explained in Section 3.3.1.5.
• SaAmfSvcTypeCSTypes: This is a configuration association class between the SaAmfSvcType and SaAmfCSType object classes. The SaAmfSvcTypeCSTypes class defines the saAmfSvctMaxNumCSIs configuration attribute to indicate the maximum number of instances of a member CS type (identified by safMemberCSType) that any service instance of a certain service type can have. An object of this class must be configured for each CS type that is contained in a service instance of the type defined by the SaAmfSvcType object class.
FIGURE 33 3.7- AMF SI Classes

<<CONFIG>> SaAmfSI
    safSi : SaStringT [1]{RDN, CONFIG}
    saAmfSvcType : SaNameT [1]{CONFIG}
    saAmfSIProtectedbySG : SaNameT [0..1] = Empty{CONFIG, WRITABLE}
    saAmfSIRank : SaUint32T [0..1] = 0{CONFIG, WRITABLE}
    saAmfSIPrefActiveAssignments : SaUint32T [0..1] = 1{CONFIG, WRITABLE}
    saAmfSIPrefStandbyAssignments : SaUint32T [0..1] = 1{CONFIG, WRITABLE}
    saAmfSIAdminState : SaAmfAdminStateT [1] = SA_AMF_ADMIN_UNLOCKED{RUNTIME, CACHED, PERSISTENT, SAUINT32T}
    saAmfSIAssignmentState : SaAmfAssignmentStateT [1]{RUNTIME, CACHED, SAUINT32T}
    saAmfSINumCurrActiveAssignments : SaUint32T [1]{RUNTIME}
    saAmfSINumCurrStandbyAssignments : SaUint32T [1]{RUNTIME}
    Administrative operations:
    SA_AMF_ADMIN_LOCK()
    SA_AMF_ADMIN_UNLOCK()
    SA_AMF_ADMIN_SI_SWAP()
    SA_AMF_ADMIN_SHUTDOWN()

<<CONFIG>> SaAmfSvcType
    safVersion : SaStringT [1]{RDN, CONFIG}

<<CONFIG>> SaAmfSvcBaseType
    safSvcType : SaStringT [1]{RDN, CONFIG}

<<CONFIG>> SaAmfSIDependency
    safDepend : SaNameT [1]{RDN, CONFIG}
    saAmfToleranceTime : SaTimeT [0..1] = 0{CONFIG, WRITABLE}

<<CONFIG>> SaAmfSIRankedSU
    safRankedSu : SaNameT [1]{RDN, CONFIG}
    saAmfRank : SaUint32T [1]{CONFIG, WRITABLE}

<<RUNTIME>> SaAmfSIAssignment
    safSISU : SaNameT [1]{RDN, RUNTIME, CACHED}
    saAmfSISUHAState : SaAmfHAStateT [1]{RUNTIME, CACHED, SAUINT32T}

<<CONFIG>> SaAmfSvcTypeCSTypes
    safMemberCSType : SaNameT [1]{RDN, CONFIG}
    saAmfSvctMaxNumCSIs : SaUint32T [0..1] = No limit{CONFIG}

[The diagram also shows the classes SaAmfSU (association label: Assigned to) and SaAmfCSType.]
• SaAmfCSI: This configuration object class defines configuration attributes of a component service instance, namely the name of the component service type to which the component service instance belongs and a list of component service instances on which the component service instance depends (see Section 3.9.1.3). For each component service instance, an object of this class must be configured, and its saAmfCSType attribute must contain the DN of a valid object of the SaAmfCSType object class. Additional configuration attributes of a component service instance are defined in the SaAmfCSType and SaAmfCSIAttribute object classes. The runtime attributes of the assignment of a component service instance to a component are defined in the SaAmfCSIAssignment association class.
• SaAmfCSType: This configuration object class defines configuration attributes of a component service type. The component service type is the generalization of similar component service instances (that is, similar workloads) that are seen by the Availability Management Framework as equivalent and handled in the same manner. The component service type defines the list of attribute names (as described in Section 3.2.3) for all component service instances belonging to the type.
• SaAmfCSBaseType: This configuration object class defines the configuration attributes common to different component service types. In particular, a base component service type defines the common name of versioned component service types. A component service type x belongs to a base component service type y based on the DN of x, which is the concatenation of the RDN of x (representing its version) with the DN of y.
• SaAmfCSIAttribute: This configuration object class defines the name and value of an attribute (as described in Section 3.2.3) of a component service instance. An SaAmfCSIAttribute object must be defined for each attribute name listed in SaAmfCSType.
• SaAmfCSIAssignment: This is a runtime association class between the SaAmfCSI and the SaAmfComp object classes. The SaAmfCSIAssignment class defines the attributes of an assignment of a component service instance to a component, as explained in Section 3.3.2.4.
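The SaAmfCSIAttribute requirement above (one object for each attribute name listed in the component service type) can be sketched as a completeness check. The function name is hypothetical, and the sketch assumes that the name carried by each SaAmfCSIAttribute object can be compared directly with the names listed in saAmfCSAttrName.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if every attribute name listed in the component service
 * type (type_attr_names) has a matching configured CSI attribute
 * (csi_attr_names). Assumption of this sketch: names are compared as
 * plain, case-sensitive strings. */
bool csi_attributes_complete(const char *const *type_attr_names,
                             size_t n_type_names,
                             const char *const *csi_attr_names,
                             size_t n_csi_attrs)
{
    for (size_t i = 0; i < n_type_names; i++) {
        bool found = false;
        for (size_t j = 0; j < n_csi_attrs; j++) {
            if (strcmp(type_attr_names[i], csi_attr_names[j]) == 0) {
                found = true;
                break;
            }
        }
        if (!found)
            return false; /* a listed attribute name has no object */
    }
    return true;
}
```

A management tool could run such a check before committing a configuration change, although whether and where validation happens is outside the scope of this sketch.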
FIGURE 34 3.8- AMF CSI Classes

<<CONFIG>> SaAmfCSI
    safCsi : SaStringT [1]{RDN, CONFIG}
    saAmfCSType : SaNameT [1]{CONFIG}
    saAmfCSIDependencies : SaNameT [0..*] = Empty{CONFIG, WRITABLE}

<<CONFIG>> SaAmfCSType
    safVersion : SaStringT [1]{RDN, CONFIG}
    saAmfCSAttrName : SaStringT [0..*] = Empty{CONFIG, WRITABLE}

<<CONFIG>> SaAmfCSBaseType
    safCSType : SaStringT [1]{RDN, CONFIG}

<<CONFIG>> SaAmfCSIAttribute
    safCsiAttr : SaStringT [1]{RDN, CONFIG}
    saAmfCSIAttriValue : SaStringT [0..*] = Empty{CONFIG, WRITABLE}

<<RUNTIME>> SaAmfCSIAssignment
    safCSIComp : SaNameT [1]{RDN, RUNTIME, CACHED}
    saAmfCSICompHAState : SaAmfHAStateT [1]{RUNTIME, CACHED, SAUINT32T}

[The diagram also shows the class SaAmfComp.]
• SaAmfCompType: This configuration object class defines configuration attributes of a component type. A component type represents a particular version of the software or hardware implementation that is used to construct components. All components of the same type share the attribute values defined in the component type configuration. Some of the attributes of the component type are defined by the SaAmfCtCsType association class, which is described below. Some of the attribute values of the component type may be overridden or extended in the configuration of any component by setting the corresponding attribute of the configuration object of the component to the required value.
• SaAmfCompBaseType: This configuration object class defines the configuration attributes common to different component types. In particular, a base component type defines the common name of versioned component types. A component type x belongs to a base component type y based on the DN of x, which is the concatenation of the RDN of x (representing its version) with the DN of y.
• SaAmfCtCsType: This is a configuration association class between the SaAmfCompType and SaAmfCSType object classes. The SaAmfCtCsType class defines configuration attributes of a component type for component service instances of a certain component service type (identified by safSupportedCsType) that can be assigned to a component of this component type. An object of this class must be configured for each CS type that can be assigned to a component of the type defined by the SaAmfCompType object class. For further details, see also the description of the SaAmfCompCsType class in Section 8.13.2.
FIGURE 35 3.9b- AMF Component Type Classes

<<CONFIG>> SaAmfCompType
    safVersion : SaStringT [1]{RDN, CONFIG}
    saAmfCtCompCategory : SaAmfCompCategoryT [1]{CONFIG, SAUINT32T}
    saAmfCtSwBundle : SaNameT [0..1] = Empty{CONFIG}
    saAmfCtDefCmdEnv : SaStringT [0..*] = Empty{CONFIG}
    saAmfCtDefClcCliTimeout : SaTimeT [0..1] = Empty{CONFIG, WRITABLE}
    saAmfCtDefCallbackTimeOut : SaTimeT [0..1] = Empty{CONFIG, WRITABLE}
    saAmfCtRelPathInstantiateCmd : SaStringT [0..1] = Empty{CONFIG}
    saAmfCtDefInstantiateCmdArgv : SaStringT [0..*] = Empty{CONFIG, WRITABLE}
    saAmfCtDefInstantiationLevel : SaUint32T [0..1] = 0{CONFIG, WRITABLE}
    saAmfCtRelPathTerminateCmd : SaStringT [0..1] = Empty{CONFIG}
    saAmfCtDefTerminateCmdArgv : SaStringT [0..*] = Empty{CONFIG, WRITABLE}
    saAmfCtRelPathCleanupCmd : SaStringT [0..1] = Empty{CONFIG}
    saAmfCtDefCleanupCmdArgv : SaStringT [0..*] = Empty{CONFIG, WRITABLE}
    saAmfCtRelPathAmStartCmd : SaStringT [0..1] = Empty{CONFIG}
    saAmfCtDefAmStartCmdArgv : SaStringT [0..*] = Empty{CONFIG, WRITABLE}
    saAmfCtRelPathAmStopCmd : SaStringT [0..1] = Empty{CONFIG}
    saAmfCtDefAmStopCmdArgv : SaStringT [0..*] = Empty{CONFIG, WRITABLE}
    saAmfCtDefQuiescingCompleteTimeout : SaTimeT [0..1]{CONFIG, WRITABLE}
    saAmfCtDefRecoveryOnError : SaAmfRecommendedRecoveryT [1]{CONFIG, WRITABLE, SAUINT32T}
    saAmfCtDefDisableRestart : SaBoolT [0..1] = 0 (SA_FALSE){CONFIG, WRITABLE, SAUINT32T}

<<CONFIG>> SaAmfCompBaseType
    safCompType : SaStringT [1]

<<CONFIG>> SaAmfCtCsType
    safSupportedCsType : SaNameT [1]{RDN, CONFIG}
    saAmfCtCompCapability : SaAmfCompCapabilityModelT [1]{CONFIG, SAUINT32T}
    saAmfCtDefNumMaxActiveCSIs : SaUint32T [0..1] = No limit{CONFIG}
    saAmfCtDefNumMaxStandbyCSIs : SaUint32T [0..1] = No limit{CONFIG}

[The diagram also shows the class SaAmfCSType, associated with SaAmfCompType (multiplicities 0..* and 0..*).]
8.13.2 Component Classes Diagram
This diagram contains the following classes:
• SaAmfComp: This configuration object class defines configuration and runtime attributes of a component and the operations that can be applied on the component. An object of this class must be configured for each component, and its saAmfCompType attribute must contain the DN of a valid object of the SaAmfCompType object class. Additional configuration attributes of a component are defined in the SaAmfCompType object class and in the SaAmfCtCsType association class (see Section 8.13.1), in the SaAmfCompCsType association class, and in the SaAmfCompGlobalAttributes object class (see Section 8.14).
• SaAmfCompCsType: This is a configuration association class between the SaAmfComp and SaAmfCSType object classes. The SaAmfCompCsType class defines configuration and runtime attributes of a component for component service types (each one identified by the attribute safSupportedCsType) that can be assigned to the component. An object of this class must be configured for each CS type that can be assigned to a component configured by an object of the SaAmfComp object class. The attributes of the SaAmfCompCsType class are in a particular relation with the attributes of the SaAmfCtCsType class. An attribute designated by saAmfCtDef<attribute name> in the SaAmfCtCsType class defines the default value or an upper limit for the saAmfComp<attribute name> attribute of the SaAmfCompCsType class. Concerning the rules for overriding or complementing such default values, refer to Section 8.1.
FIGURE 36 3.9a- AMF Component Classes

<<CONFIG>> SaAmfComp
    safComp : SaStringT [1]{RDN, CONFIG}
    saAmfCompType : SaNameT [1]{CONFIG}
    saAmfCompCmdEnv : SaStringT [0..*] = Empty{CONFIG}
    saAmfCompInstantiateCmdArgv : SaStringT [0..*] = Empty{CONFIG, WRITABLE}
    saAmfCompInstantiateTimeout : SaTimeT [0..1] = saAmfCtDefClcCliTimeout{CONFIG, WRITABLE}
    saAmfCompInstantiationLevel : SaUint32T [0..1] = saAmfCtDefInstantiationLevel{CONFIG, WRITABLE}
    saAmfCompNumMaxInstantiateWithoutDelay : SaUint32T [0..1] = saAmfNumMaxCompInstantiateWithoutDelay{CONFIG, WRITABLE}
    saAmfCompNumMaxInstantiateWithDelay : SaUint32T [0..1] = saAmfNumMaxCompInstantiateWithDelay{CONFIG, WRITABLE}
    saAmfCompDelayBetweenInstantiateAttempts : SaTimeT [0..1] = saAmfDelayBetweenCompInstantiateAttempts{CONFIG, WRITABLE}
    saAmfCompTerminateCmdArgv : SaStringT [0..*] = Empty{CONFIG, WRITABLE}
    saAmfCompTerminateTimeout : SaTimeT [0..1] = saAmfCtDefClcCliTimeout{CONFIG, WRITABLE}
    saAmfCompTerminateCallbackTimeout : SaTimeT [0..1] = saAmfCtDefCallbackTimeout{CONFIG, WRITABLE}
    saAmfCompCleanupCmdArgv : SaStringT [0..*] = Empty{CONFIG, WRITABLE}
    saAmfCompCleanupTimeout : SaTimeT [0..1] = saAmfCtDefClcCliTimeout{CONFIG, WRITABLE}
    saAmfCompAmStartCmdArgv : SaStringT [0..*] = Empty{CONFIG, WRITABLE}
    saAmfCompAmStartTimeout : SaTimeT [0..1] = saAmfCtDefClcCliTimeout{CONFIG, WRITABLE}
    saAmfCompNumMaxAmStartAttempts : SaUint32T [0..1] = saAmfNumMaxCompAmStartAttempts{CONFIG, WRITABLE}
    saAmfCompAmStopCmdArgv : SaStringT [0..*] = Empty{CONFIG, WRITABLE}
    saAmfCompAmStopTimeout : SaTimeT [0..1] = saAmfCtDefClcCliTimeout{CONFIG, WRITABLE}
    saAmfCompNumMaxAmStopAttempts : SaUint32T [0..1] = saAmfNumMaxCompAmStopAttempts{CONFIG, WRITABLE}
    saAmfCompCSISetCallbackTimeout : SaTimeT [0..1] = saAmfCtDefCallbackTimeout{CONFIG, WRITABLE}
    saAmfCompCSIRmvCallbackTimeout : SaTimeT [0..1] = saAmfCompDefaultCallbackTimeout{CONFIG, WRITABLE}
    saAmfCompQuiescingCompleteTimeout : SaTimeT [0..1] = saAmfCtDefQuiescingCompleteTimeout{CONFIG, WRITABLE}
    saAmfCompRecoveryOnError : SaAmfRecommendedRecoveryT [0..1] = saAmfCtDefRecoveryOnError{CONFIG, WRITABLE, SAUINT32T}
    saAmfCompDisableRestart : SaBoolT [0..1] = saAmfCtDefDisableRestart{CONFIG, WRITABLE, SAUINT32T}
    saAmfCompProxyCsi : SaNameT [0..1] = Empty{CONFIG, WRITABLE}
    saAmfCompContainerCsi : SaNameT [0..1] = Empty{CONFIG, WRITABLE}
    saAmfCompOperState : SaAmfOperationalStateT [1]{RUNTIME, CACHED, SAUINT32T}
    saAmfCompReadinessState : SaAmfReadinessStateT [1]{RUNTIME, CACHED, SAUINT32T}
    saAmfCompPresenceState : SaAmfPresenceStateT [1]{RUNTIME, CACHED, SAUINT32T}
    saAmfCompRestartCount : SaUint32T [1]{RUNTIME}
    saAmfCompCurrProxyName : SaNameT [0..1] = Empty{RUNTIME}
    saAmfCompCurrProxiedNames : SaNameT [0..*] = Empty{RUNTIME}
    Administrative operations:
    SA_AMF_ADMIN_RESTART()
    SA_AMF_ADMIN_EAM_START()
    SA_AMF_ADMIN_EAM_STOP()

<<CONFIG>> SaAmfCompCsType
    safSupportedCsType : SaNameT [1]{RDN, CONFIG}
    saAmfCompNumMaxActiveCSIs : SaUint32T [0..1] = saAmfCtDefNumMaxActiveCsi{CONFIG}
    saAmfCompNumMaxStandbyCSIs : SaUint32T [0..1] = saAmfCtDefNumMaxStandbyCsi{CONFIG}
    saAmfCompNumCurrActiveCSIs : SaUint32T [1]{RUNTIME}
    saAmfCompNumCurrStandbyCSIs : SaNameT [1]{RUNTIME}
    saAmfCompAssignedCsi : SaNameT [0..*] = Empty{RUNTIME}

[The diagram also shows the class SaAmfCSType, associated with SaAmfComp (multiplicities 0..* and 0..*).]
• SaAmfCompGlobalAttributes: This configuration object class collects those component configuration attributes for which the default value is set globally. Each of these global attributes is referred to by a corresponding attribute in the SaAmfComp component configuration object class. One and only one object of this class must be configured for each AMF cluster. The global default values can be overridden in the configuration of any component by setting the corresponding attribute of the configuration object of the component to the required value. Example: saAmfNumMaxInstantiateWithoutDelay in SaAmfCompGlobalAttributes corresponds to saAmfCompNumMaxInstantiateWithoutDelay in the SaAmfComp class.
• SaAmfHealthcheck: This configuration object class defines the attributes for a component healthcheck for a certain healthcheck key (see also Section 7.1.2.4). If an object of this class is configured for a component, its attribute values override the corresponding attributes provided in the healthcheck type configuration (see the SaAmfHealthcheckType object class) for the component type and for the same healthcheck key. The healthcheck configuration for the component can only specify healthcheck keys for which there is a healthcheck type configuration for its component type. The IMM object representing the component healthcheck has a DN of the form "safHealthcheckKey=,safComp=,safSu=,safSg=,safApp=".
• SaAmfHealthcheckType: This configuration object class defines the attributes for a component healthcheck type (see also Section 7.1.2.4). Each healthcheck type is identified by a healthcheck key. An object of this class must be configured for each healthcheck key that a component of a component type uses to start a healthcheck. All components of the same type share the healthcheck attribute values defined in the healthcheck type configuration. The IMM object representing the component healthcheck type has a DN of the form "safHealthcheckKey=,safVersion=,safCompBaseType=".
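The two DN forms quoted above can be sketched as string builders. The function names and all example values are hypothetical; the RDN attribute names are the ones quoted in the text. The sketch does not escape special characters in the values, which a real implementation would have to consider.

```c
#include <stdio.h>

/* Builds a component healthcheck DN of the form
 * "safHealthcheckKey=<key>,safComp=<comp>,safSu=<su>,safSg=<sg>,safApp=<app>".
 * Returns 0 on success, -1 if the result does not fit into 'out'. */
int healthcheck_dn(const char *key, const char *comp, const char *su,
                   const char *sg, const char *app,
                   char *out, size_t out_size)
{
    int n = snprintf(out, out_size,
                     "safHealthcheckKey=%s,safComp=%s,safSu=%s,safSg=%s,safApp=%s",
                     key, comp, su, sg, app);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Builds a healthcheck type DN of the form
 * "safHealthcheckKey=<key>,safVersion=<version>,safCompBaseType=<base>".
 * Returns 0 on success, -1 if the result does not fit into 'out'. */
int healthcheck_type_dn(const char *key, const char *version,
                        const char *comp_base_type,
                        char *out, size_t out_size)
{
    int n = snprintf(out, out_size,
                     "safHealthcheckKey=%s,safVersion=%s,safCompBaseType=%s",
                     key, version, comp_base_type);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Because both objects share the same healthcheck key, a tool can look up the type-level object for a component healthcheck by reusing the key with the version and base type of the component's type.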
FIGURE 37 3.9c- AMF Global Component Attributes and Healthcheck Classes

<<CONFIG>> SaAmfCompGlobalAttributes
safRdn : SaStringT [1] {RDN, CONFIG}
saAmfNumMaxInstantiateWithoutDelay : SaUint32T [0..1] = 2 {CONFIG, WRITABLE}
saAmfNumMaxInstantiateWithDelay : SaUint32T [0..1] = 0 {CONFIG, WRITABLE}
saAmfNumMaxAmStartAttempts : SaUint32T [0..1] = 2 {CONFIG, WRITABLE}
saAmfNumMaxAmStopAttempts : SaUint32T [0..1] = 2 {CONFIG, WRITABLE}
saAmfDelayBetweenInstantiateAttempts : SaTimeT [0..1] = 0 {CONFIG, WRITABLE}

<<CONFIG>> SaAmfHealthcheck
safHealthcheckKey : SaAmfHealthcheckKeyT [1] {RDN, CONFIG, SASTRINGT}
saAmfHealthcheckPeriod : SaTimeT [0..1] {CONFIG, WRITABLE}
saAmfHealthcheckMaxDuration : SaTimeT [0..1] {CONFIG, WRITABLE}

<<CONFIG>> SaAmfHealthcheckType
safHealthcheckKey : SaAmfHealthcheckKeyT [1] {RDN, CONFIG, SASTRINGT}
saAmfHctDefPeriod : SaTimeT [1] {CONFIG, WRITABLE}
saAmfHctDefMaxDuration : SaTimeT [1] {CONFIG, WRITABLE}
9 Administration API
This section describes the administrative API functions that the IMM Service exposes on behalf of the Availability Management Framework to a system administrator. These API functions are described using a C API syntax. The main clients of this administrative API are system management applications and SNMP agents, which typically convert a system administration command (invoked from a management station) into the administrative API sequence that produces the result expected from that command.
an administrative operation on a logical entity A, any other administrative operation that involves a logical entity with which this logical entity A has a relationship (association or aggregation) will not be allowed until the first operation on A is done. A general principle that has been adhered to while specifying these administrative operations is that an operation done at a given scope can only be undone by performing the reverse operation at the same scope. This means, for example, one cannot lock at the AMF node-level and then unlock each service unit one by one at the service unit-level. This principle is especially applicable to administrative operations that manipulate the administrative state. These API functions will be exposed by the IMM Service Object Management library (see [4]).
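One way to picture the same-scope principle is to keep a separate administrative state per scope and to treat a service unit as available only when every enclosing scope is unlocked. This is a simplified sketch under that assumption; the enum, struct, and function names are hypothetical and do not appear in the specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { DEMO_UNLOCKED, DEMO_LOCKED } demo_admin_state_t;

/* Hypothetical chain of scopes that enclose one service unit. */
enum { DEMO_SU, DEMO_NODE, DEMO_SG, DEMO_APP, DEMO_CLUSTER, DEMO_SCOPE_COUNT };

typedef struct {
    demo_admin_state_t state[DEMO_SCOPE_COUNT]; /* one administrative state per scope */
} demo_su_scopes_t;

/* Assumption of this sketch: the service unit may take assignments only if
 * its own state and the state of every enclosing scope are unlocked. */
static bool demo_su_may_take_assignments(const demo_su_scopes_t *s)
{
    assert(s != NULL);
    for (int i = 0; i < DEMO_SCOPE_COUNT; i++) {
        if (s->state[i] != DEMO_UNLOCKED)
            return false;
    }
    return true;
}
```

In this model, locking the AMF node changes only the node-level entry, so an unlock issued at the service unit level leaves the service unit unavailable; only an unlock at the node level reverses the lock.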
saImmOmAdminOperationInvokeAsync() functions (see [4]) with the appropriate operationId (described in Section 9.3.1) on the entity designated by the name to which objectName points.

The return values explained in the following sections for various administrative operations shall be passed by the operationReturnValue parameter, which is provided by the invoker of the saImmOmAdminOperationInvoke() or saImmOmAdminOperationInvokeAsync() functions to obtain return codes from the object implementer (Availability Management Framework, in this case).

The operations described in the following subsections are applicable to and have the same effects on both pre-instantiable and non-pre-instantiable service units, unless explicitly stated otherwise.

9.4.1 Administrative State Modification Operations

A fair number of administrative operations involve the manipulation of the administrative state. To aid in the description of such administrative operations, FIGURE 38 illustrates the various administrative states and the various operations that are applicable on an entity when it is in a particular administrative state. The abbreviations used in this figure and their meaning are:
• L: lock
• UL: unlock
• SD: shut down
• LI: lock for instantiation
• ULI: unlock for instantiation
The dotted line in the figure represents the internal (spontaneous) transition corresponding to the completion of the shutting down operation; this transition moves the entity into locked state without further external intervention.
FIGURE 38 Administrative States and Related Operations for Availability Management Framework Entities

[State diagram. States: unlocked, locked, shutting-down, locked-instantiation. Transition labels: L, UL, SD, SD Complete, LI, ULI.]
9.4.2 SA_AMF_ADMIN_UNLOCK

Parameters

operationId = SA_AMF_ADMIN_UNLOCK

objectName - [in] A pointer to the name of the logical entity to be unlocked. The name is expressed as an LDAP DN. The type of the logical entity is inferred by parsing this DN.

Description

This administrative operation is applicable to a service unit, a service instance, an AMF node, a service group, an application, and the AMF cluster, that is, to all logical entities that possess an administrative state.

The invocation of this administrative operation sets the administrative state of the logical entity designated by the name to which objectName points to unlocked. For more details regarding the respective status of the logical entities that results as a consequence of invoking this administrative operation on these entities, refer to Section 3.3.1.2 on page 49 (service unit), Section 3.3.3.1 on page 68 (service instance), Section 3.3.5 on page 70 (service group), Section 3.3.6.1 on page 71 (AMF node), Section 3.3.7 on page 73 (application), and Section 3.3.8 on page 74 (AMF cluster).

This administrative operation can be issued on a logical entity even if one or more of the AMF nodes hosting the logical entity or parts of it are not mapped to CLM nodes, or one or more of these underlying CLM nodes are not member nodes. It can also be issued on a service unit even if it is configured but uninstantiated.

If this operation is invoked on a logical entity that is already unlocked, there is no change in the status of such an entity, that is, it remains in the unlocked state, and the caller is returned a benign SA_AIS_ERR_NO_OP error value.

If this operation is invoked on a logical entity that is locked for instantiation, there is no change in the status of such an entity, that is, it remains in the locked-instantiation state, and the caller is returned an SA_AIS_ERR_BAD_OPERATION error value.

Return Values

SA_AIS_OK - The function completed successfully.

SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The client may retry later. This error should be generally returned when the requested action is valid but not currently possible, probably because another operation is acting upon the logical entity on which the administrative operation is invoked. Such an operation
can be another administrative operation or an error recovery initiated by the Availability Management Framework.

SA_AIS_ERR_NO_RESOURCES - There are insufficient resources (other than memory).

SA_AIS_ERR_NOT_SUPPORTED - This administrative procedure is not supported by the type of entity denoted by the name to which objectName points.

SA_AIS_ERR_NO_OP - The invocation of this administrative operation has no effect on the current state of the logical entity, as it is already in unlocked state.

SA_AIS_ERR_BAD_OPERATION - The operation was not successful because the target entity is in locked-instantiation administrative state.

See Also

SA_AMF_ADMIN_LOCK, SA_AMF_ADMIN_SHUTDOWN

9.4.3 SA_AMF_ADMIN_LOCK

Parameters

operationId = SA_AMF_ADMIN_LOCK

objectName - [in] A pointer to the name of the logical entity to be locked. The name is expressed as an LDAP DN. The type of the logical entity is inferred by parsing this DN.

Description

This administrative operation is applicable to a service unit, a service instance, an AMF node, a service group, an application, and the AMF cluster, that is, to all logical entities that support an administrative state.

The invocation of this administrative operation sets the administrative state of the logical entity designated by the name to which objectName points to locked. For more details regarding the respective status of the logical entities that results as a consequence of invoking this administrative operation on these entities, refer to Section 3.3.1.2 on page 49 (service unit), Section 3.3.3.1 on page 68 (service instance), Section 3.3.5 on page 70 (service group), Section 3.3.6.1 on page 71 (AMF node), Section 3.3.7 on page 73 (application), and Section 3.3.8 on page 74 (AMF cluster).

When a service unit or any of the entities containing the service unit is locked, and the service unit contains container components, the Availability Management Framework first performs the following actions for each container component:
• for each associated contained component and for each of its component service instances that has the active HA state and needs to be quiesced, the Availability Management Framework sets the HA state of the associated contained component to quiesced;
• the Availability Management Framework waits for each associated contained component to quiesce for its component service instances (if the setting of the HA state to quiesced was necessary), then it removes all component service instances assigned to the contained component and terminates it (see also page 65).
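The ordering of these two actions can be sketched as a small simulation. Everything in it is a hypothetical model (the types, the fixed-size CSI array, and the counter), and the wait for the component's confirmation is treated as immediate; it is not how an Availability Management Framework implementation is required to work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { DEMO_HA_ACTIVE, DEMO_HA_STANDBY, DEMO_HA_QUIESCED } demo_ha_state_t;

#define DEMO_MAX_CSI 4

/* Hypothetical model of one contained component and its CSI assignments. */
typedef struct {
    demo_ha_state_t csi_state[DEMO_MAX_CSI];
    size_t          csi_count;        /* number of CSIs currently assigned */
    bool            terminated;
    int             quiesce_requests; /* how many assignments were set to quiesced */
} demo_contained_t;

static void demo_lock_contained(demo_contained_t *comps, size_t n)
{
    assert(comps != NULL || n == 0);

    /* First action: set quiesced only for assignments that are active. */
    for (size_t c = 0; c < n; c++) {
        for (size_t i = 0; i < comps[c].csi_count; i++) {
            if (comps[c].csi_state[i] == DEMO_HA_ACTIVE) {
                comps[c].csi_state[i] = DEMO_HA_QUIESCED;
                comps[c].quiesce_requests++;
            }
        }
    }

    /* Second action: a real framework would wait here for each component to
     * quiesce; the sketch then removes all CSIs and terminates the component. */
    for (size_t c = 0; c < n; c++) {
        comps[c].csi_count = 0;
        comps[c].terminated = true;
    }
}
```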
Analogously, when a service instance containing a container CSI is locked, the Availability Management Framework performs the same actions for contained components whose life cycle is being handled by the associated container component for this container CSI, before it locks the service instance.

This administrative operation can be issued on a logical entity even if one or more of the AMF nodes hosting the logical entity or parts of it are not mapped to CLM nodes, or one or more of these underlying CLM nodes are not member nodes. It can also be issued on a service unit even if it is configured but uninstantiated.

If this operation is invoked by a client on a logical entity that is already locked, there is no change in the status of such an entity, that is, it remains in the locked state, and a benign error value SA_AIS_ERR_NO_OP is returned to the client, conveying that the entity designated by the name to which objectName points is already in locked state.

If this operation is invoked on a logical entity that is locked for instantiation, there is no change in the status of such an entity, that is, it remains in the locked-instantiation state, and the caller is returned an SA_AIS_ERR_BAD_OPERATION error value.

Chapter 10 provides sequence diagrams to illustrate the actions performed for the lock operation (Section 10.4, Section 10.5, Section 10.8, and Section 10.9).

Return Values

SA_AIS_OK - The function completed successfully.

SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The client may retry later. This error should be generally returned when the requested action is valid but not currently possible, probably because another operation is acting upon the logical entity on which the administrative operation is invoked. Such an operation can be another administrative operation or an error recovery initiated by the Availability Management Framework.

SA_AIS_ERR_NO_RESOURCES - There are insufficient resources (other than memory).
SA_AIS_ERR_NOT_SUPPORTED - This administrative procedure is not supported by the type of entity denoted by the name to which objectName points.

SA_AIS_ERR_NO_OP - The invocation of this administrative operation has no effect on the current state of the logical entity, as it is already in locked state.

SA_AIS_ERR_REPAIR_PENDING - If during the execution of this operation, certain erroneous components do not cooperate with the Availability Management Framework in carrying out the administrative operation, the Availability Management Framework tries to terminate them as part of the recovery operation before returning from the operation. If the Availability Management Framework cannot terminate these erroneous components, it will put them in the termination-failed presence state. However, the Availability Management Framework will continue the administrative operation, but will return from the call with this error value, before initiating the required repair operations for such components. The caller of the administrative operation is responsible for discovering such erroneous components and tracking the completion of the subsequent repair operations.

SA_AIS_ERR_BAD_OPERATION - The operation was not successful because the target entity is in locked-instantiation administrative state.

See Also

SA_AMF_ADMIN_UNLOCK

9.4.4 SA_AMF_ADMIN_LOCK_INSTANTIATION

Parameters

operationId = SA_AMF_ADMIN_LOCK_INSTANTIATION

objectName - [in] A pointer to the name of the logical entity to be locked for instantiation. The name is expressed as an LDAP DN. The type of the logical entity is inferred by parsing this DN.

Description

This administrative operation is applicable to a service unit, an AMF node, a service group, an application, and the AMF cluster.
The invocation of this administrative operation sets the administrative state of the logical entity designated by the name to which objectName points to locked-instantiation subject to constraints described below, causing all relevant service units to become non-instantiable after their termination. For more details regarding the respective status of the logical entities that results as a consequence of invoking this administrative operation on these entities, refer to Section 3.3.1.2 on page 49 (service unit), Section 3.3.5 on page 70 (service group), Section 3.3.6.1 on page 71 (AMF node), Section 3.3.7 on page 73 (application), and Section 3.3.8 on page 74 (AMF cluster).

After successful invocation of this procedure, all components in all pertinent service units are terminated; in particular, all processes in those components must cease to exist.

As explained earlier, when this operation is invoked on a logical entity, all pertinent service units within its scope become non-instantiable (after being terminated), and the effect of this operation can only be reversed by applying another administrative operation designated by the operationId SA_AMF_ADMIN_UNLOCK_INSTANTIATION, which causes the relevant service units to be instantiated in a locked state, provided that the entity is not locked for instantiation at any other level, the concerned service units are pre-instantiable, and the redundancy model of the pertinent service groups allows the instantiation.

Note that for non-pre-instantiable service units, the application of SA_AMF_ADMIN_LOCK_INSTANTIATION is semantically equivalent to the application of SA_AMF_ADMIN_LOCK with regard to the presence state of the service units.

This administrative operation can be issued on a logical entity even if one or more of the AMF nodes hosting the logical entity or parts of it are not mapped to CLM nodes, or one or more of these underlying CLM nodes are not member nodes. It can also be issued on a service unit even if it is configured but uninstantiated.

If the logical entity is unavailable during the invocation of this administrative operation, for example, if an AMF node is configured but not a member, all service units within the scope of the entity are set to non-instantiable; they can be instantiated again (in a locked state) only after another administrative operation designated by the operationId SA_AMF_ADMIN_UNLOCK_INSTANTIATION (refer to Section 9.4.5 on page 328) is invoked on the entity, provided that the entity is not locked for instantiation at any other level.

If this operation is invoked by a client on a logical entity that is already in locked-instantiation state, the status of such an entity does not change, that is, the entity remains in that state, and a benign error value SA_AIS_ERR_NO_OP is returned to the client, conveying that the state of the entity in question did not change.

If this operation is invoked by a client on a logical entity that is either in the shutting-down or unlocked administrative state, the status of such an entity does not change, that is, the entity remains in the respective state, and the caller is returned an SA_AIS_ERR_BAD_OPERATION error value.
Return Values

SA_AIS_OK - The function completed successfully.

SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The client may retry later. This error should be generally returned when the requested action is valid but not currently possible, probably because another operation is acting upon the logical entity on which the administrative operation is invoked. Such an operation can be another administrative operation or an error recovery initiated by the Availability Management Framework.

SA_AIS_ERR_NO_RESOURCES - There are insufficient resources (other than memory).

SA_AIS_ERR_NOT_SUPPORTED - This administrative procedure is not supported by the type of entity denoted by the name to which objectName points.

SA_AIS_ERR_NO_OP - The invocation of this administrative operation has no effect on the current state of the logical entity and it remains in the current state.

SA_AIS_ERR_BAD_OPERATION - The operation was not successful because the target entity is either in the shutting-down or unlocked administrative state.

SA_AIS_ERR_REPAIR_PENDING - If during the execution of this operation, certain erroneous components do not cooperate with the Availability Management Framework in carrying out the administrative operation, the Availability Management Framework tries to terminate them as part of the recovery operation before returning from the operation. If the Availability Management Framework cannot terminate these erroneous components, it will put them in the termination-failed presence state. However, the Availability Management Framework will continue the administrative operation, but will return from the call with this error value, before initiating the required repair operations for such components. The caller of the administrative operation is responsible for discovering such erroneous components and tracking the completion of the subsequent repair operations.

See Also

SA_AMF_ADMIN_UNLOCK_INSTANTIATION, SA_AMF_ADMIN_LOCK
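Because SA_AIS_ERR_TRY_AGAIN signals a valid request that cannot be served yet, and SA_AIS_ERR_NO_OP is described as benign, a management client might wrap its invocations in a bounded retry loop. The sketch below is an illustration only: the demo_ error codes, the callback type, and the fake callback are hypothetical stand-ins and not the IMM Service API, and treating the benign code as success is a design choice of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the error values named in the text. */
typedef enum {
    DEMO_OK,
    DEMO_ERR_TRY_AGAIN,
    DEMO_ERR_NO_OP,
    DEMO_ERR_BAD_OPERATION
} demo_err_t;

/* Hypothetical callback that performs one administrative-operation invocation. */
typedef demo_err_t (*demo_invoke_fn)(void *ctx);

/* Invoke up to max_attempts times while the result is "try again". */
static demo_err_t demo_invoke_with_retry(demo_invoke_fn invoke, void *ctx,
                                         unsigned max_attempts,
                                         unsigned *attempts_used)
{
    demo_err_t rc = DEMO_ERR_TRY_AGAIN;
    unsigned n = 0;

    assert(invoke != NULL && max_attempts > 0);
    while (n < max_attempts) {
        rc = invoke(ctx);
        n++;
        if (rc != DEMO_ERR_TRY_AGAIN)
            break;
        /* A real client would sleep or back off here before retrying. */
    }
    if (attempts_used != NULL)
        *attempts_used = n;

    /* The entity is already in the requested state: report success. */
    return rc == DEMO_ERR_NO_OP ? DEMO_OK : rc;
}

/* Fake invocation for demonstration: reports "try again" while *ctx > 0. */
static demo_err_t demo_busy_then_ok(void *ctx)
{
    int *remaining_busy = ctx;
    if (*remaining_busy > 0) {
        (*remaining_busy)--;
        return DEMO_ERR_TRY_AGAIN;
    }
    return DEMO_OK;
}
```

Other error values, such as the one standing in for SA_AIS_ERR_BAD_OPERATION, are passed through unchanged so that the caller can decide how to react.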
9.4.5 SA_AMF_ADMIN_UNLOCK_INSTANTIATION

Parameters

operationId = SA_AMF_ADMIN_UNLOCK_INSTANTIATION

objectName - [in] A pointer to the name of the logical entity to be unlocked for instantiation. The name is expressed as an LDAP DN. The type of the logical entity is inferred by parsing this DN.

Description

This administrative operation is applicable to a service unit, an AMF node, a service group, an application, and the AMF cluster, that is, to all logical entities that support an administrative state with a locked-instantiation value.

The invocation of this administrative operation sets the administrative state of the logical entity designated by the name to which objectName points to locked. For more details regarding the respective status of the logical entities that results as a consequence of invoking this administrative operation on these entities, refer to Section 3.3.1.2 on page 49 (service unit), Section 3.3.5 on page 70 (service group), Section 3.3.6.1 on page 71 (AMF node), Section 3.3.7 on page 73 (application), and Section 3.3.8 on page 74 (AMF cluster).

If the current administrative state of the target entity is locked-instantiation, the invocation of this operation on such an entity causes all of the relevant service units to become instantiable (though they remain in the locked state), provided that the concerned service units are not locked for instantiation at some other level. A subsequent invocation of the SA_AMF_ADMIN_UNLOCK administrative operation would make the relevant service units available for service instance assignments by the Availability Management Framework.

This administrative operation can be issued on a logical entity even if one or more of the AMF nodes hosting the logical entity or parts of it are not mapped to CLM nodes, or one or more of these underlying CLM nodes are not member nodes.

If this operation is invoked by a client on a logical entity that is already locked, the status of such an entity does not change, that is, it remains in the locked state, and a benign error value SA_AIS_ERR_NO_OP is returned to the client, conveying that the entity (designated by the name to which objectName points) is already in locked state.

If this operation is invoked by a client on a logical entity that is either in the shutting-down or unlocked administrative state, the status of such an entity does not change,
that is, the entity remains in the respective state, and the caller is returned an SA_AIS_ERR_BAD_OPERATION error value.

Return Values

SA_AIS_OK - The function completed successfully.

SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The client may retry later. This error should be generally returned when the requested action is valid but not currently possible, probably because another operation is acting upon the logical entity on which the administrative operation is invoked. Such an operation can be another administrative operation or an error recovery initiated by the Availability Management Framework.

SA_AIS_ERR_NO_RESOURCES - There are insufficient resources (other than memory).

SA_AIS_ERR_NOT_SUPPORTED - This administrative procedure is not supported by the type of entity denoted by the name to which objectName points.

SA_AIS_ERR_NO_OP - The invocation of this administrative operation has no effect on the current state of the logical entity, as it is already in locked state.

SA_AIS_ERR_BAD_OPERATION - The operation was not successful because the target entity is either in the shutting-down or unlocked administrative state.

See Also

SA_AMF_ADMIN_LOCK_INSTANTIATION, SA_AMF_ADMIN_UNLOCK

9.4.6 SA_AMF_ADMIN_SHUTDOWN

Parameters

operationId = SA_AMF_ADMIN_SHUTDOWN

objectName - [in] A pointer to the name of the logical entity to be shut down. The name is expressed as an LDAP DN. The type of the logical entity is inferred by parsing this DN.

Description

This administrative operation is applicable to a service unit, a service instance, an AMF node, a service group, an application, and the AMF cluster, that is, to all logical entities that support an administrative state.
The invocation of this administrative operation sets the administrative state of the logical entity designated by the name to which objectName points to shutting-down. This administrative operation is non-blocking, that is, it does not wait for the logical entity designated by the name to which objectName points to transition to the locked state, which can possibly take a very long time. For more details regarding the respective status of the logical entities that results as a consequence of invoking this administrative operation on these entities, refer to Section 3.3.1.2 on page 49 (service unit), Section 3.3.3.1 on page 68 (service instance), Section 3.3.5 on page 70 (service group), Section 3.3.6.1 on page 71 (AMF node), Section 3.3.7 on page 73 (application), and Section 3.3.8 on page 74 (AMF cluster).

When a service unit or any of the entities containing the service unit is shut down, and the service unit contains container components, the Availability Management Framework performs the following actions for each container component, before it shuts down the service unit or any of the containing entities:
• for each associated contained component and for each of its component service instances that has the active HA state and needs to be quiesced, the Availability Management Framework sets the HA state of the associated contained component to quiescing;
• the Availability Management Framework waits for each associated contained component to quiesce for its component service instances (if the setting of the HA state to quiescing was necessary), then it removes all component service instances assigned to the contained component and terminates it (see also page 65).
Analogously, when a service instance containing a container CSI is shut down, the Availability Management Framework performs the same actions for contained components whose life cycle is being handled by the associated container component for this container CSI, before it shuts down the service instance.

If this operation is invoked on a logical entity that is already in shutting-down administrative state, there is no change in the status of such an entity, that is, it continues shutting down, and the caller is returned a benign SA_AIS_ERR_NO_OP error value, which means that the entity is already shutting down.

If this operation is invoked by a client on a logical entity that is either in locked or locked-instantiation administrative state, there is no change in the status of such an entity, that is, it remains locked or locked for instantiation, and the caller is returned an SA_AIS_ERR_BAD_OPERATION error value.

Chapter 10 provides sequence diagrams to illustrate the actions performed for the shutdown operation (Section 10.1, Section 10.2, Section 10.3, and Section 10.7).
Return Values

SA_AIS_OK - The function completed successfully.

SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The client may retry later. This error should be generally returned when the requested action is valid but not currently possible, probably because another operation is acting upon the logical entity on which the administrative operation is invoked. Such an operation can be another administrative operation or an error recovery initiated by the Availability Management Framework.

SA_AIS_ERR_NO_RESOURCES - There are insufficient resources (other than memory).

SA_AIS_ERR_NOT_SUPPORTED - This administrative procedure is not supported by the type of entity denoted by the name to which objectName points.

SA_AIS_ERR_NO_OP - The invocation of this administrative operation has no effect on the current state of the logical entity, as it is already in shutting-down state.

SA_AIS_ERR_BAD_OPERATION - The operation was not successful because the target entity is locked or locked for instantiation.

SA_AIS_ERR_REPAIR_PENDING - If during the execution of this operation, certain erroneous components do not cooperate with the Availability Management Framework in carrying out the administrative operation, the Availability Management Framework tries to terminate them as part of the recovery operation before returning from the operation. If the Availability Management Framework cannot terminate these erroneous components, it will put them in the termination-failed presence state. However, the Availability Management Framework will continue the administrative operation, but will return from the call with this error value, before initiating the required repair operations for such components. The caller of the administrative operation is responsible for discovering such erroneous components and tracking the completion of the subsequent repair operations.

See Also

SA_AMF_ADMIN_LOCK, SA_AMF_ADMIN_UNLOCK
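The state-dependent outcomes stated in Sections 9.4.2 through 9.4.6 can be collected into one transition function. The sketch below uses hypothetical enum and function names, ignores the remaining error values (for example, SA_AIS_ERR_TRY_AGAIN and SA_AIS_ERR_NOT_SUPPORTED), and assumes that unlock and lock succeed from the shutting-down state, which the text of those sections does not spell out.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    ST_UNLOCKED, ST_LOCKED, ST_LOCKED_INSTANTIATION, ST_SHUTTING_DOWN
} demo_state_t;

typedef enum {
    OP_UNLOCK, OP_LOCK, OP_LOCK_INSTANTIATION, OP_UNLOCK_INSTANTIATION, OP_SHUTDOWN
} demo_op_t;

/* Stand-ins for SA_AIS_OK, SA_AIS_ERR_NO_OP, and SA_AIS_ERR_BAD_OPERATION. */
typedef enum { RC_OK, RC_NO_OP, RC_BAD_OPERATION } demo_rc_t;

/* Apply one administrative operation; the state changes only on RC_OK. */
static demo_rc_t demo_apply(demo_state_t *state, demo_op_t op)
{
    assert(state != NULL);
    switch (op) {
    case OP_UNLOCK:
        if (*state == ST_UNLOCKED) return RC_NO_OP;
        if (*state == ST_LOCKED_INSTANTIATION) return RC_BAD_OPERATION;
        *state = ST_UNLOCKED;            /* from locked or (assumed) shutting-down */
        return RC_OK;
    case OP_LOCK:
        if (*state == ST_LOCKED) return RC_NO_OP;
        if (*state == ST_LOCKED_INSTANTIATION) return RC_BAD_OPERATION;
        *state = ST_LOCKED;              /* from unlocked or (assumed) shutting-down */
        return RC_OK;
    case OP_LOCK_INSTANTIATION:
        if (*state == ST_LOCKED_INSTANTIATION) return RC_NO_OP;
        if (*state != ST_LOCKED) return RC_BAD_OPERATION;
        *state = ST_LOCKED_INSTANTIATION;
        return RC_OK;
    case OP_UNLOCK_INSTANTIATION:
        if (*state == ST_LOCKED) return RC_NO_OP;
        if (*state != ST_LOCKED_INSTANTIATION) return RC_BAD_OPERATION;
        *state = ST_LOCKED;
        return RC_OK;
    case OP_SHUTDOWN:
        if (*state == ST_SHUTTING_DOWN) return RC_NO_OP;
        if (*state != ST_UNLOCKED) return RC_BAD_OPERATION;
        *state = ST_SHUTTING_DOWN;       /* non-blocking: completion comes later */
        return RC_OK;
    }
    return RC_BAD_OPERATION;
}

/* Spontaneous transition: completion of shutting down moves the entity to locked. */
static void demo_shutdown_complete(demo_state_t *state)
{
    assert(state != NULL);
    if (*state == ST_SHUTTING_DOWN)
        *state = ST_LOCKED;
}
```

The sketch covers entities that support all four states; a service instance, to which the lock-for-instantiation operations do not apply, would need the two instantiation cases removed.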
9.4.7 SA_AMF_ADMIN_RESTART

Parameters

operationId = SA_AMF_ADMIN_RESTART

objectName - [in] A pointer to the name of the logical entity to be restarted. The name is expressed as an LDAP DN. The type of the logical entity is inferred by parsing this DN.

Description

This operation is applicable to a component, a service unit, an AMF node, an application, and the AMF cluster. This procedure typically involves a termination action followed by a subsequent instantiation of either the concerned entity or logical entities that belong to the concerned entity.

This administrative operation is applicable to only those service units whose presence state is instantiated. The invocation of this administrative operation on a service unit causes the service unit to be restarted by restarting all the components within it according to the procedures defined in Section 3.12.1.2 on page 161.

The decision to reassign the assigned service instances to another service unit during this operation should be determined by the Availability Management Framework based on the configured recovery policy of the components that make up the service unit. If all components within the service unit have a configured recovery policy of restart, that is, the saAmfCompDisableRestart configuration attribute of all components is set to SA_FALSE (see the SaAmfComp object class in Section 8.13.2), it is not necessary to reassign the assigned service instances. However, if at least one component within the service unit has the saAmfCompDisableRestart configuration attribute set to SA_TRUE, a reassignment of the service instances assigned to a service unit during its restart (before termination) must be attempted by the Availability Management Framework in the course of this administrative action to prevent potential service disruption. In this case, the Availability Management Framework does not set the presence state of the component to restarting and transitions through the individual terminating, terminated, instantiating, instantiated presence states instead.
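The reassignment rule for a service unit restart reduces to a single check over the components of the service unit. The struct and function names in this sketch are hypothetical; only the saAmfCompDisableRestart attribute comes from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one component's restart-related configuration. */
typedef struct {
    bool disable_restart; /* saAmfCompDisableRestart: true stands for SA_TRUE */
} demo_comp_t;

/* Returns true when at least one component has restart disabled, in which
 * case a reassignment of the service unit's service instances must be
 * attempted before termination; returns false when every component allows
 * restart, in which case no reassignment is necessary. */
static bool demo_restart_needs_reassignment(const demo_comp_t *comps, size_t n)
{
    assert(comps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (comps[i].disable_restart)
            return true;
    }
    return false;
}
```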
When this operation is invoked upon a particular instantiated component of a service unit, the other components of the service unit are not affected by this operation, that is, they are not restarted.

If this operation is invoked upon an instantiated container component or upon an instantiated service unit which contains a container component, the Availability Management Framework implicitly restarts all service units that contain contained components having this container component as the associated container component. The procedure regarding reassignment of service instances to these implicitly restarted service units is as explained above when the operation is invoked upon a service unit.

When invoked upon an AMF node, an application, or the AMF cluster, this action becomes a composite operation that causes a collective restart of all service units residing within the AMF node, application, or the AMF cluster. To execute such a collective restart of all service units in a particular scope, the Availability Management Framework first completely terminates all pertinent service units and does not start instantiating them again until all service units have been terminated.

In the cases of application restart and AMF cluster restart, the Availability Management Framework does not perform the usual reassignment (in order to maintain service) of service instances assigned to the various service units during the execution of the termination phase of the restart procedure. Also note that the instantiation phase of such restarts is executed in accordance with the redundancy model configuration for various service groups with no requirement to preserve pre-restart service instance assignments to various service units in the application or AMF cluster.

The Availability Management Framework must not proceed with this operation if another administrative operation or an error recovery initiated by the Availability Management Framework is already engaged on the logical entity. In such a case, an error value of SA_AIS_ERR_TRY_AGAIN should be returned, indicating that the action is feasible but not at this instant.

Section 10.10 provides a sequence diagram to illustrate the actions performed for the restart of a container component.

Return Values

SA_AIS_OK - The function completed successfully.
SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The client may retry later. This error should be generally returned when the requested action is valid but not currently possible, probably because another operation is acting upon the logical entity on which the administrative operation is invoked. Such an operation can be another administrative operation or an error recovery initiated by the Availability Management Framework.

SA_AIS_ERR_NO_RESOURCES - There are insufficient resources (other than memory).

SA_AIS_ERR_BAD_OPERATION - The target logical entity for this operation, identified by the name to which objectName points, could not be restarted, for example, because the presence state of the service unit or the component to be restarted was not instantiated.
SA_AIS_ERR_NOT_SUPPORTED - This administrative procedure is not supported by the type of entity denoted by the name to which objectName points.
SA_AIS_ERR_REPAIR_PENDING - If during the execution of this operation, certain erroneous components do not cooperate with the Availability Management Framework in carrying out the administrative operation, the Availability Management Framework tries to terminate them as part of the recovery operation before returning from the operation. If the Availability Management Framework cannot terminate or instantiate these erroneous components, it will put them in the termination-failed or instantiation-failed presence state. However, the Availability Management Framework will continue the administrative operation, but will return from the call with this error value, before initiating the required repair operations for such components. The caller of the administrative operation is responsible for discovering such erroneous components and tracking the completion of the subsequent repair operations.

See Also
None

9.4.8 SA_AMF_ADMIN_SI_SWAP

Parameters
operationId = SA_AMF_ADMIN_SI_SWAP
objectName - [in] A pointer to the name of the service instance whose component service instances need to be swapped. The name is expressed as a LDAP DN. The type of the logical entity is inferred by parsing this DN.

Description
This administrative operation is pertinent to service instances that are currently assigned to service units. The invocation of this procedure results in swapping the HA states of the appropriate CSIs contained within an SI. Typically, this operation interchanges the HA states of the CSIs assigned to components within the service units: active assignments become standby and standby assignments become active. The SI identified by the name to which objectName points is called here the designated SI.
If the designated SI is protected by a service group whose redundancy model is 2N, the invocation of this administrative operation causes a complete swap of all active and standby CSIs belonging to not just the designated SI but also to any other SI that is assigned to a service unit to which the designated SI is assigned. Note that this behavior is consistent with the semantics of the respective redundancy model.

If the designated SI is protected by a service group whose redundancy model is N+M, the invocation of this administrative operation results in a complete swap of all active and standby CSIs belonging to not just the designated SI but also to any other SI that is assigned active to a service unit to which the designated SI is assigned active. Application of this operation on an SI may potentially modify the standby assignments of other SIs that are protected by the same service group, but are not assigned to the service unit to which the SI in question is assigned active. For an example, refer to FIGURE 14 on page 105: if the swap operation is applied on SI A, the active assignment for SI A shall be moved to Service Unit S4 on Node X, and the standby assignments for SI A as well as that of SI C and SI B will be moved to Service Unit S1 on Node U. The active assignments of SI C and SI B will remain on Service Unit 3 (on Node W) and Service Unit 2 (on Node V) respectively.

In case the redundancy model of the protecting service group is N-Way, the aggregate effect of swapping all SIs assigned to a service unit by swapping only one SI is not achieved. This behavior is again consistent with the semantics of the N-Way redundancy model. In the N-Way redundancy model, it is possible that an SI has multiple standby assignments, in which case this administrative operation shall affect only the highest-ranked standby assignment.

This operation must not be invoked on an SI that is protected by a service group whose redundancy model is either N-Way active or no-redundancy.
If no standby assignments are available for an SI (potentially because the AMF cluster is in a degenerated status and reduction procedures have been engaged) when this operation is invoked on a particular logical entity, an error value SA_AIS_ERR_FAILED_OPERATION shall be returned. In other words, this operation shall be allowed by the Availability Management Framework to proceed under the following circumstances:

• The concerned SI is assigned active or quiescing to one service unit.
• The concerned SI is assigned standby to at least one other service unit.
The Availability Management Framework shall not proceed with this procedure when the presence state of the constituent service units of the service group protecting the SI is instantiating, restarting, or terminating, and should return an SA_AIS_ERR_TRY_AGAIN error value conveying that the action is valid but not currently possible.

The SI-SI dependency rules and dependencies among the component service instances of the same SI must be honored, if applicable, during the execution of this operation.

Return Values
SA_AIS_OK - The function completed successfully.
SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The client may retry later. This error should be generally returned when the requested action is valid but not currently possible, probably because another operation is acting upon the logical entity on which the administrative operation is invoked. Such an operation can be another administrative operation or an error recovery initiated by the Availability Management Framework.
SA_AIS_ERR_NO_RESOURCES - There are insufficient resources (other than memory).
SA_AIS_ERR_BAD_OPERATION - The operation was not successful on the target SI, possibly because no standby assignments are available for the SI or the service group protecting the SI has either a no-redundancy or N-Way active redundancy model.
SA_AIS_ERR_NOT_SUPPORTED - This administrative procedure is not supported for the type of entity denoted by the name to which objectName points.
SA_AIS_ERR_REPAIR_PENDING - If during the execution of this operation, certain erroneous components do not cooperate with the Availability Management Framework in carrying out the administrative operation, the Availability Management Framework tries to terminate them as part of the recovery operation before returning from the operation. If the Availability Management Framework cannot terminate these erroneous components, it will put them in the termination-failed presence state. However, the Availability Management Framework will continue the administrative operation, but will return from the call with this error value, before initiating the required repair operations for such components. The caller of the administrative operation is responsible for discovering such erroneous components and tracking the completion of the subsequent repair operations.

See Also
None
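
The conditions under which the swap may proceed can be summarized as a small decision function. The following is a minimal sketch under stated assumptions, not Availability Management Framework code: the types RedundancyModel, PresenceState, SwapCheck, and SwapInput and the function si_swap_check() are hypothetical names introduced only for illustration, and the order in which the checks are made is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not AMF header types. */
typedef enum { RM_2N, RM_N_PLUS_M, RM_N_WAY,
               RM_N_WAY_ACTIVE, RM_NO_REDUNDANCY } RedundancyModel;
typedef enum { PS_UNINSTANTIATED, PS_INSTANTIATING, PS_INSTANTIATED,
               PS_RESTARTING, PS_TERMINATING } PresenceState;
typedef enum { SWAP_ALLOWED, SWAP_MODEL_NOT_SWAPPABLE,
               SWAP_TRY_AGAIN, SWAP_MISSING_ASSIGNMENT } SwapCheck;

typedef struct {
    RedundancyModel model;            /* model of the protecting service group */
    const PresenceState *su_presence; /* presence state of each SU in that group */
    size_t su_count;
    bool active_or_quiescing;         /* SI is assigned active or quiescing to one SU */
    size_t standby_count;             /* number of SUs holding the SI as standby */
} SwapInput;

/* Decide whether an SI swap may proceed, following the rules in the text. */
SwapCheck si_swap_check(const SwapInput *in)
{
    assert(in != NULL);

    /* The swap must not be invoked for N-Way active or no-redundancy. */
    if (in->model == RM_N_WAY_ACTIVE || in->model == RM_NO_REDUNDANCY)
        return SWAP_MODEL_NOT_SWAPPABLE;

    /* Valid but not possible now while any SU is in a transient state. */
    for (size_t i = 0; i < in->su_count; i++) {
        PresenceState p = in->su_presence[i];
        if (p == PS_INSTANTIATING || p == PS_RESTARTING || p == PS_TERMINATING)
            return SWAP_TRY_AGAIN;
    }

    /* One active (or quiescing) and at least one standby assignment needed. */
    if (!in->active_or_quiescing || in->standby_count == 0)
        return SWAP_MISSING_ASSIGNMENT;

    return SWAP_ALLOWED;
}
```

In this sketch, SWAP_TRY_AGAIN corresponds to the situation for which the text prescribes SA_AIS_ERR_TRY_AGAIN; the other rejection values are left unmapped on purpose, because the mapping to AIS error codes is defined by the specification text, not by this example.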
9.4.9 SA_AMF_ADMIN_SG_ADJUST

Parameters
operationId = SA_AMF_ADMIN_SG_ADJUST
objectName - [in] A pointer to the name of the service group that needs to be transitioned to the original preferred configuration. The name is expressed as a LDAP DN. The type of the logical entity is inferred by parsing this DN.

Description
This operation is only relevant to a service group. This operation moves a service group to the preferred configuration, which typically causes the service instance assignments of the service units in the service group to be transferred back to the most preferred service instance assignments in which the highest-ranked available service units are assigned the active or standby HA states for those service instances. If the most preferred configuration cannot be achieved, this operation will restore the best possible configuration in which the rankings of the service units are respected with regards to active and standby service instance assignments.

The objective of this administrative operation is to provide an administrator the capability to manually execute an adjust procedure, as described in Section 3.7.1.1 on page 88. This command is generally issued after the service group has undergone a series of swaps, locks, or shut-downs, and the invocation of this administrative operation brings the service group back to its initial preferred state or as close to the preferred state as possible.

The Availability Management Framework shall not proceed with this procedure when the presence state of the constituent service units of the service group is instantiating, restarting, terminating, or the administrative state is shutting-down, and should return an SA_AIS_ERR_TRY_AGAIN error value conveying that the action is valid but not currently possible.

Return Values
SA_AIS_OK - The function completed successfully.
SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The client may retry later. This error should be generally returned when the requested action is valid but not currently possible, probably because another operation is acting upon the logical entity on which the administrative operation is invoked. Such an operation can be another administrative operation or an error recovery initiated by the Availability Management Framework.
SA_AIS_ERR_NO_RESOURCES - There are insufficient resources (other than memory).
SA_AIS_ERR_NOT_SUPPORTED - This administrative procedure is not supported for the type of entity denoted by the name to which objectName points.
SA_AIS_ERR_REPAIR_PENDING - If during the execution of this operation, certain erroneous components do not cooperate with the Availability Management Framework in carrying out the administrative operation, the Availability Management Framework tries to terminate them as part of the recovery operation before returning from the operation. If the Availability Management Framework cannot terminate these erroneous components, it will put them in the termination-failed presence state. However, the Availability Management Framework will continue the administrative operation, but will return from the call with this error value, before initiating the required repair operations for such components. The caller of the administrative operation is responsible for discovering such erroneous components and tracking the completion of the subsequent repair operations.

See Also
None
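
The idea of returning to the preferred configuration, in which the highest-ranked available service units receive the active and standby assignments, can be illustrated with a short sketch. It assumes a simplified group with one active and one standby assignment, and it assumes that a smaller rank value means a higher rank. The ServiceUnit type, NO_SU, and both functions are hypothetical names introduced for illustration; they are not part of the AMF API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a service unit, for illustration only. */
typedef struct {
    unsigned rank;   /* assumption: a smaller value means a higher rank */
    bool available;  /* able to take service instance assignments */
} ServiceUnit;

#define NO_SU ((size_t)-1)   /* marker: no suitable service unit found */

/* Index of the highest-ranked available SU, skipping index `exclude`. */
static size_t best_available(const ServiceUnit *su, size_t n, size_t exclude)
{
    size_t best = NO_SU;
    for (size_t i = 0; i < n; i++) {
        if (i == exclude || !su[i].available)
            continue;
        if (best == NO_SU || su[i].rank < su[best].rank)
            best = i;
    }
    return best;
}

/* Pick the preferred active SU, then the preferred standby among the rest. */
void preferred_assignment(const ServiceUnit *su, size_t n,
                          size_t *active, size_t *standby)
{
    assert(su != NULL && active != NULL && standby != NULL);
    *active = best_available(su, n, NO_SU);
    *standby = (*active == NO_SU) ? NO_SU : best_available(su, n, *active);
}
```

When the highest-ranked unit is unavailable, the same function yields the best configuration that still respects the ranking, which mirrors the fallback behavior described above.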
9.4.10 SA_AMF_ADMIN_REPAIRED

Parameters
operationId = SA_AMF_ADMIN_REPAIRED
objectName - [in] A pointer to the name of the logical entity to be repaired. The name is expressed as a LDAP DN. The type of the logical entity is inferred by parsing this DN.

Description
This administrative operation is applicable to a service unit and an AMF node. This administrative operation is used to clear the disabled operational state of an AMF node or a service unit after they have been successfully mended to declare them as repaired. The administrator uses this command to indicate the availability of a service unit or an AMF node for providing service after an externally executed repair action. When invoked on an AMF node, this operation results in enabling the operational state of the constituent service units and components. When invoked on a service unit, it has similar effect on all the components that make up the service unit.

An AMF node or a service unit enters the disabled operational state due to reasons stated in Section 3.3.6.2 on page 72 (AMF node) and Section 3.3.1.3 on page 50 (service unit). The Availability Management Framework might optionally engage in repairing an AMF node or a service unit after a successful recovery procedure execution, in which case the Availability Management Framework itself will clear the disabled state of the involved AMF node or service unit. However, if the repair action is undertaken by an external entity outside the scope of the Availability Management Framework, or the Availability Management Framework failed to successfully repair (and the repair requires intervention by an external entity), one should use this administrative operation to clear the disabled state of the AMF node or the service unit to indicate that these entities are repaired and their operational state is enabled.

It is expected that a repair done by an external entity should bring the repaired service units and components in a consistent state, that is, to either the instantiated or the uninstantiated presence state, before an SA_AIS_OK status is returned by this operation.

This administrative operation can be issued on an AMF node or on a service unit hosted by an AMF node even if this AMF node is not mapped to a CLM node, or the underlying CLM node is not a member node. It can also be issued on a service unit even if it is configured but uninstantiated.
If this administrative operation is invoked on an AMF node or a service unit whose operational state is already enabled, the entity remains in that state, and a benign error value of SA_AIS_ERR_NO_OP is returned to the caller.

Return Values
SA_AIS_OK - The function completed successfully.
SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The client may retry later. This error should be generally returned when the requested action is valid but not currently possible, probably because another operation is acting upon the logical entity on which the administrative operation is invoked. Such an operation can be another administrative operation or an error recovery initiated by the Availability Management Framework.
SA_AIS_ERR_NO_RESOURCES - There are insufficient resources (other than memory).
SA_AIS_ERR_NOT_SUPPORTED - This administrative procedure is not supported by the type of entity denoted by the name to which objectName points.
SA_AIS_ERR_NO_OP - The invocation of this administrative operation has no effect on the current state of the logical entity, as it is already enabled.
SA_AIS_ERR_BAD_OPERATION - The operation could not ensure that the presence states of the relevant service units and components are either instantiated or uninstantiated.
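
The outcome rules for the repaired operation on a service unit can be sketched as follows. This is a minimal model under stated assumptions: Presence, RepairedResult, Component, ServiceUnitState, and su_repaired() are hypothetical names, and the three result values only stand for the cases that the text associates with SA_AIS_OK, SA_AIS_ERR_NO_OP, and SA_AIS_ERR_BAD_OPERATION.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified state model; not AMF header types. */
typedef enum { PRES_UNINSTANTIATED, PRES_INSTANTIATED, PRES_OTHER } Presence;
typedef enum { REPAIRED_OK, REPAIRED_NO_OP, REPAIRED_BAD_OPERATION } RepairedResult;

typedef struct { bool enabled; Presence presence; } Component;
typedef struct {
    bool enabled;        /* operational state of the service unit */
    Presence presence;
    Component *comps;    /* components that make up the service unit */
    size_t ncomps;
} ServiceUnitState;

/* A consistent state is either instantiated or uninstantiated. */
static bool consistent(Presence p)
{
    return p == PRES_UNINSTANTIATED || p == PRES_INSTANTIATED;
}

RepairedResult su_repaired(ServiceUnitState *su)
{
    assert(su != NULL);

    if (su->enabled)                 /* already enabled: benign no-op */
        return REPAIRED_NO_OP;

    if (!consistent(su->presence))   /* repair left an inconsistent state */
        return REPAIRED_BAD_OPERATION;
    for (size_t i = 0; i < su->ncomps; i++)
        if (!consistent(su->comps[i].presence))
            return REPAIRED_BAD_OPERATION;

    su->enabled = true;              /* clear the disabled operational state */
    for (size_t i = 0; i < su->ncomps; i++)
        su->comps[i].enabled = true; /* similar effect on all components */
    return REPAIRED_OK;
}
```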
See Also
None

9.4.11 SA_AMF_ADMIN_EAM_START

Parameters
operationId = SA_AMF_ADMIN_EAM_START
objectName - [in] A pointer to the name of the logical entity on which external active monitoring needs to be started. The name is expressed as a LDAP DN. The type of the logical entity is inferred by parsing this DN.

Description
This administrative operation applies to a component and a service unit.
This API function is invoked to resume external active monitoring of components after it has been stopped by invoking the administrative operation designated by operationId = SA_AMF_ADMIN_EAM_STOP on the same component.

If a component on which this administrative operation is invoked is already being actively monitored, there is no change in its status as a consequence of invoking this operation on such a component. A status of SA_AIS_ERR_NO_OP is returned in such a case.

When this procedure is applied to a service unit, it results in an aggregate action of starting the external active monitors for all components within the service unit that support external active monitoring without affecting the ones that are already being actively monitored. If the external monitors for all components within the enclosing service unit that support external active monitoring have been already started, an SA_AIS_ERR_NO_OP error code is returned to indicate that there has been no change in the status of active monitoring of the components within the service unit.

Return Values
SA_AIS_OK - The function completed successfully.
SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The client may retry later. This error should be generally returned when the requested action is valid but not currently possible, probably because another operation is acting upon the logical entity on which the administrative operation is invoked. Such an operation can be another administrative operation or an error recovery initiated by the Availability Management Framework.
SA_AIS_ERR_NO_RESOURCES - There are insufficient resources (other than memory).
SA_AIS_ERR_FAILED_OPERATION - The AM_START operation returns an error or fails to complete within the configured timeout.
SA_AIS_ERR_NOT_SUPPORTED - This administrative procedure is not supported for the type of entity denoted by the name to which objectName points.
SA_AIS_ERR_NO_OP - The invocation of this administrative operation has no effect on the current state of active monitoring of the logical entity.

See Also
SA_AMF_ADMIN_EAM_STOP
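
The aggregate behavior on a service unit can be sketched as a loop over its components. The sketch below is illustrative only: MonComp, EamResult, and su_eam_start() are hypothetical names, and the failure of an individual AM_START command (which the text maps to SA_AIS_ERR_FAILED_OPERATION) is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-component monitoring state, for illustration only. */
typedef struct {
    bool supports_eam;  /* component supports external active monitoring */
    bool monitored;     /* external active monitor currently running */
} MonComp;

typedef enum { EAM_OK, EAM_NO_OP } EamResult;

/* Start monitors for all eligible components of a service unit.
 * Components that are already monitored, or that do not support
 * external active monitoring, are left untouched. */
EamResult su_eam_start(MonComp *c, size_t n)
{
    assert(c != NULL || n == 0);
    bool changed = false;
    for (size_t i = 0; i < n; i++) {
        if (c[i].supports_eam && !c[i].monitored) {
            c[i].monitored = true;
            changed = true;
        }
    }
    /* No change at all corresponds to the benign NO_OP case. */
    return changed ? EAM_OK : EAM_NO_OP;
}
```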
9.4.12 SA_AMF_ADMIN_EAM_STOP

Parameters
operationId = SA_AMF_ADMIN_EAM_STOP
objectName - [in] A pointer to the name of the logical entity on which external active monitoring needs to be stopped. The name is expressed as a LDAP DN. The type of the logical entity is inferred by parsing this DN.

Description
This administrative operation applies to a component and a service unit.

This API function is typically invoked to stop external active monitoring of components before terminating them.

If a component on which this administrative operation is invoked is not being actively monitored, there is no change in its status as a consequence of invoking this operation on such a component. A status of SA_AIS_ERR_NO_OP is returned in such a case.

When this procedure is applied to a service unit, it results in an aggregate action of stopping the external active monitors for all components within the service unit that support external active monitoring without affecting the ones that are not being actively monitored. If the external monitors for all components within the enclosing service unit that support external active monitoring have been already stopped, an SA_AIS_ERR_NO_OP error code is returned to indicate that there has been no change in the status of active monitoring of the components within the service unit.

Return Values
SA_AIS_OK - The function completed successfully.
SA_AIS_ERR_LIBRARY - An unexpected problem occurred in the library (such as corruption). The library cannot be used anymore.
SA_AIS_ERR_TIMEOUT - An implementation-dependent timeout occurred. It is unspecified whether the call succeeded or whether it did not.
SA_AIS_ERR_TRY_AGAIN - The service cannot be provided at this time. The client may retry later. This error should be generally returned when the requested action is valid but not currently possible, probably because another operation is acting upon the logical entity on which the administrative operation is invoked. Such an operation can be another administrative operation or an error recovery initiated by the Availability Management Framework.
SA_AIS_ERR_NO_RESOURCES - There are insufficient resources (other than memory).
SA_AIS_ERR_FAILED_OPERATION - The AM_STOP operation returns an error or fails to complete within the configured timeout.
SA_AIS_ERR_NOT_SUPPORTED - This administrative procedure is not supported for the type of entity denoted by the name to which objectName points.
SA_AIS_ERR_NO_OP - The invocation of this administrative operation has no effect on the current state of active monitoring of the logical entity.

See Also
SA_AMF_ADMIN_EAM_START
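
A caller of these administrative operations typically has to distinguish between a retryable condition, a benign no-op, an unknown outcome, and a real failure. The following sketch shows one possible caller-side policy. It is not the AIS API: Status, Outcome, AdminOpFn, invoke_with_retry(), and busy_then_ok() are hypothetical names, and the Status values merely stand for the corresponding SA_AIS_* codes named in the comments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the return codes discussed in the text. */
typedef enum {
    ST_OK,          /* stands for SA_AIS_OK */
    ST_NO_OP,       /* stands for SA_AIS_ERR_NO_OP (benign) */
    ST_TRY_AGAIN,   /* stands for SA_AIS_ERR_TRY_AGAIN */
    ST_TIMEOUT,     /* stands for SA_AIS_ERR_TIMEOUT */
    ST_OTHER_ERROR  /* any other error value */
} Status;

typedef enum { OUTCOME_DONE, OUTCOME_UNKNOWN, OUTCOME_FAILED } Outcome;

/* Hypothetical callback type wrapping one invocation of the operation. */
typedef Status (*AdminOpFn)(void *ctx);

Outcome invoke_with_retry(AdminOpFn op, void *ctx, unsigned max_attempts)
{
    assert(op != NULL);
    for (unsigned attempt = 0; attempt < max_attempts; attempt++) {
        switch (op(ctx)) {
        case ST_OK:
        case ST_NO_OP:
            return OUTCOME_DONE;     /* entity is in the requested state */
        case ST_TRY_AGAIN:
            continue;                /* valid but not possible now: retry */
        case ST_TIMEOUT:
            return OUTCOME_UNKNOWN;  /* unspecified whether the call succeeded */
        default:
            return OUTCOME_FAILED;
        }
    }
    return OUTCOME_FAILED;           /* retry budget exhausted */
}

/* Example operation for the usage below: reports TRY_AGAIN until the
 * counter pointed to by ctx reaches zero, then reports success. */
Status busy_then_ok(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return ST_TRY_AGAIN;
    }
    return ST_OK;
}
```

A production caller would normally wait between attempts; the delay is omitted here to keep the sketch short.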
[Sequence diagram; participants: Comp1, Comp2. Diagram annotation: gracefully quiescing CSI work assignment. Recoverable steps:]
3) saAmfCSIQuiescingComplete(CSI1)
4) saAmfHAStateGet(CSI1)
5) return HA state = quiesced
6) saAmfCSIRemoveCallback(CSI1,Comp1)
7) saAmfResponse(CSI1,SA_AIS_OK)
8) saAmfCSIRemoveCallback(CSI1,Comp2)
9) saAmfResponse(CSI1,SA_AIS_OK)

The dotted lines indicate optional transactions.
SAI-AIS-AMF-B.03.01 Section 10
The result of a complete transition from the quiescing HA state is to arrive at the quiesced HA state. Notice that since only one of the SIs has been shut down, the component service instance corresponding to that SI (CSI1) is manipulated and the other (CSI2) is left alone. Further notice that the Availability Management Framework does not remove the standby state for CSI1 from Comp2 until the active HA state of Comp1 for CSI1 has transitioned successfully to quiesced. At this time, the Availability Management Framework can remove the CSI1 assignment from Comp1 and Comp2 in any order.
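
The ordering rule described above (quiescing leads to quiesced, and the CSI may be removed from either component only after that) can be captured in a few lines. This is a sketch with hypothetical names: HaState, quiescing_complete(), and may_remove_csi() are not AMF API symbols, and the enum values are local to the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local illustration of the HA states involved; not the AMF enum. */
typedef enum { HA_ACTIVE, HA_STANDBY, HA_QUIESCING, HA_QUIESCED } HaState;

/* Models the effect of the component reporting that quiescing finished:
 * only a quiescing assignment can move to quiesced. */
bool quiescing_complete(HaState *active_side)
{
    assert(active_side != NULL);
    if (*active_side != HA_QUIESCING)
        return false;
    *active_side = HA_QUIESCED;
    return true;
}

/* The CSI may be removed from the active-side and the standby-side
 * component (in any order) only once the active side is quiesced. */
bool may_remove_csi(HaState active_side)
{
    return active_side == HA_QUIESCED;
}
```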
[Sequence diagram; participants: Comp1, Comp2. Diagram annotation: CSI 1 and CSI 2 must both quiesce before QuiescingComplete can be issued. Recoverable steps:]
3) saAmfCSIQuiescingComplete(CSI1&2)
4) saAmfCSISetCallback(CSI1&2,active)
5) saAmfResponse(CSI1&2,SA_AIS_OK)
6) saAmfCSIRemoveCallback(CSI1&2,Comp1)
7) saAmfResponse(CSI1&2,SA_AIS_OK)
The Availability Management Framework should use the csiFlags value SA_AMF_TARGET_ALL in (callback) steps 1, 4, and 6 in order to guarantee that 2N semantics are honored. Those semantics are ...at most one service unit will have the active HA state for all service instances, and at most one service unit will have the standby HA state for all service instances.
Notice that saAmfCSIQuiescingComplete() can only be invoked when all component service instance assignments have successfully quiesced within the component.
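
One way for a component to respect this rule is to count the component service instances that are still quiescing and to report completion only when the count reaches zero. The sketch below assumes such a counter-based design; QuiesceTracker and its two functions are hypothetical helper names, and the actual call to saAmfCSIQuiescingComplete() is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper that tracks CSIs still quiescing in one component. */
typedef struct {
    size_t pending;  /* CSI assignments that have not finished quiescing */
} QuiesceTracker;

/* Called when the quiescing request covering `csi_count` CSIs arrives. */
void tracker_start(QuiesceTracker *t, size_t csi_count)
{
    assert(t != NULL);
    t->pending = csi_count;
}

/* Called each time one CSI has finished quiescing. Returns true exactly
 * once, when the last CSI is done; that is the moment at which the
 * component would invoke saAmfCSIQuiescingComplete(). */
bool tracker_csi_quiesced(QuiesceTracker *t)
{
    assert(t != NULL);
    if (t->pending == 0)
        return false;
    t->pending--;
    return t->pending == 0;
}
```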
FIGURE 41 Administrative Shutdown of a Service Unit for the N-Way Case
[Sequence diagram; participants: AMF, Comp1, Comp2. Steps:]
1) saAmfCSISetCallback(CSI1,quiescing)
2) saAmfCSISetCallback(CSI2,quiescing)
3) saAmfResponse(CSI1,SA_AIS_OK)
4) saAmfResponse(CSI2,SA_AIS_OK)
5) saAmfCSIQuiescingComplete(CSI1)
6) saAmfCSISetCallback(CSI1,active)
7) saAmfResponse(CSI1,SA_AIS_OK)
8) saAmfCSIRemoveCallback(CSI1,Comp1)
9) saAmfResponse(CSI1,SA_AIS_OK)
10) saAmfCSIQuiescingComplete(CSI2)
11) saAmfCSISetCallback(CSI2,active)
12) saAmfResponse(CSI2,SA_AIS_OK)
13) saAmfCSIRemoveCallback(CSI2,Comp1)
14) saAmfResponse(CSI2,SA_AIS_OK)

Note that Comp2 will have both active and standby assignments for a certain period of time, which implies that Comp2 must have the X_active_and_Y_standby capability. Also notice that CSI2 at Comp1 has taken much longer to quiesce (from step 2 to step 10) while CSI1 at Comp1 quiesced much faster (from step 1 to step 5) allowing the Availability Management Framework to proceed with the active HA assignment for CSI1 to Comp2.
[Sequence diagram; participants: Comp1, Comp2.]
Notice that since only one of the SIs has been locked, only the component service instance corresponding to that SI is manipulated.
[Sequence diagram; participants: AMF, Comp1, Comp2. Steps:]
1) saAmfCSISetCallback(CSI1,quiesced)
2) saAmfCSISetCallback(CSI2,quiesced)
3) saAmfResponse(CSI1,SA_AIS_OK)
4) saAmfResponse(CSI2,SA_AIS_OK)
5) saAmfCSISetCallback(CSI1,active)
6) saAmfResponse(CSI1,SA_AIS_OK)
7) saAmfCSISetCallback(CSI2,active)
8) saAmfResponse(CSI2,SA_AIS_OK)
9) saAmfCSIRemoveCallback(CSI1,Comp1)
10) saAmfCSIRemoveCallback(CSI2,Comp1)
11) saAmfResponse(CSI1,SA_AIS_OK)
12) saAmfResponse(CSI2,SA_AIS_OK)
Note that the same sequence applies when a service unit is locked as a consequence of a node lock administrative action. In the example, it is assumed that the other service unit in the service group resides on another node.
The dotted line indicates an optional transaction. Note that the protection group callback informs the registered component that Comp1 exited from the protection group.
It has an N-way active redundancy model and contains the service units ContainerSU1 and ContainerSU2, which are configured on Node1 and Node2 respectively. ContainerSU1 contains the component Container1, and ContainerSU2 contains Container2. The service instance ContainerSI1 is assigned to the service group. ContainerSI1 contains ContainerCSI1.
It has a 2N redundancy model and contains the service units SU1 and SU2, which are configured on Node1 and Node2 respectively. SU1 contains the component C1, and SU2 contains C2. The service instance SI1 is assigned to the service group. SI1 contains the component service instance CSI1. In the Availability Management Framework configuration, C1 and C2 are configured with saAmfCompContainerCsi set to ContainerCSI1.
FIGURE 45 Scenario for Shutting Down a Service Instance Having a Container CSI
[Diagram labels: components C1 and C2; CSI1 in SI1 (active, standby); ContainerCSI1 in ContainerSI1 (active, active).]

The following sequence diagram shows the sequence of operations when ContainerSI1 is shut down. For readability purposes, the invocations of saAmfResponse() to respond to Availability Management Framework requests are not shown in the diagram. Before the shutdown administrative operation is issued, the state is as follows:

• Container1 and Container2 have the active HA state for ContainerCSI1.
• Container1 handles the life cycle of C1, and Container2 handles the life cycle of C2.
• For CSI1, C1 has the active HA state and C2 has the standby HA state.
FIGURE 46 Administrative Shutdown of a Service Instance Having a Container CSI
[Sequence diagram; participants: AMF, Container1, C1, Container2, C2. Steps:]
1) saAmfCSISetCallback(CSI1,quiescing)
2) saAmfCSIQuiescingComplete(CSI1)
3) saAmfCSIRemoveCallback(C2,CSI1)
4) saAmfCSIRemoveCallback(C1,CSI1)
5) saAmfComponentTerminateCallback(C1)
6) saAmfComponentTerminateCallback(C2)
7) saAmfCSISetCallback(ContainerCSI1,quiescing)
8) saAmfCSIQuiescingComplete(ContainerCSI1)
9) saAmfCSIRemoveCallback(Container1,ContainerCSI1)
10) saAmfCSISetCallback(ContainerCSI1,quiescing)
11) saAmfCSIQuiescingComplete(ContainerCSI1)
12) saAmfCSIRemoveCallback(Container2,ContainerCSI1)
C1 is set to quiescing for CSI1. After the quiescing is completed, CSI1 is removed from C1 and C2. C1 and C2 are terminated. Container1 is set to quiescing for ContainerCSI1; after the quiescing is completed, ContainerCSI1 is removed from Container1. Container2 is set to quiescing for ContainerCSI1; after the quiescing is completed, ContainerCSI1 is removed from Container2.
• Container1 and Container2 have the active HA state for ContainerCSI1.
• Container1 handles the life cycle of C1, and Container2 handles the life cycle of C2.
• For CSI1, C1 has the active HA state and C2 has the standby HA state.
FIGURE 47
[Sequence diagram; participants: AMF, Container1, C1, Container2, C2. Recoverable steps:]
1) saAmfCSIRemoveCallback(C1,CSI1)
5) saAmfCSIRemoveCallback(Container1,ContainerCSI1)
6) saAmfCSIRemoveCallback(Container2,ContainerCSI1)
CSI1 is removed from C1, and C1 is terminated. CSI1 is removed from C2, and C2 is terminated. ContainerCSI1 is removed from Container1 and from Container2.
It has an N-way active redundancy model and contains the service units ContainerSU1, ContainerSU2, and ContainerSU3, which are configured on Node1, Node2, and Node3 respectively. ContainerSU1 contains the component Container1, ContainerSU2 contains Container2, and ContainerSU3 contains Container3. The service instance ContainerSI1 is assigned to the service group. ContainerSI1 contains the component service instance ContainerCSI1.
It has a 2+1 (N+M) redundancy model and contains the service units SU1, SU2, and SU3, which are configured on Node1, Node2, and Node3 respectively. SU1 contains the component C1, SU2 contains C2, and SU3 contains C3. The service instances SI1 and SI2 are assigned to the service group. SI1 contains the component service instance CSI1, and SI2 contains the component service instance CSI2. C1, C2, and C3 are configured with saAmfCompContainerCsi set to ContainerCSI1.
FIGURE 48 Scenario for Locking a Service Unit Containing a Container Component
[Diagram labels: components C1, C2, and C3; CSI1 in SI1 and CSI2 in SI2 (active and standby assignments); ContainerCSI1 in ContainerSI1 (active assignments).]
The following sequence diagram shows the sequence of operations when ContainerSU1 is locked. Before the lock administrative operation is issued, the state is as follows:
• Container1, Container2, and Container3 have the active HA state for ContainerCSI1.
• Container1, Container2, and Container3 handle the life cycle of C1, C2, and C3 respectively.
• C1 has the active HA state for CSI1, C2 has the active HA state for CSI2, and C3 has the standby HA state for CSI1 and CSI2.
FIGURE 49 Administrative Lock of a Service Unit Containing a Container Component
[Sequence diagram; participants: AMF, Container1, C1, C3. Recoverable steps:]
1) saAmfCSISetCallback(CSI1,quiesced)
2) saAmfResponse(CSI1,SA_AIS_OK)
3) saAmfCSIRemoveCallback(C3,CSI2)
4) saAmfResponse(C3,SA_AIS_OK)
5) saAmfCSISetCallback(CSI1,active)
6) saAmfResponse(CSI1,SA_AIS_OK)
7) saAmfCSIRemoveCallback(C1,CSI1)
8) saAmfResponse(C1,SA_AIS_OK)
13) saAmfCSIRemoveCallback(Container1,ContainerCSI1)
14) saAmfResponse(Container1,SA_AIS_OK)
C1 is quiesced for CSI1. The standby HA state for CSI2 is removed from C3. C3 is set active for CSI1. CSI1 is removed from C1.
FIGURE 50 Restart of a Container Component
[Sequence diagram; participants: Container1, C1. Recoverable steps:]
6) saAmfRegister(Container1)
7) saAmfCSISetCallback(ContainerCSI1,active)
8) saAmfResponse(ContainerCSI1,SA_AIS_OK)
9) saAmfContainedComponentInstantiateCallback(C1)
10) instantiate C1
11) saAmfResponse(C1,SA_AIS_OK)
12) saAmfRegister(C1)
13) saAmfCSISetCallback(CSI1,active)
14) saAmfResponse(CSI1,SA_AIS_OK)

The main transitions are:

• The contained component C1 is terminated; then, Container1 is terminated.
• The Availability Management Framework runs the INSTANTIATE command to instantiate Container1.
• Container1 registers with the Availability Management Framework.
• The Availability Management Framework assigns Container1 active for ContainerCSI1.
• Container1 responds to the Availability Management Framework that it is ready to provide service for ContainerCSI1.
• The Availability Management Framework invokes the saAmfContainedComponentInstantiateCallback() callback function of Container1 to instantiate C1.
• Container1 instantiates C1 through a private interface.
• C1 registers with the Availability Management Framework.
• The Availability Management Framework assigns C1 active for CSI1.
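
The main transitions form a strict sequence, which can be modeled as an ordered list of phases. The sketch below is only an illustration of that ordering: the RestartPhase enumerators, next_phase(), and order_is_valid() are hypothetical names that paraphrase the transitions listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical phase names paraphrasing the listed transitions. */
typedef enum {
    PH_TERMINATE_CONTAINED,       /* C1 is terminated */
    PH_TERMINATE_CONTAINER,       /* then Container1 is terminated */
    PH_INSTANTIATE_CONTAINER,     /* AMF runs the INSTANTIATE command */
    PH_CONTAINER_REGISTERS,       /* Container1 registers with AMF */
    PH_CONTAINER_CSI_ASSIGNED,    /* Container1 set active for ContainerCSI1 */
    PH_CONTAINER_READY,           /* Container1 responds that it is ready */
    PH_CONTAINED_INSTANTIATE_CB,  /* AMF invokes the instantiate callback */
    PH_CONTAINED_INSTANTIATED,    /* Container1 instantiates C1 privately */
    PH_CONTAINED_REGISTERS,       /* C1 registers with AMF */
    PH_CONTAINED_CSI_ASSIGNED,    /* C1 set active for CSI1 */
    PH_DONE
} RestartPhase;

/* Phase that follows `p`; PH_DONE is terminal. */
RestartPhase next_phase(RestartPhase p)
{
    assert((int)p >= 0 && p <= PH_DONE);
    return (p == PH_DONE) ? PH_DONE : (RestartPhase)(p + 1);
}

/* True if `seq` visits phases in the listed order without skipping any. */
bool order_is_valid(const RestartPhase *seq, size_t n)
{
    assert(seq != NULL || n == 0);
    for (size_t i = 0; i + 1 < n; i++)
        if (seq[i + 1] != next_phase(seq[i]))
            return false;
    return true;
}
```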
its operational and functional state and the operational and functional state of the objects under its control
to an administrator or a management system. These reports vary in perceived severity and include alarms, which potentially require an operator intervention, and notifications that signify important state or object changes. A management entity should regard notifications, but they do not necessarily require an operator intervention.

The recommended vehicle to be used for producing alarms and notifications is the Notification Service of the Service Availability™ Forum (abbreviated as NTF, see [2]), and hence the various notifications are partitioned into categories, as described in this service.

In some cases, this specification uses the word Unspecified for values of attributes that the vendor is at liberty to set to whatever makes sense in the vendor's context, and the SA Forum has no specific recommendation regarding such values. Such values are generally optional from the CCITT Recommendation X.733 perspective (see [8]).
Correlation Ids - They are supplied to correlate two notifications that have been generated because of a related cause. This attribute is optional. However, in case of alarms that are generated to clear certain conditions, that is, produced with a perceived severity of SA_NTF_SEVERITY_CLEARED, the correlation id shall be populated by the application with the notification id that was generated by the Notification Service when invoking the saNtfNotificationSend() API during the production of the actual alarm.

Event Time - The application might pass a timestamp or optionally pass an SA_TIME_UNKNOWN value in which case the timestamp is provided by the Notification Service.
SAI-AIS-AMF-B.03.01 Section 11
NCI Id - The vendorId portion of the SaNtfClassIdT data structure must always be set to SA_NTF_VENDOR_ID_SAF. The majorId and minorId will vary based on the specific SA Forum service and the particular notification. Every SA Forum service shall have a majorId as described in the enumeration SaNtfSafServicesT of the Notification Service specification. The minorIds will be described and reused on a per-service basis.

Notification Id - This attribute is obtained from the Notification Service when a notification is generated, and hence need not be populated by an application.

Notifying Object - DN of the entity generating the notification. This name must conform to the SA Forum AIS naming convention and must contain at least the safApp RDN value portion of the DN set to the specified standard RDN value of the SA Forum AIS Service generating the notification. For details on the AIS naming convention, refer to the Overview document.
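As a rough illustration of how these attributes fit together, the following Python sketch models the three fields of SaNtfClassIdT and a simple check on the notifying object DN. The numeric values of SA_NTF_VENDOR_ID_SAF and of the AMF entry in SaNtfSafServicesT, as well as the RDN string safApp=safAmfService, are assumptions made for this example; the authoritative definitions are in the NTF headers and the Overview document.

```python
from dataclasses import dataclass

# Assumed stand-in values, for illustration only.
SA_NTF_VENDOR_ID_SAF = 18568
SA_SVC_AMF = 2


@dataclass(frozen=True)
class ClassId:
    """Mirrors the vendorId, majorId, and minorId fields of SaNtfClassIdT."""
    vendor_id: int
    major_id: int
    minor_id: int


def amf_class_id(minor_id):
    # vendorId is always the SA Forum id; majorId identifies the service.
    return ClassId(SA_NTF_VENDOR_ID_SAF, SA_SVC_AMF, minor_id)


def is_valid_notifying_object(dn, service_rdn="safApp=safAmfService"):
    # The DN must contain the safApp RDN of the generating service.
    # Naive split: escaped commas inside RDN values are not handled.
    return service_rdn in [rdn.strip() for rdn in dn.split(",")]
```

Only the minorId changes from one AMF notification to the next; the vendor and major ids stay fixed for the service.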
Alarms: (0x01 - 0x64)
State change notifications: (0x65 - 0xC8)
Object change notifications: (0xC9 - 0x12C)
Attribute change notifications: (0x12D - 0x190)
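The minorId ranges above lend themselves to a small lookup. The sketch below is an illustrative Python helper (the function and table names are hypothetical) and assumes that both ends of each range are inclusive.

```python
# (category, lowest minorId, highest minorId), assumed inclusive on both ends
MINOR_ID_RANGES = [
    ("alarm", 0x01, 0x64),
    ("state change notification", 0x65, 0xC8),
    ("object change notification", 0xC9, 0x12C),
    ("attribute change notification", 0x12D, 0x190),
]


def notification_category(minor_id):
    """Return the category whose range contains minor_id."""
    for name, low, high in MINOR_ID_RANGES:
        if low <= minor_id <= high:
            return name
    raise ValueError(f"minorId {minor_id:#x} is outside the defined ranges")
```

For instance, minorId 0x06 falls in the alarm range, while 0x6F falls in the state change range.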
Description
The Availability Management Framework is currently unable to provide service or is in a degraded state because of certain issues with memory, resources, communication, or other constraints.

Clearing Method
1) Manual, after taking the appropriate administrative action, or
2) issue an implementation-specific optional alarm with perceived severity SA_NTF_SEVERITY_CLEARED to convey that the Availability Management Framework self-healed or recovered and is again providing service.

Table 22 Availability Management Framework Service Impaired Alarm

| NTF Attribute Name | Attribute Type (NTF-Recommended Value) | SA Forum-Recommended Value |
|---|---|---|
| Event Type | Mandatory | SA_NTF_ALARM_COMMUNICATION |
| Notification Object | Mandatory | AMF Service, same as Notifying object, as specified above. |
| Notification Class Identifier | NTF internal | minorId = 0x01 |
| Additional Text | Optional | AMF service impaired. |
| Additional Information | Optional | Unspecified |
| Probable Cause | Mandatory | Applicable value from enum SaNtfProbableCauseT in [2] |
| Specific Problems | Optional | Unspecified |
| Perceived Severity | Mandatory | Applicable value from enum SaNtfSeverityT in [2] |
| Trend Indication | Optional | Unspecified |
| Threshold Information | Optional | Unspecified |
| Monitored Attributes | Optional | Unspecified |
| Proposed Repair Actions | Optional | Unspecified |
Description
The Availability Management Framework was unable to successfully instantiate a particular component. This means that
- either the INSTANTIATE command executed on the component returned an error exit status or failed to complete within the time period specified by the configured timeout, or the corresponding callback invoked on the component or on its proxy or associated container component returned an error code other than SA_AIS_OK or failed to complete within the configured timeout, and
- all subsequent attempts by the Availability Management Framework to revive the component, including a possible node reboot, did not resolve the issue.

As a consequence, the component will enter the instantiation-failed presence state. For more details, refer to Section 4.6.

Clearing Method
Manual, after taking the appropriate administrative action.
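The condition described above combines two tests: the initial instantiation attempt failed, and every later revival attempt failed as well. A minimal Python sketch of that predicate follows; the `Attempt` record and the function names are hypothetical, and the model treats an overrun of the configured timeout the same way as an error status.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """Outcome of one INSTANTIATE command or instantiate callback (hypothetical model)."""
    succeeded: bool    # exit status 0, or SA_AIS_OK for a callback
    duration_s: float  # how long the attempt took, in seconds


def attempt_failed(attempt, timeout_s):
    # An attempt fails if it reports an error or overruns the configured timeout.
    return (not attempt.succeeded) or attempt.duration_s > timeout_s


def raises_instantiation_failed_alarm(first, revival_attempts, timeout_s):
    # The alarm applies only if the first attempt and every revival attempt failed.
    return attempt_failed(first, timeout_s) and all(
        attempt_failed(a, timeout_s) for a in revival_attempts
    )
```

A single successful revival attempt is enough to keep the component out of the instantiation-failed presence state in this model.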
Table 23 Component Instantiation Failed Alarm

| NTF Attribute Name | Attribute Type (NTF-Recommended Value) | SA Forum-Recommended Value |
|---|---|---|
| Event Type | Mandatory | SA_NTF_ALARM_PROCESSING |
| Notification Object | Mandatory | LDAP DN of the component whose instantiation failed |
| Notification Class Identifier | NTF internal | minorId = 0x02 |
| Additional Text | Optional | Instantiation of Component <LDAP DN of component> failed |
| Additional Information | SA Forum Mandatory | infoId = SA_AMF_NODE_NAME, infoType = SA_NTF_VALUE_LDAP_NAME, infoValue = LDAP DN of node on which the component is hosted. |
| Probable Cause | Mandatory | Applicable value from enum SaNtfProbableCauseT in [2] |
| Specific Problems | Optional | Unspecified |
| Perceived Severity | Mandatory | Applicable value from enum SaNtfSeverityT in [2] |
| Trend Indication | Optional | Unspecified |
| Threshold Information | Optional | Unspecified |
| Monitored Attributes | Optional | Unspecified |
| Proposed Repair Actions | Optional | Unspecified |
Description
The Availability Management Framework was unable to clean up a particular component after failing to terminate the component. Under such circumstances, the component enters the termination-failed presence state. This condition could potentially cause a service disruption, as the workload assigned to the failed component would not be reassigned to some other healthy component because of redundancy model constraints, requiring an administrator to take a corrective action in order to recover. For more details, refer to Section 4.8.

Clearing Method
Manual, after taking the appropriate administrative action.

Table 24 Component Cleanup Failed Alarm

| NTF Attribute Name | Attribute Type (NTF-Recommended Value) | SA Forum-Recommended Value |
|---|---|---|
| Event Type | Mandatory | SA_NTF_ALARM_PROCESSING |
| Notification Object | Mandatory | LDAP DN of the component whose cleanup failed |
| Notification Class Identifier | NTF internal | minorId = 0x03 |
| Additional Text | Optional | Cleanup of Component <LDAP DN of component> failed |
| Additional Information | SA Forum Mandatory | infoId = SA_AMF_NODE_NAME, infoType = SA_NTF_VALUE_LDAP_NAME, infoValue = LDAP DN of node on which the component is hosted. |
| Probable Cause | Mandatory | Applicable value from enum SaNtfProbableCauseT in [2] |
| Specific Problems | Optional | Unspecified |
| Perceived Severity | Mandatory | Applicable value from enum SaNtfSeverityT in [2] |
| Trend Indication | Optional | Unspecified |
| Threshold Information | Optional | Unspecified |
| Monitored Attributes | Optional | Unspecified |
| Proposed Repair Actions | Optional | Unspecified |
Description
A component failed and recommended to the Availability Management Framework the SA_AMF_CLUSTER_RESET cluster reset recovery action.

Clearing Method
1) Manual, after taking the appropriate administrative action, or
2) issue an implementation-specific optional alarm with perceived severity SA_NTF_SEVERITY_CLEARED to convey that the cluster reset was successful.

Table 25 Cluster Reset Triggered by a Component Failure Alarm

| NTF Attribute Name | Attribute Type (NTF-Recommended Value) | SA Forum-Recommended Value |
|---|---|---|
| Event Type | Mandatory | SA_NTF_ALARM_PROCESSING |
| Notification Object | Mandatory | LDAP DN of the component that recommended an SA_AMF_CLUSTER_RESET recovery |
| Notification Class Identifier | NTF internal | minorId = 0x04 |
| Additional Text | Optional | Failure of Component <LDAP DN of component> triggered cluster reset. |
| Additional Information | Optional | Unspecified |
| Probable Cause | Mandatory | Applicable value from enum SaNtfProbableCauseT in [2] |
| Specific Problems | Optional | Unspecified |
| Perceived Severity | Mandatory | Applicable value from enum SaNtfSeverityT in [2] |
| Trend Indication | Optional | Unspecified |
| Threshold Information | Optional | Unspecified |
| Monitored Attributes | Optional | Unspecified |
| Proposed Repair Actions | Optional | Unspecified |
Description
A particular unit of work indicated by a service instance has no active assignments to any service unit, which is potentially causing a service disruption. In other words, the service instance transitioned to the unassigned assignment state, as explained in Section 3.3.3.2. This alarm is typically generated when the Availability Management Framework is unable to successfully execute a recovery to prevent the service disruption and maintain service availability in case of a failure (node, service unit, and so on). This alarm should also be generated when an administrative action renders a service instance unassigned.

Clearing Method
Manual, after taking the appropriate administrative action.

Table 26 Service Instance Unassigned Alarm

| NTF Attribute Name | Attribute Type (NTF-Recommended Value) | SA Forum-Recommended Value |
|---|---|---|
| Event Type | Mandatory | SA_NTF_ALARM_PROCESSING |
| Notification Object | Mandatory | LDAP DN of the service instance that has no current active assignments. |
| Notification Class Identifier | NTF internal | minorId = 0x05 |
| Additional Text | Optional | SI designated by <LDAP DN of the SI> has no current active assignments to any SU. |
| Additional Information | Optional | Unspecified |
| Probable Cause | Mandatory | Applicable value from enum SaNtfProbableCauseT in [2] |
| Specific Problems | Optional | Unspecified |
| Perceived Severity | Mandatory | Applicable value from enum SaNtfSeverityT in [2] |
| Trend Indication | Optional | Unspecified |
| Threshold Information | Optional | Unspecified |
| Monitored Attributes | Optional | Unspecified |
| Proposed Repair Actions | Optional | Unspecified |
Description
This alarm is generated by the Availability Management Framework when it reliably confirms that a component that was previously being proxied currently has no proxy component mediating for it, that is, the Availability Management Framework has not been able to engage another component to assume the mediation responsibility for a component whose proxy component has failed. The proxied component now has the SA_AMF_PROXY_STATUS_UNPROXIED status, as defined in Section 7.4.4.7. See also Section 11.2.2.6.

Clearing Method
Manual, after taking the appropriate administrative action.

Table 27 Proxy Status of a Component Changed to Unproxied Alarm

| NTF Attribute Name | Attribute Type (NTF-Recommended Value) | SA Forum-Recommended Value |
|---|---|---|
| Event Type | Mandatory | SA_NTF_ALARM_PROCESSING |
| Notification Object | Mandatory | LDAP DN of component that is no longer proxied. |
| Notification Class Identifier | NTF internal | minorId = 0x06 |
| Additional Text | Optional | Unspecified |
| Additional Information | Optional | Unspecified |
| Probable Cause | Mandatory | Applicable value from enum SaNtfProbableCauseT in [2] |
| Specific Problems | Optional | Unspecified |
| Perceived Severity | Mandatory | Applicable value from enum SaNtfSeverityT in [2] |
| Trend Indication | Optional | Unspecified |
| Threshold Information | Optional | Unspecified |
| Monitored Attributes | Optional | Unspecified |
| Proposed Repair Actions | Optional | Unspecified |
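The alarms defined in Tables 22 through 27 can be summarized in a small lookup keyed by minorId. The Python sketch below is only a convenience model built from those tables and their clearing methods; the `ALARMS` table and `describe_alarm` function are hypothetical names, not part of any AIS interface.

```python
# minorId -> (alarm name, recommended event type, may be cleared by a CLEARED alarm)
ALARMS = {
    0x01: ("Availability Management Framework Service Impaired",
           "SA_NTF_ALARM_COMMUNICATION", True),
    0x02: ("Component Instantiation Failed", "SA_NTF_ALARM_PROCESSING", False),
    0x03: ("Component Cleanup Failed", "SA_NTF_ALARM_PROCESSING", False),
    0x04: ("Cluster Reset Triggered by a Component Failure",
           "SA_NTF_ALARM_PROCESSING", True),
    0x05: ("Service Instance Unassigned", "SA_NTF_ALARM_PROCESSING", False),
    0x06: ("Proxy Status of a Component Changed to Unproxied",
           "SA_NTF_ALARM_PROCESSING", False),
}


def describe_alarm(minor_id):
    """Return the name, event type, and clearing methods for an AMF alarm."""
    name, event_type, may_self_clear = ALARMS[minor_id]
    clearing = ["manual"]
    if may_self_clear:
        # Optional, implementation-specific alarm with SA_NTF_SEVERITY_CLEARED.
        clearing.append("SA_NTF_SEVERITY_CLEARED alarm")
    return {"name": name, "event_type": event_type, "clearing": clearing}
```

Only the service impaired and cluster reset alarms list the optional clearing alarm as a second clearing method; the remaining four are cleared manually.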
Description
The administrative state of a node, a service unit, a service group, a service instance, an application, or the cluster changed.

Table 28 Administrative State Change Notification

| NTF Attribute Name | Attribute Type (NTF-Recommended Value) | SA Forum-Recommended Value |
|---|---|---|
| Event Type | Mandatory | SA_NTF_OBJECT_STATE_CHANGE |
| Notification Object | Mandatory | LDAP DN of the logical entity whose administrative state changed |
| Notification Class Identifier | NTF internal | minorId = 0x65 for Node, minorId = 0x66 for SU, minorId = 0x67 for SG, minorId = 0x68 for SI, minorId = 0x69 for Application, minorId = 0x6A for Cluster. |
| Additional Text | Optional | Unspecified |
| Additional Information | Optional | Unspecified |
| Source Indicator | Mandatory | SA_NTF_MANAGEMENT_OPERATION |
| Changed State Attribute ID | Optional | SA_AMF_ADMIN_STATE |
| Old Attribute Value | Optional | Applicable value from enum SaAmfAdminStateT |
| New Attribute Value | Mandatory | Applicable value from enum SaAmfAdminStateT |
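Because the minorId of an administrative state change notification depends on the kind of entity, a producer needs a per-entity mapping. The following Python sketch assembles the recommended attribute values from Table 28 into a dictionary; the function name, the dictionary keys, and the entity-kind strings are hypothetical and chosen for this example only.

```python
ADMIN_STATE_MINOR_IDS = {
    "node": 0x65,
    "su": 0x66,
    "sg": 0x67,
    "si": 0x68,
    "application": 0x69,
    "cluster": 0x6A,
}


def admin_state_change(entity_kind, dn, old_state, new_state):
    """Build the recommended attributes for an administrative state change."""
    return {
        "event_type": "SA_NTF_OBJECT_STATE_CHANGE",
        "notification_object": dn,
        "minor_id": ADMIN_STATE_MINOR_IDS[entity_kind],
        # Administrative state changes originate from a management operation.
        "source_indicator": "SA_NTF_MANAGEMENT_OPERATION",
        "changed_state_attribute_id": "SA_AMF_ADMIN_STATE",
        "old_value": old_state,
        "new_value": new_state,
    }
```

For example, locking a service unit would produce minorId 0x66 with the old and new values taken from the administrative state enumeration.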
Table 29 Operational State Change Notification

| NTF Attribute Name | Attribute Type (NTF-Recommended Value) | SA Forum-Recommended Value |
|---|---|---|
| Event Type | Mandatory | SA_NTF_OBJECT_STATE_CHANGE |
| Notification Object | Mandatory | LDAP DN of the logical entity whose operational state changed |
| Notification Class Identifier | NTF internal | minorId = 0x6B for Node, minorId = 0x6C for SU |
| Additional Text | Optional | Unspecified |
| Additional Information | Optional | infoId = SA_AMF_MAINTENANCE_CAMPAIGN_DN, infoType = SA_NTF_VALUE_LDAP_NAME, infoValue = LDAP DN of the upgrade campaign, that is, the contents of the saAmfSUMaintenanceCampaign attribute. |
| Source Indicator | Mandatory | SA_NTF_OBJECT_OPERATION or SA_NTF_UNKNOWN_OPERATION |
| Changed State Attribute ID | Optional | SA_AMF_OP_STATE |
| Old Attribute Value | Optional | Applicable value from enum SaAmfOperationalStateT |
| New Attribute Value | Mandatory | Applicable value from enum SaAmfOperationalStateT |
Description
The presence state change of a service unit is reported only if it becomes instantiated, uninstantiated, or restarting.

Table 30 Presence State Change Notification

| NTF Attribute Name | Attribute Type (NTF-Recommended Value) | SA Forum-Recommended Value |
|---|---|---|
| Event Type | Mandatory | SA_NTF_OBJECT_STATE_CHANGE |
| Notification Object | Mandatory | LDAP DN of the service unit whose presence state changed |
| Notification Class Identifier | NTF internal | minorId = 0x6D |
| Additional Text | Optional | Unspecified |
| Additional Information | Optional | Unspecified |
| Source Indicator | Mandatory | SA_NTF_OBJECT_OPERATION or SA_NTF_UNKNOWN_OPERATION |
| Changed State Attribute ID | Optional | SA_AMF_PRESENCE_STATE |
| Old Attribute Value | Optional | Unspecified |
| New Attribute Value | Mandatory | Applicable value from enum SaAmfPresenceStateT |
Description
The HA state of a service unit changes for an assigned service instance.

Table 31 HA State Change Notification

| NTF Attribute Name | Attribute Type (NTF-Recommended Value) | SA Forum-Recommended Value |
|---|---|---|
| Event Type | Mandatory | SA_NTF_OBJECT_STATE_CHANGE |
| Notification Object | Mandatory | LDAP DN of the service unit whose HA state changed on behalf of a particular SI. |
| Notification Class Identifier | NTF internal | minorId = 0x6E |
| Additional Text | Optional | The HA state of SI <LDAP DN> assigned to SU <LDAP DN> changed. |
| Additional Information | SA Forum Mandatory | infoId = SA_AMF_SI_NAME, infoType = SA_NTF_VALUE_LDAP_NAME, infoValue = LDAP DN of the SI that was assigned to the SU whose HA state changed. |
| Source Indicator | Mandatory | SA_NTF_OBJECT_OPERATION or SA_NTF_UNKNOWN_OPERATION |
| Changed State Attribute ID | Optional | SA_AMF_HA_STATE |
| Old Attribute Value | Optional | Unspecified |
| New Attribute Value | Mandatory | Applicable value from enum SaAmfHAStateT |
Description
The assignment state of a service instance changed. This notification is generated for all assignment state transitions for a service instance, except when the assignment state changes to SA_AMF_ASSIGNMENT_UNASSIGNED, in which case an alarm is generated, as explained in Section 11.2.1.5.

Table 32 SI Assignment State Change Notification

| NTF Attribute Name | Attribute Type (NTF-Recommended Value) | SA Forum-Recommended Value |
|---|---|---|
| Event Type | Mandatory | SA_NTF_OBJECT_STATE_CHANGE |
| Notification Object | Mandatory | LDAP DN of the service instance whose assignment state changed. |
| Notification Class Identifier | NTF internal | minorId = 0x6F |
| Additional Text | Optional | The Assignment state of SI <LDAP DN of SI> changed. |
| Additional Information | Optional | Unspecified |
| Source Indicator | Mandatory | SA_NTF_OBJECT_OPERATION or SA_NTF_UNKNOWN_OPERATION |
| Changed State Attribute ID | Optional | SA_AMF_ASSIGNMENT_STATE |
| Old Attribute Value | Optional | Applicable value from enum SaAmfAssignmentStateT |
| New Attribute Value | Mandatory | Applicable value from enum SaAmfAssignmentStateT |
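The split between the state change notification and the Service Instance Unassigned alarm can be expressed as a single branch on the new assignment state. This Python sketch is illustrative only; the function name and dictionary keys are hypothetical, while the minorId values and event types are taken from Tables 26 and 32.

```python
def report_si_assignment_change(si_dn, old_state, new_state):
    """Choose between the alarm and the notification for an SI state transition."""
    if new_state == "SA_AMF_ASSIGNMENT_UNASSIGNED":
        # A transition to unassigned is reported as an alarm, not a notification.
        return {
            "kind": "alarm",
            "event_type": "SA_NTF_ALARM_PROCESSING",
            "minor_id": 0x05,
            "notification_object": si_dn,
        }
    return {
        "kind": "notification",
        "event_type": "SA_NTF_OBJECT_STATE_CHANGE",
        "minor_id": 0x6F,
        "notification_object": si_dn,
        "changed_state_attribute_id": "SA_AMF_ASSIGNMENT_STATE",
        "old_value": old_state,
        "new_value": new_state,
    }
```

Every other transition, for instance from partially assigned to fully assigned, takes the notification path.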
Description
This notification is generated by the Availability Management Framework when it was able to engage another component to assume the mediation responsibility for a proxied component that was in the SA_AMF_PROXY_STATUS_UNPROXIED status (see Section 7.4.4.7). The proxied component then assumes the SA_AMF_PROXY_STATUS_PROXIED status. See also Section 11.2.1.6.

Table 33 Proxy Status of a Component Changed to Proxied Notification

| NTF Attribute Name | Attribute Type (NTF-Recommended Value) | SA Forum-Recommended Value |
|---|---|---|
| Event Type | Mandatory | SA_NTF_OBJECT_STATE_CHANGE |
| Notification Object | Mandatory | LDAP DN of the proxied component whose proxy failed and is currently not being proxied. |
| Notification Class Identifier | NTF internal | minorId = 0x70 |
| Additional Text | Optional | Unspecified |
| Additional Information | Optional | Unspecified |
| Source Indicator | Mandatory | SA_NTF_OBJECT_OPERATION or SA_NTF_UNKNOWN_OPERATION |
| Changed State Attribute ID | Optional | SA_AMF_PROXY_STATUS |
| Old Attribute Value | Optional | SA_AMF_PROXY_STATUS_UNPROXIED |
| New Attribute Value | Mandatory | SA_AMF_PROXY_STATUS_PROXIED |
Table 34 Implementation of CLC Operations for Each Component Category

| Component Category | Operation | Implementation |
|---|---|---|
| SA-aware (excluding contained component) | instantiate | CLC-CLI INSTANTIATE |
| SA-aware (excluding contained component) | terminate | saAmfComponentTerminateCallback() |
| SA-aware (excluding contained component) | cleanup | CLC-CLI CLEANUP |
| contained component | instantiate | saAmfContainedComponentInstantiateCallback() |
| contained component | terminate | saAmfComponentTerminateCallback() |
| contained component | cleanup | saAmfContainedComponentCleanupCallback() |
| proxied, preinstantiable | instantiate | saAmfProxiedComponentInstantiateCallback() |
| proxied, preinstantiable | terminate | saAmfComponentTerminateCallback() |
| proxied, preinstantiable | cleanup | CLC-CLI CLEANUP (if local), saAmfProxiedComponentCleanupCallback() |
| proxied, non-preinstantiable | instantiate | saAmfCSISetCallback() |
| proxied, non-preinstantiable | terminate | saAmfCSIRemoveCallback() |
| proxied, non-preinstantiable | cleanup | CLC-CLI CLEANUP (if local), saAmfProxiedComponentCleanupCallback() |
| non-proxied, non-SA-aware | instantiate | CLC-CLI INSTANTIATE |
| non-proxied, non-SA-aware | terminate | CLC-CLI TERMINATE |
| non-proxied, non-SA-aware | cleanup | CLC-CLI CLEANUP |
If both an saAmfProxiedComponentCleanupCallback() callback and a CLEANUP command are defined for local components, the callback is executed first. The CLEANUP command is run only if errors occur during that operation.
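The ordering rule for cleaning up a local proxied component, callback first and CLEANUP command only on error, can be sketched as a simple fallback. The Python model below also shows how part of Table 34 might be held as a lookup. All names are hypothetical; `run_callback` and `run_cleanup_command` are assumed to be caller-supplied functions that return True on success.

```python
# Part of Table 34: (component category, operation) -> implementation
CLC_IMPLEMENTATION = {
    ("SA-aware", "instantiate"): "CLC-CLI INSTANTIATE",
    ("SA-aware", "terminate"): "saAmfComponentTerminateCallback()",
    ("SA-aware", "cleanup"): "CLC-CLI CLEANUP",
    ("contained", "instantiate"): "saAmfContainedComponentInstantiateCallback()",
    ("contained", "terminate"): "saAmfComponentTerminateCallback()",
    ("contained", "cleanup"): "saAmfContainedComponentCleanupCallback()",
    ("non-proxied, non-SA-aware", "instantiate"): "CLC-CLI INSTANTIATE",
    ("non-proxied, non-SA-aware", "terminate"): "CLC-CLI TERMINATE",
    ("non-proxied, non-SA-aware", "cleanup"): "CLC-CLI CLEANUP",
}


def cleanup_local_proxied(run_callback, run_cleanup_command):
    """Try the cleanup callback first; fall back to CLEANUP only if it errors."""
    if run_callback():
        return "callback"
    if run_cleanup_command():
        return "CLEANUP"
    return "failed"
```

In this model the CLEANUP command is never invoked when the callback succeeds, which matches the ordering stated above.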
Table 35 API Functions in Unregistered Processes (Continued)

| API Interfaces | API Can be Invoked in the Context of an Unregistered Process |
|---|---|
| saAmfHealthcheckConfirm() | YES |
| saAmfHealthcheckStart() | YES |
| saAmfHealthcheckStop() | YES |
| saAmfInitialize_3() | YES |
| saAmfPmStart_3() | YES |
| saAmfPmStop() | YES |
| SaAmfProtectionGroupTrackCallbackT | YES |
| saAmfProtectionGroupTrack() | YES |
| saAmfProtectionGroupTrackStop() | YES |
| SaAmfProxiedComponentCleanupCallbackT | NO |
| SaAmfProxiedComponentInstantiateCallbackT | NO |
| saAmfResponse() | YES |
| saAmfSelectionObjectGet() | YES |
| All SA Forum API functions and callbacks | YES |
- has a 2+1 (N+M) redundancy model, and contains the service units SUx1, SUx2, and SUx3.
- SUx1 contains component cx1, SUx2 contains component cx2, and SUx3 contains component cx3.
- CSIs corresponding to the components cx1, cx2, and cx3 are CSIx1, CSIx2, and CSIx3 respectively.

Proxy SGp1:
- has a 2N redundancy model, and contains the service units SUp1 and SUp2.
- SUp1 contains component cp1 and SUp2 contains component cp2.
- The CSI corresponding to the components is CSIp1 (Proxy CSI).
- There is only a single SI, SIx1, which is protected by this service group.
The Availability Management Framework configuration will have the following CSI associations for the proxied components in SGx1:
- cx1 should be proxied by CSIp1
- cx2 should be proxied by CSIp1
- cx3 should be proxied by CSIp1
When the Availability Management Framework instantiates SGp1, it may decide that CSIp1 should be assigned active to cp1 and standby to cp2. The decision is based on the configuration data and HA requirements; the fact that CSIp1 is a proxy CSI is not taken into account. However, when CSIp1 is assigned to cp1 as active, the Availability Management Framework has the following information:
- CSIp1 is associated with the proxied components cx1, cx2, and cx3. This information is derived from the configuration.
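The derivation described above, from the configured proxy CSI associations and the current active assignment to the set of components a proxy is responsible for, can be sketched in a few lines of Python. The dictionaries and the function name are hypothetical and simply encode the sample configuration.

```python
# Configuration: each proxied component names the proxy CSI that mediates for it.
PROXIED_BY_CSI = {"cx1": "CSIp1", "cx2": "CSIp1", "cx3": "CSIp1"}

# Runtime: which component currently holds each proxy CSI in the active HA state.
ACTIVE_ASSIGNMENT = {"CSIp1": "cp1"}


def components_proxied_by(proxy, proxied_by_csi, active_assignment):
    """Return the proxied components whose proxy CSI is active on `proxy`."""
    return sorted(
        comp for comp, csi in proxied_by_csi.items()
        if active_assignment.get(csi) == proxy
    )
```

With the sample data, cp1 is responsible for cx1, cx2, and cx3, while the standby cp2 is responsible for none of them.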
Hence, the Availability Management Framework concludes that cp1 is currently supposed to proxy proxied components cx1, cx2, and cx3, and it starts instantiating them. The following steps illustrate an instantiation sequence for this sample configuration when cx1 and cx2 are instantiated and registered but cx3 does not register (potentially because of a failure).
1. AMF runs the INSTANTIATE command to instantiate cp1.
2. cp1 registers with AMF by invoking saAmfComponentRegister().
3. AMF assigns cp1 active for CSIp1 by invoking SaAmfCSISetCallbackT.
4. AMF invokes SaAmfProxiedComponentInstantiateCallbackT for cx1 on cp1.
5. cp1 registers cx1 with AMF by invoking saAmfComponentRegister().
6. cp1 returns SA_AIS_OK to AMF for step 4 by invoking saAmfResponse().
7. AMF invokes SaAmfProxiedComponentInstantiateCallbackT for cx2 on cp1.
8. cp1 registers cx2 with AMF by invoking saAmfComponentRegister().
9. cp1 returns SA_AIS_OK to AMF for step 7 by invoking saAmfResponse().
10. AMF invokes SaAmfProxiedComponentInstantiateCallbackT for cx3 on cp1.
11. cx3 is not registered with AMF by cp1 for some reason (for instance, failure).
12. cp1 returns failure for step 10 (see step 15 for subsequent AMF actions).
13. AMF assigns CSIx1 to cx1 by invoking SaAmfCSISetCallbackT of cp1.
14. AMF assigns CSIx2 to cx2 by invoking SaAmfCSISetCallbackT of cp1.
15. AMF invokes SaAmfProxiedComponentCleanupCallbackT for cx3 on cp1 and carries out the regular procedure to try to revive cx3. If it fails, AMF transitions cx3 to the instantiation-failed presence state, raises the alarm, and so on.

Note: In the scenario described above, CSIx3 was never assigned, which would also have been the case for an SA-aware component.
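The instantiation sequence for the proxied components can be simulated with a short Python model. It is a sketch only: the function name, the log wording, and the `registers_ok` predicate are hypothetical, and the model assumes that every successfully registered proxied component is then assigned its CSI (so CSIx2 goes to cx2 in the same way as CSIx1 goes to cx1).

```python
def instantiate_proxied(proxy, csi_of, registers_ok):
    """Simulate AMF instantiating proxied components through `proxy`.

    csi_of maps each proxied component to its CSI; registers_ok(comp) says
    whether the proxy manages to register that component.
    """
    log, registered, failed = [], [], []
    for comp in csi_of:
        log.append(f"AMF invokes instantiate callback for {comp} on {proxy}")
        if registers_ok(comp):
            log.append(f"{proxy} registers {comp}")
            log.append(f"{proxy} returns SA_AIS_OK for {comp}")
            registered.append(comp)
        else:
            log.append(f"{proxy} returns failure for {comp}")
            failed.append(comp)
    # Only registered components receive their CSI assignment.
    assigned = {}
    for comp in registered:
        assigned[csi_of[comp]] = comp
        log.append(f"AMF assigns {csi_of[comp]} to {comp} via {proxy}")
    # Failed components are cleaned up before any revival attempt.
    for comp in failed:
        log.append(f"AMF invokes cleanup callback for {comp} on {proxy}")
    return assigned, failed, log
```

Running the model with cx3 failing to register reproduces the outcome of the scenario: CSIx1 and CSIx2 are assigned, and CSIx3 is never assigned.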
Index of Definitions
Numerics 1_active component capability 85 1_active_or_1_standby component capability 85 1_active_or_y_standby component capability 85 2N redundancy model see also redundancy models definition 94 auto-adjust option 95 ordered list of service units for a service group 94 preferred number of in-service service units 94 A active assignment of a component 63 active assignment of a service unit 54 active assignment of/for a component service instance 63 active assignment of/for a service instance 54 active HA state of a component for a component service instance 62 active HA state of a service unit for a service instance 53 active-active redundancy configuration 132 administrative state of a cluster 74 administrative state of a node 71 administrative state of a service group 70 administrative state of a service instance 68 administrative state of an application 73 AM_START command 183 AM_STOP command 183 AMF cluster 30 AMF node 29 application type 45 applications definition 44 administrative state definition 73 locked 73 locked-instantiation 73 shutting-down 73 unlocked 73 type 45 assigned service units 89 assignment state of a service instance 69 associated contained component 35 associated container component 34 auto-adjust option 89, 95, 108, 124, 135, 148 auto-adjust probation period 92 automatic repair 167 Availability Management Framework cluster 30 Availability Management Framework node 29 C CLC-CLI definition 175 arguments 178 commands AM_START 183 AM_STOP 183 CLEANUP 182
INSTANTIATE 179 TERMINATE 181 environment variables 177 exit status 178 pathname of a command 176 pathname prefix 176 per-command pathname 176 CLC-CLI arguments 178 CLC-CLI environment variables 177 CLEANUP command 182 CLM cluster 30 CLM node 29 cluster administrative state definition 74 locked 74 locked-instantiation 74 shutting-down 74 unlocked 74 reset 31 start 31 Cluster Membership cluster 30 Cluster Membership node 29 cluster reset 31 cluster start 31 collocated contained components 35 component capability model 85 1_active 85 1_active_or_y_standby 85 1_active_or_1_standby 85 non-pre-instantiable 85 x_active 85 x_active_and_y_standby 85 x_active_or_y_standby 85 component capability model 85 component category 32 component healthchecks see healthchecks component life cycle see CLC-CLI component monitoring definition 160 external active monitoring 160 internal active monitoring 160 passive monitoring 160 component or service unit fail-over recovery 164 component service instance definition 40 container CSI 40, 192 name/value pairs 40 Proxy CSI 40 component service type 41 component type 39 component-invoked healthchecks 200 components see also healthchecks definition 32 active assignment 63 active assignment of/for a component service instance 63 associated contained 35
associated container 34 category 32 collocated contained 35 contained 34 container 34 external 32 HA state for a component service instance definition 61 active 62 quiesced 62 quiescing 62 standby 62 local 32 non-pre-instantiable 39 non-SA-aware 36 operational state definition 59 disabled 59 enabled 59 pre-instantiable 39 presence state definition 55 instantiated 56, 57 instantiating 56 instantiation-failed 56 restarting 57 terminating 56 termination-failed 56 uninstantiated 56 proxied 36 proxy 36, 37 readiness state definition 60 in-service 60 out-of-service 60 stopping 61 SA-aware 33 standby assignment 63 type 39 workload 40 composite administrative operations 317 composite operations see composite administrative operations contained components 34 container components 34 container CSI 40, 192 CSI see component service instance D dependencies dependencies amongst components 157 dependency amongst component service instances 156 SI-SI dependencies 155 tolerance time of an SI-SI dependency 156 dependencies amongst components 157 dependency amongst component service instances 156 disabled operational state of a component 59 disabled operational state of a node 72 disabled operational state of a service unit 50
E enabled operational state of a component 59 enabled operational state of a node 72 enabled operational state of a service unit 50 error detection 161 escalation of level 3 174 escalations of levels 1 and 2 172 exit status 178 external active monitoring 160 external components 32 external resources 26 external service units 41 F fail-over node fail-over recovery 166, 168 service unit fail-over recovery 168 fail-over recovery 164 fail-over scenario 76 failover see fail-over framework-invoked healthchecks 200 fully-assigned service instance 69 H HA state of a component for a component service instance 61 HA state of a service unit for a service instance 53 healthcheck key 200 healthcheck maximum-duration 202 healthcheck period 201, 202, 203 healthchecks see also components definition 200 component-invoked 200 framework-invoked 200 key 200 maximum duration 202 period 201, 202, 203 variants 200 I in-service readiness state of a component 60 in-service readiness state of a service unit 52 in-service service unit 52 in-service service units 88 instantiable service units 88 INSTANTIATE command 179 instantiated presence state of a component 56, 57 instantiated presence state of a service unit 48 instantiated service units 88 instantiated spare service units 89 instantiating presence state of a component 56 instantiating presence state of a service unit 48 instantiation level 157 instantiation-failed presence state of a component 56 instantiation-failed presence state of a service unit 48 internal active monitoring 160 L local components 32
local resources 26 local service units 41 locked administrative state of a cluster 74 locked administrative state of a node 71 locked administrative state of a service group 70 locked administrative state of a service instance 68 locked administrative state of a service unit 49 locked administrative state of an application 73 locked-instantiation administrative state of a cluster 74 locked-instantiation administrative state of a node 71 locked-instantiation administrative state of a service group 70 locked-instantiation administrative state of a service unit 49 locked-instantiation administrative state of an application 73 logical entities 28 M maximum number of active SIs per service unit 107, 123, 134 maximum number of standby SIs per service unit 108, 124 monitoring see component monitoring multiple (ranked) standby assignments 90 N N+M redundancy model see also redundancy models definition 104 auto-adjust option 108 maximum number of active SIs per service unit 107 maximum number of standby SIs per service unit 108 ordered list of service units for a service group 106 ordered list of SIs 106 preferred number of active service units 107 preferred number of in-service service units 107 preferred number of standby service units 107 name/value pairs 40 no spare HA state 89 node failfast recovery 166, 168 node fail-over recovery 166, 168 node group configuration attribute 91 node switch-over recovery 165, 168 nodes administrative state definition 71 locked 71 locked-instantiation 71 shutting-down 71 unlocked 71 AMF 29 CLM 29 failfast recovery 166 fail-over recovery 168 node group configuration attribute 91 operational state definition 72 disabled 72 enabled 72 physical 26 switch-over recovery 165, 168 non-instantiated spare service units 89
non-pre-instantiable component capability 85 non-pre-instantiable components 39 non-pre-instantiable service unit 51, 52 non-pre-instantiable service units 42 non-SA-aware 36 no-redundancy redundancy model see also redundancy models definition 146 auto-adjust option 148 ordered list of service units for a service group 147 ordered list of SIs 147 preferred number of in-service service units 147 N-way active redundancy model see also redundancy models definition 132 active-active redundancy configuration 132 auto-adjust option 135 maximum number of active SIs per service unit 134 ordered list of service units for a service group 133 ordered list of SIs 133 preferred number of active assignments per SI 134 preferred number of assigned service units 134 preferred number of in-service service units 134 ranked service unit list per SI 134 N-way redundancy model see also redundancy models definition 121 auto-adjust option 124 maximum number of active SIs per service unit 123 maximum number of standby SIs per service unit 124 ordered list of service units for a service group 122 ordered list of SIs 122 preferred number of assigned service units 123 preferred number of in-service service units 123 preferred number of standby assignments per SI 123 ranked service unit list per SI 123 O operational state of a component 59 operational state of a node 72 operational state of a service unit 50 ordered list of service units for a service group 89, 94, 106, 122, 133, 147 ordered list of SIs 90, 106, 122, 133, 147 out-of-service readiness state of a component 60 out-of-service readiness state of a service unit 51 out-of-service service unit 51 P partially-assigned service instance 69 passive monitoring 160 pathname of a CLC-CLI command 176 pathname prefix 176 per-command pathname 176 physical node 26 preferred number of active assignments per SI 134 preferred number of active service units 107 preferred number of assigned service units 123, 134 preferred number of in-service 
service units 94, 107, 123, 134, 147
preferred number of standby assignments per SI 123
preferred number of standby service units 107
pre-instantiable components 39
pre-instantiable service unit 51, 52
pre-instantiable service units 42
presence state of a component 55
presence state of a service unit 48
primitive administrative operations 317
primitive operations see primitive administrative operations
process
  registered 33, 199
  unregistered 199
protection group 45
proxied components 36
proxy component failure handling 187
proxy components 36, 37
proxy CSI 40

Q
quiesced HA state of a component for a component service instance 62
quiesced HA state of a service unit for a service instance 54
quiescing HA state of a component for a component service instance 62
quiescing HA state of a service unit for a service instance 53

R
rank 88
ranked list 88
ranked service unit list per SI 123, 134
ranking 88
readiness state of a component 60
readiness state of a service unit 51
recommended recovery actions 170
recovery 168
  definition 162
  escalation 170
  fail-over
    definition 164
    component or service unit 164
    node failfast 166, 168
    node fail-over 166, 168
    node switch-over 165, 168
    service unit 168
  restart
    definition 163
    restart all components of the service unit 163
    restart the associated container component and collocated contained components 163
    restart the erroneous component 163
recovery escalation 170
reduction procedure 89
redundancy level of a service instance 90
redundancy models
  see also service groups, 2N redundancy model, N+M redundancy model, N-way redundancy model, N-way active redundancy model, no-redundancy redundancy model
  definition 44
  common definitions
    auto-adjust option 89
    auto-adjust probation period 92
    in-service service units 88
    instantiable service units 88
    instantiated service units 88, 89
    instantiated spare service units 89
    multiple (ranked) standby assignments 90
    no spare HA state 89
    non-instantiated spare service units 89
    ordered list of service units for a service group 89
    ordered list of SIs 90
    reduction procedure 89
    redundancy level of a service instance 90
  node group configuration attribute 91
registered process 33, 199
repair
  definition 167
  automatic repair 167
  node failfast recovery 168
  node fail-over recovery 168
  node switch-over 168
  service unit failover recovery 168
resources 26
  external 26
  local 26
restart 161
restart all components of the service unit 163
restart recovery 163
restart the associated container component and collocated contained components 163
restart the erroneous component 163
restarting presence state of a component 57
restarting presence state of a service unit 49
RM see redundancy models

S
SA-aware component 33
service
  component service instance 40
  type 43
service group redundancy model 87
service group type 44
service groups
  see also redundancy models
  definition 44
  administrative state
    definition 70
    locked 70
    shutting-down 70
    unlocked 70
  type 44
service instance 42
  administrative state
    definition 68
    fully-assigned 69
    locked 68
    partially-assigned 69
    shutting-down 68
    unassigned 69
    unlocked 68
  assignment state
    definition 69
service type 43
service unit failover recovery 168
service unit type 42
service units
  definition 41
  active assignment 54
  active assignment of/for a service instance 54
  administrative state
    locked 49
    locked-instantiation 49
    shutting-down 49
    unlocked 49
  assigned 89
  external 41
  HA state for a service instance
    definition 53
    active 53
    quiesced 54
    quiescing 53
    standby 53
  in-service 88
  instantiable 88
  instantiated 88
  instantiated spare 89
  local 41
  non-instantiated spare 89
  non-pre-instantiable 42, 51
  operational state
    definition 50
    disabled 50
    enabled 50
  ordered list for a service group 89
  pre-instantiable 42, 51
  presence state
    definition 48
    instantiated 48
    instantiating 48
    instantiation-failed 48
    restarting 49
    terminating 49
    termination-failed 49
    uninstantiated 48
  readiness state
    definition 51
    in-service 52
    out-of-service 51
    stopping 52
  standby assignment 54
  type 42
SG see service groups
shutting-down administrative state of a cluster 74
shutting-down administrative state of a node 71
shutting-down administrative state of a service group 70
shutting-down administrative state of a service instance 68
shutting-down administrative state of a service unit 49
shutting-down administrative state of an application 73
SI see service instance
SI-SI dependencies 155
spare
  instantiated spare service units 89
  non-instantiated spare service units 89
standby assignment 54, 63
standby HA state of a component for a component service instance 62
standby HA state of a service unit for a service instance 53
startup 31
stopping readiness state of a component 61
stopping readiness state of a service unit 52
SU see service unit
switch-over scenario 76

T
TERMINATE command 181
terminating presence state of a component 56
terminating presence state of a service unit 49
termination-failed presence state of a component 56
termination-failed presence state of a service unit 49
tolerance time of an SI-SI dependency 156

U
unassigned service instance 69
uninstantiated presence state of a component 56
uninstantiated presence state of a service unit 48
unlocked administrative state of a cluster 74
unlocked administrative state of a node 71
unlocked administrative state of a service group 70
unlocked administrative state of a service instance 68
unlocked administrative state of a service unit 49
unlocked administrative state of an application 73
unregistered process 199

V
variants of healthcheck 200

W
workload 40
wrapper method 159

X
x_active component capability 85
x_active_and_y_standby component capability 85
x_active_or_y_standby component capability 85