FNAL Capacity Management Policy Process Procedures
FNAL_Capacity_ Management_Policy_Process_Procedures_Draft_V0_6
GENERAL
Description: This document establishes a Capacity Management process and procedures for Fermilab.
Purpose: The purpose of this process is to establish a Capacity Management process for the Fermilab Computing Division. Adoption and implementation of this process provides a structured method to ensure that the required capacity exists within the IT environment so that IT Services meet business requirements as documented in Service Level Agreements, and that this is provided in a cost-effective and timely manner.
Note: The Capacity process can be triggered by many other processes. In the normal course of business, each service has a pre-determined capacity review cycle (usually annually, to coincide with the budget cycle), and the process executes according to that cycle.
<Enter Applicable Details> N/A Ray Pasetes MM-DD-2010 Capacity Manager
VERSION HISTORY
Version   Date         Author(s)              Change Summary
0.1       12/28/2009   David Cole - Plexent   Initial Draft Version
0.2       01/05/2010   David Cole - Plexent   Added newly discovered information
0.3       01/26/2010   David Cole - Plexent   Incorporated feedback from Workshop
0.4       02/09/2010   David Cole - Plexent   Updates as a result of Core Team Review
0.5       02/17/2010   David Cole - Plexent   Updates as a result of General CD Review
0.6       03/04/2010   David Cole - Plexent   Further updates as a result of General CD Review
TABLE OF CONTENTS
General
Introduction
    General Introduction
    Document Organization
Capacity Management Policies
Capacity Management Process Flow
    Capacity Management General Notes
    Capacity Management Process Roles & Responsibilities
    Capacity Management Process Measurements
    Capacity Management Critical Success Factors
    Capacity Management Process Relationships
Manage Business Capacity Requirements Procedure Flow
    Manage Business Capacity Requirements Procedures Rules
    Manage Business Capacity Requirements Procedure Narrative
    Verification
    Management Review Criteria
    Escalation Criteria
    Risks
Manage Service Capacity Requirements Procedure Flow
    Manage Service Capacity Requirements Procedure Rules
    Manage Service Capacity Requirements Procedure Narrative
    Verification
    Review Criteria
    Escalation Criteria
    Risks
Manage Resource Capacity Requirements Procedure Flow
    Manage Resource Capacity Management Procedures Rules
    Manage Resource Capacity Requirements Procedure Narrative
    Verification
    Review Criteria
    Escalation Criteria
    Risks
Create & Distribute Capacity Reports Procedure Flow
    Create & Distribute Capacity Reports Procedure Rules
    Create & Distribute Capacity Reports Procedure Narrative
    Verification
    Review Criteria
    Escalation Criteria
    Risks
Appendix 1: Relationship to Other Documents
Appendix 2: Capacity RACI Chart
Appendix 3: Phase 1 Capacity Management Scope
Appendix 4: Communication Plan
Appendix 5: Forms, Templates
INTRODUCTION
GENERAL INTRODUCTION
Capacity management ensures that the information technology processing and storage capacity is adequate to the evolving requirements of the organization as a whole in a timely and cost justifiable manner. The benefits of an effective and efficient Capacity Management Process include:
- Assurance that IT resources are planned and scheduled to match the current and future needs of the business
- Provision of a Capacity Plan that outlines the IT resources and funding (and cost justification) needed to support the business
- Reduction in Capacity-related Incidents through pre-empting performance issues
- Implementation of corrective actions for capacity-related events
- Methods for the tuning and optimizing of the performance of IT Services and Configuration Items
- A structure for planning upgrades and enhancements and estimating future requirements by trend analysis of current Configuration Item utilization and modeling changes in IT Services
- Assurance that upgrades are planned, budgeted, and implemented before SLAs (in terms of availability or performance) are breached
- Financial benefits through avoidance of 'panic' buying.
DOCUMENT ORGANIZATION
This document is organized as follows:
Introduction
Capacity Management Policies
Capacity Management Process Flow
    Process Measurements
    Process Roles and Responsibilities
    Process Critical Success Factors
    Capacity Management Process Integration Points
1.0 Manage Business Capacity Requirements Procedure
    Manage Business Capacity Requirements Procedure Rules
    Manage Business Capacity Requirements Procedure Narrative
    Verification
    Management Review Criteria
    Escalation Criteria
    Risks
2.0 Manage Service Capacity Requirements Procedure
    Manage Service Capacity Requirements Procedure Rules
    Manage Service Capacity Requirements Procedure Narrative
    Verification
    Management Review Criteria
    Escalation Criteria
    Risks
3.0 Manage Resource Capacity Requirements Procedure
    Manage Resource Capacity Requirements Procedure Rules
    Manage Resource Capacity Requirements Procedure Narrative
    Verification
    Management Review Criteria
    Escalation Criteria
    Risks
4.0 Create & Distribute Capacity Reports Procedure
    Create & Distribute Capacity Reports Procedure Rules
    Create & Distribute Capacity Reports Procedure Narrative
    Verification
    Management Review Criteria
    Escalation Criteria
    Risks
Appendix 1: Relationship to Other Documents
Appendix 2: RACI Matrix
Appendix 3: Phase 1 Capacity Management Scope
Appendix 4: Communications Plan
Appendix 5: Forms, Templates
Capacity Manager
Capacity Analyst
The Capacity Analyst performs or directs many of the day-to-day and strategic capacity activities on behalf of the Capacity Manager.
Whereas the Capacity Manager is accountable for most capacity-related activities, the Capacity Analyst is responsible for the gathering and analyzing of data for a specific service support area (e.g. Network), and then forwarding the information to the Capacity Manager, who will provide the holistic view for an entire service.
Possesses a comprehensive knowledge of the service delivery infrastructure and the capacity impacts of those infrastructure components on the service as a whole. When analysis is required, initiates the requests to the appropriate infrastructure teams, receives and analyzes the results, and creates the various reports. Reviews all Capacity reports with the Capacity Manager and publishes them after approval.
Information
- Considerations of and agreements to required service levels
- Contribution to the design of services and cost: value ratios
- Identification of the need and proposals for demand management
- Identification of throughput and peaks and troughs for SLAs
- Provision of capacity metrics and advice for service reviews
- Help with identifying response times and batch turnaround times for SLAs
- Measurement of transaction response, batch elapsed times, etc.
- Capacity predictions and plan
- Work together very closely in designing resiliency (shares tools)
- Capacity plan details how Availability issues have been dealt with
- Provide capacity information to help assess component risk
- Details of new technology being considered
- Information to review if unavailability linked to capacity
- Resource Capacity Management information
- Share tools with Capacity Management
- Share use of techniques such as CFIA and FTA
- Completed CFIA
- Extent of infrastructure needed in emergency minimum to support required performance and throughput
- Update ITSCM plan as services change
- Participate in tests and monitor performance to ensure SLRs can be met
- Ensures that ITSCM requirements are included in the Capacity Plan
- Inform BIA process by providing details of workload profiles
- Assist in evaluation of potential recovery options
- Review RFCs for their impact on the ITSCM Plan
- Design information to feed Capacity Plan
- Business / ITSCM strategy for inclusion in Capacity plan
- Details to support charging or a new requirement
- Cost allocation mechanics based on resource usage
- Details of predicted workload and users
- Half of the capacity vs. cost equation
- Predictions of upgrades for budgeting and planning
- Usage and performance data
- All resources used in the Capacity Management process
- Assistance in costing capacity options
- Cost of Capacity Management
- Budgeting provides the money to feed the Capacity Plan
Availability Management to Capacity Management
- Influence customer demand for stretched resources (helping make better use of existing resources)
- Financial Plans and Budgets within which Capacity Management must operate
- Current budget and cost effectiveness
- Updates for CIs stored in the CMDB
- Current infrastructure topology to identify what requires Capacity Management
- Identification of potential performance bottlenecks and other points of weakness
- Knowledge of the relations between service elements could aid in problem management
- Resource information stored in the CMDB is used by Capacity Management
- Capacity Plan stored in CMDB
- Assessment of RFCs and their impact on Capacity
- RFCs for improved process
- RFCs for improved Capacity
- RFC for Capacity Management Assessment
- FSC
- Backed out Changes
- Reports on implemented changes for their impact on Capacity
- Identification of Capacity issues during release planning
- Immediate identification of Capacity issue during a Release
- Release Plans
- Advice on Workarounds
- Incidents related to Capacity failures or potential failures
- Highlight potential Capacity Problems
- Resource provision on Problem Management teams
- Problem data relating to capacity issues
- Leadership when there are complex problems involving Capacity issues
Configuration Management to Capacity Management
Capacity Management to Change Management
Change Management to Capacity Management
[Figure: Manage Business Capacity Requirements Procedure Flow. Decision branches: Yes, No. Terminator: Return.]
New suggested service OR major change to an existing service.
The purpose of this procedure is to analyze, forecast and document the organization's (future) demand for IT capacity.
Receive the RFC for the new or changed service. Review the details of the RFC. Determine the impacts to the business of implementing this RFC. Record the determined impacts. Proceed to 1.2 Review SLAs.

Review all SLAs for the service which will be impacted. Determine the impacts on the current SLAs of implementing the new or changed service. Record the results of the analysis. Proceed to 1.3 Decision: Changes Required?

Will changes to the infrastructure be required in order to deliver the new or changed service as well as to maintain the current SLAs? If yes, proceed to Service Level Management, which will negotiate, obtain agreement on, and sign the amended SLA. If no, proceed to Service Level Management, which will define and obtain agreement on Service Level Requirements.
Capacity Manager
In consultation with the teams impacted by this new or changed service, develop Service Level Requirements. Agree and document the SLRs. Interface with the Configuration Management processes to design, amend or procure items for the new configuration. After that, invoke the Change Management processes to deploy the change. Finally, interface with the Configuration Management processes to update the CMDB (Configuration Management Database) and the CDB (Capacity Database). Note: The Service Level Manager will be kept informed of the fact that new SLRs have been defined and agreed upon, since the delivery of those requirements will have a direct impact on the services for which there are SLAs. Exit this sub-process and return to the calling process (0.0)
Return
Exit Criteria: Completion and validation that the change has executed according to the Change Management Process. CMDB and CDB updated as appropriate.
Outputs: Closed RFC, Updated CMDB, Updated CDB.
VERIFICATION
Result          Action
Closed RFC      Proceed
Updated CMDB    Proceed
Updated CDB     Proceed
ESCALATION CRITERIA
Event
Dispute between Capacity Management and Change Management on the criteria for success of the change.
Action
Escalation through normal management chain
Notification
The Change Manager is the ultimate decision maker.
RISKS
Risk: No approved RFC
Impact: Reduced chance that the change will be successfully applied
Risk: Databases not updated
Impact: Risk that there will continue to be service delivery issues because the changes have not been recorded, rendering impact analysis less than effective.
[Figure: Manage Service Capacity Requirements Procedure Flow. Input from Service Level Management: Receive Information on Service Breaches & Near Misses. Terminator: Return.]
The Capacity Event has been recorded in the Remedy Incident system, OR notification has been received that the Business Capacity Requirements sub-process has completed and been validated.
The purpose of this sub-process is to monitor, guard, analyze and report on the performance of the IT capacity against the organization's demand.
Responsible Role: Capacity Manager, Service Level Manager
Action: Determine service levels based on:
- SLAs
- The Capacity Plan
- Other critical factors, e.g. input from Risk Assessments, Component Failure Impact Analysis, Service Outage Analysis, etc.
Conduct activities to ensure that the required service levels are maintained. Proceed to 2.2 - Monitor, Evaluate, & Report.

2.2 Monitor, Evaluate & Report
Responsible Role: Capacity Analyst
Action: Ensure that regular monitoring is being performed on those components that have been identified as critical to the provision of services as defined in SLAs. Perform regular evaluation of the capacity from the perspective of its general health for service provision. Produce regular reports on the findings, and distribute the reports to the Service Level Manager and to the managers of the various infrastructure components. Proceed to 2.3 - Identify Trends.

2.3 Identify Trends
Responsible Role: Capacity Manager
Action: Identify any trends which emerge as a result of the regular monitoring and evaluation of the components involved in the delivery of the services. Document the trends so that they can be used in future analyses. Proceed to 2.4 Establish Normal Operation Levels.

2.4 Establish Normal Operation Levels
Action: When there is sufficient data, establish the normal operation levels for the components involved in the delivery of the services. Document those normal operation levels and obtain agreement from the appropriate managers. Proceed to 2.5 Define Exception Levels.
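As an illustration of establishing a normal operation level, the sketch below derives a band from historical samples as the mean plus or minus a number of standard deviations. This is only one possible convention and is not prescribed by this document: the function name, the multiplier `k`, and the sample values are all assumptions made for the example.

```python
from statistics import mean, pstdev


def normal_operation_level(samples, k=2.0):
    """Return a (low, high) band: mean +/- k population standard deviations.

    The band width (k) is a hypothetical choice; the agreed value would come
    from the appropriate managers, as the procedure requires.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to establish a baseline")
    centre = mean(samples)
    spread = pstdev(samples)
    return (centre - k * spread, centre + k * spread)


# Hypothetical daily CPU utilization percentages for one component.
cpu_samples = [40.0, 42.0, 38.0, 41.0, 39.0]
low, high = normal_operation_level(cpu_samples)
print(f"normal operation band: {low:.2f}% to {high:.2f}%")
```

A band like this only becomes a normal operation level once it has been documented and agreed; the calculation merely supplies a starting proposal.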
2.5 Define Exception Levels
Responsible Role: Capacity Manager
Action: Establish tolerance levels for each of the components, and obtain agreement from the appropriate managers. Document the agreed-upon tolerances so that appropriate responses can also be defined. Proceed to 2.6 Report Service Breaches and Near Misses.
Note: The tolerances should be such as to allow ample time to address issues prior to a service breach. For example, a disc being 85% full could be an exception level.

2.6 Report Service Breaches & Near Misses
Responsible Role: Capacity Manager
Action: Have a standard format for reporting service breaches as well as situations where the agreed-upon tolerances have been approached. As required, prepare this report for the Service Level Manager. Proceed to Return.

Return
Exit the Manage Service Capacity Requirements sub-process and return to the calling process.
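The relationship between an exception level, a near miss and a service breach can be sketched as a simple classification. The 85% exception level is taken from the example in the note above; the 95% breach level, the function name and the component names are hypothetical and would in practice come from the agreed tolerances and SLAs.

```python
def classify_utilization(percent_used, exception_level=85.0, breach_level=95.0):
    """Classify one utilization reading against agreed tolerances.

    exception_level follows the 85% disc example; breach_level is an
    assumed value standing in for whatever the SLA actually defines.
    """
    if exception_level >= breach_level:
        raise ValueError("exception level must sit below the breach level")
    if percent_used >= breach_level:
        return "service breach"
    if percent_used >= exception_level:
        return "near miss"
    return "normal"


# Hypothetical readings for three storage components.
readings = {"disk-a": 62.0, "disk-b": 88.5, "disk-c": 97.0}
report = {name: classify_utilization(value) for name, value in readings.items()}
for name, status in report.items():
    print(f"{name}: {status}")
```

Keeping the exception level well below the breach level is what gives the ample time to react that the note calls for.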
Exit Criteria: Normal operation levels as well as tolerances for each identified component have been identified and agreed upon. Service breaches and near misses have been identified and reported to the Service Level Manager.
Outputs: Normal performance level definitions, Exception level definitions, Performance Data, Service Reports
VERIFICATION
Result
Monitoring completed for defined timescale Trending is being performed on a regular basis Normal component performance levels have been identified and documented for identified components Exception levels have been identified and documented for identified components Proceed Proceed. Proceed Proceed
Action
REVIEW CRITERIA
Result
Monitoring is not generating data at the level needed, e.g. to pinpoint the cause of a specific Capacity-related event
Action
Management decides whether more detailed monitoring is required by balancing the need for the data against the potential impact that the generation of large amounts of data may have on the IT environment
ESCALATION CRITERIA
Event
Service Level Agreements have been breached
Action
Notify Service Level Management
Notification
Service Level Manager
RISKS
Risk
N/A
Impact
Sub-Processes 1.0 and 2.0 have been completed.
The purpose of the Manage Resource Capacity Requirements procedure is to monitor, guard, analyze and tune the performance of the various components of the IT infrastructure.
Ensure that monitoring is functioning as intended for each of the components on which it is installed and activated. Proceed to 3.2 Collect Data.
Capacity Analyst
Collect the data for the components on which monitoring is installed and activated. Organize and collate the gathered data so as to allow for analysis. Pass this data to the Service Level Management Process, which will perform audits and reviews on the components from the perspective of their current and future capabilities to deliver the service within the parameters agreed upon in the SLAs. After the results of the audits or reviews have been returned from Service Level Management, proceed to 3.3 Perform Preemptive and Reactive Problem Determination.

Review the results of the monitoring or the Reviews/Audits, as well as the details of any Capacity Event if appropriate. Determine the probable cause of any actual or potential capacity problems. Identify potential solutions to the problems. Record the details of this activity. Proceed to 3.4 - Determine the Effects of Change.

Decide which techniques are appropriate for determining the effects of a proposed change. As appropriate, perform trending or modeling. Determine training requirements for the proposed change. Document the findings. Proceed to 3.5 Plan & Budget HW & SW Upgrades & HR augmentation.
Capacity Analyst
Capacity Analyst
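Trending, as mentioned in the narrative above, can be as simple as fitting a straight line to periodic utilization samples and estimating when an agreed limit would be reached. The sketch below is a minimal least-squares example under that assumption; the function names, the monthly sampling interval, the sample data and the 85% limit are hypothetical, and real workloads may call for other models.

```python
def linear_trend(values):
    """Least-squares slope and intercept for values sampled at 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def periods_until(values, limit):
    """Periods after the last sample until the fitted line reaches limit.

    Returns None when the trend is flat or falling, and 0.0 when the
    fitted line has already reached the limit.
    """
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - (len(values) - 1))


# Hypothetical month-end disk utilization percentages.
monthly_disk_used = [60.0, 62.0, 64.0, 66.0, 68.0]
months_left = periods_until(monthly_disk_used, 85.0)
print(f"months until 85% is reached: {months_left}")
```

An estimate of this kind can feed the planning and budgeting of upgrades, provided the findings are documented along with the assumptions behind the trend.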
Capacity Analyst
3.8 Finalize & Agree on the Capacity Plan
Responsible Role: Capacity Manager
Return

Exit Criteria: The Capacity Plan is completed and agreed upon by all appropriate parties.
Outputs: Completed and agreed-upon Capacity Plan; hardware, software and personnel evaluations where appropriate.
VERIFICATION
Result                                      Action
Capacity Plan completed and agreed-upon     Proceed
REVIEW CRITERIA
Result
The Capacity Plan is not at the level required to adequately manage agreed-upon service delivery levels
Action
Management decides what should be included in the plan to provide adequate capacity control, and initiates a change to incorporate those items.
ESCALATION CRITERIA
Event
Agreement cannot be reached on the Capacity Plan.
Action
Notify Service Level Management
Notification
Service Level Manager
RISKS
Risk
No agreed-upon Capacity Plan
[Figure: Create & Distribute Capacity Reports Procedure Flow. Terminator: Return.]
Capacity Manager
For regularly scheduled reports, ensure that the requirements are still valid. For ad hoc reports, define the reporting requirements. Proceed to 4.2 Define or Validate Audience.

For regularly scheduled reports, ensure that the identified audience is still valid. For ad hoc reports, define the appropriate audience for the report. Proceed to 4.3 Identify Data Sources.
Capacity Manager
Determine the sources for the data which will be used in the report. This will probably require both the name of the data store and the appropriate fields within that store.
VERIFICATION
Result                                 Action
Requirements defined or validated      Proceed
Audience defined or validated          Proceed
Data sources identified                Proceed
Report produced                        Proceed
REVIEW CRITERIA
Result
The Capacity Plan is not at the level required to adequately manage agreed-upon service delivery levels
Action
Management decides what should be included in the plan to provide adequate capacity control, and initiates a change to incorporate those items.
ESCALATION CRITERIA
Event
Agreement cannot be reached on the Capacity Plan.
Action
Notify Service Level Management
Notification
Service Level Manager
RISKS
Risk
No agreed-upon Capacity Plan
Relationship
Requirements
Process, Procedure
Process, Procedure
Template
R - Responsible: The role responsible for getting the work done
A - Accountable: Only one role can be accountable for each activity
C - Consult: The roles that are consulted and whose opinions are sought
I - Inform: The roles that are kept up-to-date on progress
4.1 Define or Validate Requirements
4.2 Define or Validate Audience
4.3 Identify Data Sources
4.4 Gather & Analyze Data
4.5 Produce Report
4.6 Distribute Report
A A A A A A C C R R R R C C I I I I I I I I I I I I I I I I I I I I I I I I I I I I
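The rule that only one role can be accountable for each activity lends itself to a mechanical check. The sketch below is a hypothetical validator: the dictionary layout, the function name and the role assignments shown are illustrative assumptions and do not reproduce the actual chart, and the second activity is deliberately mis-assigned to show a violation.

```python
def raci_violations(matrix):
    """Return activities that do not have exactly one Accountable role.

    matrix maps activity name -> {role name: RACI code}. Each result is a
    (activity, accountable_roles) tuple so the problem is easy to report.
    """
    problems = []
    for activity, assignments in matrix.items():
        accountable = [role for role, code in assignments.items() if "A" in code]
        if len(accountable) != 1:
            problems.append((activity, accountable))
    return problems


# Illustrative assignments only; not taken from the chart above.
sample_matrix = {
    "4.5 Produce Report": {"Capacity Manager": "A", "Capacity Analyst": "R"},
    "4.6 Distribute Report": {"Capacity Manager": "A", "Capacity Analyst": "A"},
}
for activity, roles in raci_violations(sample_matrix):
    print(f"{activity}: expected one Accountable role, found {len(roles)}")
```

Running a check like this whenever the chart changes would catch activities that have lost their single accountable role or gained a second one.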
- Bandwidth: Mb/s, Gb/s
- Storage: Capacity (GB, TB), Bandwidth (IOPS)
- Database & Infrastructure Services: Transaction Rates, Peak Transactions / Second, Mean time to complete
- Tape Mounts / Hour
- Occupied Slots vs. Slots Available

Initial Service Focus:
- Account/Password Services
- Network Services
- Print Services
Approach:
This plan details tasks that apply generally to all ITIL processes. The plan assumes that there will be a combination of face-to-face training/meeting events and broadcast communications designed to both increase awareness of the processes among stakeholders and to ensure high performance of the new processes among key service delivery staff.
# 1
Capacity Manager
Capacity Manager
Capacity Analysts
Monthly
Capacity Manager
Each type of communication has a specific focus; however, a common approach can be taken to define and formulate the specific communication activities. The steps listed below formulate the approach to be taken to compose those activities:
Activities
Step 1 Formulation
- Formulate goals and objectives of communication
- Formulate core message
- Identify all parties involved
- Integrate with existing communications forums

Step 2 Analysis
- Determine available and acceptable communication media
- Determine communication culture and define acceptable approach
- Determine existing knowledge of subject in the environment

Step 3 Identification
- Determine key interest groups related to the subject of the campaign
- Determine communication objectives per interest group
- Determine the key messages from each interest group's perspective

Step 4 Definition
- Select the most appropriate media for communication from:
  - Direct Media such as workshops, Focus Group discussions, or individual presentations
  - Indirect Media such as the Intranet, lectures or newsletters

Step 5 Planning
- Define a plan that links important points in the subject of the communication (e.g. milestones in a project) to communication activities and media
- Determine the communication audience and resources
- Determine the review criteria for successful communication
- Obtain formal management support for the plan

Step 6 Implementation
- Using the FNAL Communications Process, perform communication activities as per plan
- Manage the plan and safeguard it
- Ensure production and distribution of materials is effective and as per plan
- Continually gauge reaction to the approach and messages

Step 7 Evaluation
- Monitor reactions to the communication approach throughout the delivery of the plan and adjust the plan if necessary
- Determine the effects of the campaign using the review criteria established in step 5

The following types of communication are available:
Examples of Usage
- To initiate (or trigger) actions
- To gain required resources (people, information, budget etc.)
- To communicate operational process information
- To promote team awareness
- To communicate process descriptions/instructions
- To communicate reports
Each of the above types of communication can be delivered via one or more of the following mediums: Communication Medium
Email
Examples of Usage
Individual email messages
Communication Type
Notification Escalations Reports
Verbal
Formal and informal meetings
Presentations
Telephone calls
Updated process documents
Issued Project documentation
Implementation and back-out plans
Monthly statistics
Development progress
Escalation
Status changes
Service breaches and near misses
Notifications
Escalations
Controlled Documents
Documentation