2016 NDIA GROUND VEHICLE SYSTEMS ENGINEERING AND TECHNOLOGY SYMPOSIUM
SYSTEMS ENGINEERING (SE) TECHNICAL SESSION
AUGUST 2-4, 2016 – NOVI, MICHIGAN
Proceedings of the 2016 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

BRIDGING RELIABILITY ENGINEERING AND SYSTEMS ENGINEERING

Venkatesh Agaram
Quality & Reliability Engineering Practice
CIMdata Inc, Ann Arbor, MI

ABSTRACT
The increasing application of sensors, actuators, and complex algorithms for delivering artificial intelligence and connectivity in products and product-systems will drive an unprecedented growth in design complexity and software content, making it increasingly difficult to ensure dependability in an economical manner. Much learning about the dependability of such new and innovative products is likely to happen as they are conceived and designed. Consequently, accelerated verification and validation iterations supported by easy and rapid storage and retrieval of failure knowledge must be enabled. No single software solutions provider effectively covers all three critical areas required for developing and delivering dependable smart connected products, namely, reliability engineering, systems engineering, and failure knowledge management. This paper mainly presents a potential map of the commonly used reliability engineering tools overlaid on the systems engineering technical processes. The paper recommends including a formal knowledge storage and retrieval system in the closed loop between systems engineering and reliability engineering so that the details observed in past failures are not missed in future design iterations.

INTRODUCTION
Connectivity and artificial intelligence are major features of many upcoming products that are increasingly likely to be systems-of-systems. A large set of complex algorithms is needed for accurately estimating the instantaneous states of these systems and their operational environments and for exercising robust control over the systems to deliver the benefits desired by the end users. Intricate algorithms are used for sensor data fusion, remote diagnostics, remote repair, autonomous control, timely hand-off to humans, and many other functions. The increasing application of sensors, actuators, and the associated algorithms in products and product-systems will drive an unprecedented growth in design complexity and software content, making it increasingly difficult to ensure dependability in an economical manner.

Even without connectivity and artificial intelligence, the launch delays and recalls associated with today’s less sophisticated electronically controlled mechanical systems, due to performance issues, can very often be traced to design complexity and software content. In the case of connected, automated, or autonomous systems, the problem can be expected to worsen.

A US FDA study [1] has determined that software-related recalls, measured as a percentage of all recalls in the medical devices industry, went up from 14% in 2005 to 25% in 2011. This percentage has been trending upwards since 1983, with software-related recalls as a percentage of overall recalls averaging 6% between 1983 and 1991, 8% between 1992 and 1998, 11% between 1999 and 2004, and 19% between 2005 and 2011.

The relationship between the number of recall events in a period of time and the number of units impacted by the recalls can be vastly different between industries [2]. For example, 150 recall events in the medical device industry may amount to 300,000 affected units, but in the automotive industry 30 recall events could impact 2 million vehicles.

A financial advisory blog [3] mentions that there has been a substantial increase in software-related recalls in the automotive industry since 2012. The authors cite 32 software-related recalls that affected 3.6 million light vehicles between 2005 and 2012. However, they mention 6.4 million additional vehicles affected by 63 additional software-related recalls between 2013 and 2015. The blog also mentions going from 0.3% of recalls being software-related in 2005 to 4.3% of recalls being software-related within the first 6 months of 2015, and this trend, the authors state, is showing no signs of reversing. The authors of the blog also report a similar trend in NHTSA’s complaint data. Over the period covering 2005 to 2009, 55 software-related complaints were logged with NHTSA, whereas over the period 2010 to 2014, 197 complaints contained the same reference to software-related issues, highlighting the increased role of software in automotive safety.

Notable software-related issues in recent times from the aerospace industry have been in connection with the Boeing 787 and the F-35 Joint Strike Fighter. A software bug in the Boeing 787 was found to be capable of shutting down the plane’s electric generators because a software counter internal to the generator control units (GCUs) could overflow after 248 days of continuous power. This could cause the GCU to go into failsafe mode, resulting in a loss of all electrical power regardless of the flight phase. The F-35 Joint Strike Fighter is expected to be further behind in its combat-readiness due to issues with its RADAR software and vulnerability to cyber-attacks; these require the system to be rebooted every four hours of flight time, while the desired reboot interval of the F-35 is eight to ten hours of flight time.

The failure modes of software-intensive, control-systems-driven products are difficult to guess a priori due to the complexity of their functions and information flow, and consequently, crucial failure modes can easily be missed. Much learning about the dependability of such new and innovative products is likely to happen as they are conceived and designed. Consequently, accelerated verification and validation iterations supported by easy and rapid storage and retrieval of failure knowledge must be enabled.

Today, enterprise level reliability engineering tools and systems engineering are not well connected, preventing many lessons learned in reliability engineering from helping drive robust designs via systems engineering. This situation is further aggravated when we consider the silos of expertise and data among mechanical, electrical, and software disciplines. The product development tools used in these disciplines’ silos are very different and most often not connected with each other, making it very difficult to carry out systems engineering. A major challenge arises due to improper channeling of prior experience and knowledge about reliability into design, leading to repeated dependability issues of complex products.

No single software solutions provider effectively covers all three critical areas required for developing and delivering dependable smart connected products, namely, reliability engineering, systems engineering, and knowledge management. Further, given the complexity of developing the large number of specialty tools required to do this, it is perhaps unrealistic to expect that a completely integrated suite of solutions will be available from a single software solutions provider.

The most practical solution for developing dependable, connected, and intelligent products could come through the application of specialty software tools that conform to applicable interoperability standards, so that the enterprise level system integrators can seamlessly connect the tools needed for reliability engineering, systems engineering, and knowledge management.

This paper mainly presents a potential map of the commonly used reliability engineering tools overlaid on the systems engineering technical processes. The paper adheres to the technical processes of systems engineering described by INCOSE [4] and the commonly known tools and processes used in design-for-six-sigma and reliability engineering [5, 6]. The paper recommends including a formal knowledge storage and retrieval system in the closed loop between systems engineering and reliability engineering so that the details observed in past failures are not missed in future design iterations.

RELIABILITY ENGINEERING MEETS SYSTEMS ENGINEERING
To enable fast learning cycles that will help identify potential failure modes of complex systems, a seamless enterprise level connection between the systems engineering technical processes, the reliability engineering tools, and a knowledge management system is needed, so that efficient storage and retrieval of failure mode information is possible.

To accomplish the development of the enterprise level connection mentioned above, the activities related to the thirteen technical processes of systems engineering [4] have been considered in this work as a higher-level product lifecycle structure. Those thirteen systems engineering technical processes are:
• Stakeholders’ Requirements Identification
• System Requirements Definition
• System Architectural Design
• System Elements Definition
• System Analysis
• System Elements Realization
• System Elements Integration
• System Design Verification
• Verified System Transition
• System Performance Validation
• System Operation
• System Maintenance
• System Disposal

In the following section, the diverse tools used in design-for-six-sigma and reliability engineering that should support the systems engineering technical processes are briefly described.

Beyond the next section, each activity associated with the thirteen technical processes of systems engineering is associated with a set of design-for-six-sigma and reliability engineering tools, wherever those tools are likely to have a beneficial impact. The purpose is to present a connection between the tools used in reliability engineering and design-for-six-sigma, and the main technical process activities of systems engineering, that needs to be accomplished by enterprise level software systems integration so as to adequately deal with the complexity of designing and delivering smart connected products.

Reliability Engineering Tools
A wide variety of tools are used in reliability engineering and design-for-six-sigma. These help prevent failures and increase the operational life of systems, subsystems, and components.
1. Affinity Diagrams (KJ Analysis): for clustering similar items to get a higher level view of a large number of entities being analyzed, e.g., VOC.
2. Quality Function Deployment (QFD) or House of Quality (HOQ): meant to develop the interpretation matrix for creating functional requirements that will satisfy the stakeholder requirements; serves as a traceability matrix.
3. Kano Analysis: way of classifying requirements into “delighters,” “performance needs,” and “basic needs” for rank ordering the stakeholder requirements based on their importance.
4. Functional Flow Block Diagram (FFBD): multi-tier, time-sequenced, step-by-step flow diagram of a system’s functional flow.
5. SysML Diagrams: representation of different aspects of systems through diagrams, namely, requirements, activity, block definition, internal block, parametric representation, use cases, system state, and sequence.
6. Integrated Definition for Functional Modeling Diagrams (IDEF0): designed to model the decisions, actions, and activities of a system.
7. N2 Charts: matrix representing functional or physical interfaces between system elements, also applicable to hardware and/or software interfaces.
8. Specification Tree: specifications of a technical system under development in a hierarchical order going from system requirements, to system design specifications, to subsystem specifications, to assembly specifications, and to component specifications.
9. Failure Modes Effects & Criticality Analysis (FMECA): bottom-up, inductive analysis performed at the functional or component level, enhanced by charting the probabilities of occurrence of failure modes against the severity of their consequences.
10. TRIZ (Theory of Inventive Problem Solving): problem solving, analysis, and forecasting derived from the study of patterns of invention in the global patent literature.
11. Physics of Failure Analysis: modeling and simulation based understanding of processes and mechanisms that lead to failure, to predict reliability and improve product performance.
12. Robust Optimization: optimization of system performance while increasing the stability of the system against product, process, and usage variability.
13. Design of Experiments (DOE): appropriate choice of discrete values of independent variables for finding the corresponding values of the chosen dependent variables to understand the system response.
14. Response Surface Analysis: expressing the response of complex systems as a continuous function of chosen independent variables to cost-effectively generate insights about systems’ behavior.
15. Monte Carlo Simulations: random sampling of independent variables and the generation of the corresponding dependent variable values to understand the behavior of the dependent variables.
16. Conjoint Analysis: ranking of stakeholders’ requirements based on combinations of attributes and weights representing the relative importance of the attributes.
17. Kepner-Tregoe Analysis (KTA): systematic way of decision making based on wants and must haves, possible alternative solutions, probability of adverse effects, and significance.
18. Analytic Hierarchy Process (AHP) models: modeling the problem as a hierarchy containing the decision goal, the alternatives for reaching it, and the criteria for evaluating the alternatives.
19. Multi-Attribute Utility Analysis (MAUA): a scalar utility function over the domain of the attribute values to assess tradeoffs between different attributes.
20. Fault Tree Analysis (FTA): top-down, deductive failure analysis in which system failure is analyzed using Boolean logic to combine a set of lower-level events.
21. Event Tree Analysis (ETA): bottom-up modeling technique for success and failure which explores responses through a single initiating event and helps assess overall probabilistic system response.
22. Reliability Block Diagrams (RBD): for showing how components and subsystems contribute to the success or failure of a complex system.
23. Failure Reporting Analysis & Corrective Action System (FRACAS): for reporting, classifying, and analyzing failures, and planning corrective actions in response to failures.
24. Corrective Action & Preventive Action (CAPA): systematic investigation of the root causes of system problems to correct the situation or prevent it from occurring.
25. Markov Analysis: breaking the path to the final (or failed) system state into a number of intermediate states, connected with each other by transition matrices, under the assumption that each state is memoryless.
26. Weibull Analysis: makes predictions about the life of products by fitting statistical distributions to product life data (performance over product lifetime).
27. System Maintainability Analysis: determines the ease and speed with which a system can be restored to operational status after a failure occurs. It is calculated using time-to-repair as the random variable, instead of the time-to-failure used for estimating system reliability.
28. System Availability Analysis: probability of a unit being available in a fully functional state, calculated based on mission duration and observed or simulated mission downtime.
29. Asset Performance Management (APM): condition monitoring, predictive forecasting, and reliability-centered maintenance of systems based on data capture and analytics.
30. Accelerated Life Testing (ALT): subjecting systems to stress, strain, temperatures, voltage, vibration rate, pressure, etc., in excess of their service levels for quickly uncovering potential modes of failure.

In Table I, each of the thirty tools described above has been mapped to the thirteen technical processes of systems engineering where it is likely to be most beneficial. The purpose is to present a connection between the tools used in reliability engineering and design-for-six-sigma, and the main technical processes of systems engineering, that needs to be accomplished through enterprise level software systems integration.

FAILURE KNOWLEDGE CAPTURE AND REUSE
Systems engineering, through its technical processes, has traditionally been applied for realizing highly complex systems where chances for unreliability of designs abound. Systems engineering begins with the discovery of the real problems that need to be solved and identification of the highest impact failures that can occur, and then, through an interdisciplinary approach to engineering, attempts to find solutions to those problems and potential failures. However, many of the system dependability issues that are likely to arise have their roots at the intersection of different disciplines of engineering and at the interfaces between different subsystems, where engineering intuition tends to be low and rapid learning is imperative.

The rapid learning, which needs to happen over verification and validation cycles of systems engineering or over newer instances of products as they evolve from past designs and configurations, deals with implicit knowledge which is not immediately accessible and, in particular, cannot be acquired from conventional databases. Further, the useful knowledge is obfuscated by different subject matter experts using different terms when addressing the same issue. For instance, implicit knowledge about failure modes exists in natural language, and consequently, the meaning depends on interpretations by the team which is reusing that knowledge.

The problem of reusing pre-existing knowledge about failure modes could be solved effectively through the definition of an ontology, which enables a common understanding of the domain specific concepts without need for interpretation, while making the ontology-held knowledge explicit and machine-readable. An ontology helps to integrate the elements of task-relevant knowledge by uniformly structuring the domain knowledge.

An ontology, which consists of definitions of concepts, relationships, and rules, is used in knowledge-based systems where formalized knowledge is represented in a language that supports reasoning and inference. The past knowledge about system failures, which is implicitly contained in documents, can be made explicit for use in information systems by an inference engine. Using non-deductive inference rules, the scope of making implicit knowledge explicit can be expanded significantly. By using an ontology-based information system as part of the closed loop between system failure and performance issue occurrences and the upstream design activity, new and fast learning can be enabled to ensure that the risk of overlooking failure modes is mitigated.

Although an ontology, as a way of converting implicit system failure knowledge into machine-readable explicit knowledge for reuse, has often been mentioned in technical literature, it is not currently offered commercially by software providers either as part of systems engineering or reliability engineering tool suites. In view of the increasingly complex smart, connected products that are emerging, the lack of understanding about their failure modes will only increase. In addition, the market pressures on time-to-launch and product costs will require much faster learning cycles than ever before regarding product performance and product failure. Consequently, ontology-based or similar knowledge reuse tools are needed at the enterprise level, in earnest, to deliver complex products that are dependable, affordable, and available on time.

LEARNING SYSTEMS BASED DESIGN-FOR-RELIABILITY
The technical processes of systems engineering ensure robust capture of stakeholders’ requirements in the beginning, followed by a hierarchical flow of those requirements into system, subsystem, and component design. Verification and validation activities occur at all three levels of the system development hierarchy, and they offer learning opportunities about the system’s performance at different levels.
For Bridging Reliability Engineering & Systems Engineering Page 4 of 6 Proceedings of the 2016 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS) Analytic Hierarchy Process (AHP) Models Multi-Attribute Untility Analysis (MAUA) Asset Performance Management Reliability Engineering Tools Reliability Block Diagrams (RBD) System Maintainability Analysis Functional Flow Block Diagram Design of Experiments (DOE) System Availability Analysis Response Surface Analysis Physics of Failure Analysis Event Tree Analysis (ETA) Monte Carlo Simulations Fault Tree Analysis (FTA) Kepner-Tregoe Analysis Accelerated Life Testing Robust Optimization Specification Tree Affinity Diagrams Conjoint Analysis SysML Diagrams Markov Analysis Weibull Analysis Kano Analysis QFD (HOQ) N2 Charts FRACAS FMECA CAPA IDEF TRIZ Systems Engineering Technical 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Processes Stakeholders' Requirements Definition 1 System Requirements Definition 2 System Architectural Design 3 System Elements Definition 4 System Analysis 5 System Elements Realization 6 System Elements Integration 7 System Design Verification 8 Verified System Transition 9 System Performance Validation 10 System Operation 11 System Maintenance 12 System Disposal 13 Table I Relating Reliability Engineering Tools with Systems Engineering Technical Processes (Copyright © 2016 by CIMdata, Inc., used with permission) The reliability engineering tools depend on the technical Software solutions providers today offer enterprise expertise of subject matter experts to develop robust and solutions to enable large parts of systems engineering as optimum designs that meet the stakeholders’ requirements described by INCOSE [4]. However, those systems and do not run the risk of customer annoyance or unsafe engineering solutions do not communicate well with outcomes. 
To design against potential performance issues reliability engineering and design-for-six-sigma tools. Too and failure modes, the subject matter experts need to be able much manual transfer of information is needed between the to identify those failure modes and be knowledgeable about systems engineering and the reliability engineering the potential for their occurrence. solutions, leaving the door open for many quality and Given the runaway complexity of products today which reliability problems to be missed, allowing them to reemerge often straddle two or more traditional areas such as during service and operations. In many situations, product automotive, entertainment, information, banking, etc., major developers and manufacturers have had to custom-develop problems can occur when the intuition of the subject matter the connection between reliability engineering and systems experts at the intersections of these domains is lacking. In engineering. fact, we can say that the systems could become so complex In order to compensate for the limited intuition of subject and interrelated that the so-called subject matter experts are matter experts regarding potential failure modes of complex, not really subject matter experts any more. The speed of software-intensive products, while remaining competitive in evolution of products demands a “learning system based cost and time-to-market, we need a seamless coupling design for reliability” capability in which systems between reliability engineering tools and systems engineering, as we know it, is seamlessly coupled with engineering processes that leverages ontology-based or reliability engineering and design-for-six-sigma, supported similar failure knowledge capture and reuse in order to by an ontology-based or similar knowledge management realize rapid, enterprise level learning. system that can manage implicit knowledge by converting it into machine-readable explicit knowledge. 
CONCLUSION
The need for seamlessly connecting systems engineering solutions and reliability engineering solutions through a knowledge management system has received very little attention. With the growing complexity of software-intensive products, whose failure modes are difficult to guess a priori, it is imperative to establish such a connection in order to realize the “learning system based design-for-reliability” capability. Given the expansiveness of the solution that could satisfy such a need, it appears that the global information technology system integrators could consider this as a strategic business opportunity. They could drive the interoperability standards between diverse systems engineering, reliability engineering, and knowledge management tools and develop integration offerings that can be tailored to the specific needs of diverse industry verticals and businesses of different sizes within those verticals.

One opportunity for driving the interoperability standards between different systems engineering, reliability engineering, and knowledge management tools could be through a user group under the Open Services for Lifecycle Collaboration (OSLC) [7]. Currently a user group for Quality Management [8] is working under OSLC towards defining a common set of resources, formats, and RESTful services for quality management tools to interact with other application lifecycle management (ALM) tools.

No user groups for reliability engineering and/or knowledge management exist under OSLC today. However, that can be remedied through CIMdata’s leadership, once experts from industry, software solutions providers, and system integrators agree with CIMdata that the “learning system based design-for-reliability” capability must be developed in the near future.

For socializing the contents of this whitepaper with the experts from industry, software solutions providers, and systems integrators, CIMdata will organize a series of webinars in 2016 which will go into the details of several aspects of this topic. CIMdata also plans to organize a workshop in 2017 where different viewpoints will be presented on this subject by the experts from industry, software solutions providers, and system integrators, culminating in the strategic direction of proceeding further within the framework of OSLC or some other framework.

REFERENCES
[1] Simone, Lisa K., “Software-Related Recalls: An Analysis of Records”, Biomedical Instrumentation & Technology, pp. 514-522, November/December 2013.
[2] Edwards, Steve, “The Tech Effect: Examining the Trend of Software-Related Recalls”, Stericycles Thought Leadership Blog. http://www.stericycleexpertsolutions.com/the-tech-effect-examining-the-trend-of-software-related-recalls/
[3] Reed, Jake, “The “Softer” Side of Things – The Impact of the Increasing Software and Complexity on Vehicle Defects”, SRR Automotive Warranty & Recall Blog, July 29, 2015. http://blog.srr.com/automotive-warranty-and-recall/the-softer-side-of-things-the-impact-of-the-increasing-software-and-complexity-on-vehicle-defects/
[4] INCOSE, “Systems Engineering Handbook – A Guide for System Life Cycle Processes and Activities”, 4th Ed., Wiley, INCOSE-TP-2003-002-04, 2015.
[5] Maass, Eric, and McNair, Patricia, “Applying Design for Six Sigma to Software and Hardware Systems”, Prentice Hall, 1st Ed., 2009.
[6] Creveling, C. M., Slutsky, J. L., and Antis, D., Jr., “Design for Six Sigma in Technology and Product Development”, Prentice Hall, 2003.
[7] Open Services for Lifecycle Collaboration (OSLC). http://open-services.net/
[8] Quality Management User Group at OSLC. http://open-services.net/workgroups/quality-management/