2016 NDIA GROUND VEHICLE SYSTEMS ENGINEERING AND TECHNOLOGY SYMPOSIUM
SYSTEMS ENGINEERING (SE) TECHNICAL SESSION
AUGUST 2-4, 2016 – NOVI, MICHIGAN
BRIDGING RELIABILITY ENGINEERING AND SYSTEMS ENGINEERING
Venkatesh Agaram
Quality & Reliability Engineering Practice
CIMdata Inc, Ann Arbor, MI
ABSTRACT
The increasing application of sensors, actuators, and complex algorithms for delivering artificial
intelligence and connectivity in products and product-systems will drive an unprecedented growth in design
complexity and software content, making it increasingly difficult to ensure dependability in an economical
manner. Much learning about the dependability of such new and innovative products is likely to happen as they
are conceived and designed. Consequently, accelerated verification and validation iterations supported by easy
and rapid storage and retrieval of failure knowledge must be enabled. No single software solutions provider
effectively covers all three critical areas required for developing and delivering dependable smart connected
products, namely, reliability engineering, systems engineering, and failure knowledge management. This paper
mainly presents a potential map of the commonly used reliability engineering tools overlaid on the systems
engineering technical processes. The paper recommends including a formal knowledge storage and retrieval
system in the closed-loop between systems engineering and reliability engineering so that the details observed in
past failures are not missed in future design iterations.
INTRODUCTION

Connectivity and artificial intelligence are major features of many upcoming products that are increasingly likely to be systems-of-systems. A large set of complex algorithms is needed for accurately estimating the instantaneous states of these systems and their operational environments and for exercising robust control over the systems to deliver the benefits desired by the end users. Intricate algorithms are used for sensor data fusion, remote diagnostics, remote repair, autonomous control, timely hand-off to humans, and many other functions. The increasing application of sensors, actuators, and the associated algorithms in products and product-systems will drive an unprecedented growth in design complexity and software content, making it increasingly difficult to ensure dependability in an economical manner.

Even without connectivity and artificial intelligence, the launch delays and recalls associated with today’s less sophisticated electronically controlled mechanical systems, due to performance issues, can very often be traced to design complexity and software content. In the case of connected, automated, or autonomous systems, the problem can be expected to worsen.

A US FDA study [1] has determined that software-related recalls, measured as a percentage of all recalls in the medical devices industry, went up from 14% in 2005 to 25% in 2011. This percentage has been trending upwards since 1983, with software-related recalls as a percentage of overall recalls averaging 6% between 1983 and 1991, 8% between 1992 and 1998, 11% between 1999 and 2004, and 19% between 2005 and 2011.

The relationship between the number of recall events in a period of time and the number of units impacted by the recalls can be vastly different between industries [2]. For example, 150 recall events in the medical device industry may amount to 300,000 affected units, but in the automotive industry, 30 recall events could impact 2 million vehicles.

A financial advisory blog [3] mentions that there has been a substantial increase in software-related recalls in the automotive industry since 2012. The authors cite 32 software-related recalls that affected 3.6 million light vehicles between 2005 and 2012. However, they mention 6.4 million additional vehicles affected by 63 additional software-related recalls between 2013 and 2015. The blog also mentions going from 0.3% of recalls being software-related in 2005 to 4.3% of recalls being software-related within the first 6 months of 2015, and this trend, the authors state, is showing no signs of reversing. The authors of the blog also report a similar trend seen in NHTSA’s complaint data. Over the period covering 2005 to 2009, 55 software-related complaints were logged with NHTSA, whereas over the period 2010 to 2014, 197 complaints contained the same reference to software-related issues, highlighting the increased role of software in automotive safety.

Notable software-related issues in recent times from the aerospace industry have been in connection with the Boeing 787 and the F-35 Joint Strike Fighter. A software bug in the Boeing 787 was found to be capable of shutting down the plane’s electric generators after 248 days of continuous power, because a software counter internal to the generator control units (GCUs) could overflow at that point. This could cause the GCU to go into failsafe mode, resulting in a loss of all electrical power regardless of the flight phase. The F-35 Joint Strike Fighter is expected to be further behind in its combat-readiness due to issues with its radar software and vulnerability to cyber-attacks; these issues require the system to be rebooted every four hours of flight time, whereas the desired reboot interval of the F-35 is eight to ten hours of flight time.

The failure modes of software-intensive, control-system-driven products are difficult to guess a priori due to the complexity of their functions and information flow, and consequently, crucial failure modes can easily be missed. Much learning about the dependability of such new and innovative products is likely to happen as they are conceived and designed. Consequently, accelerated verification and validation iterations supported by easy and rapid storage and retrieval of failure knowledge must be enabled.

Today, enterprise level reliability engineering tools and systems engineering are not well-connected, preventing many lessons learned in reliability engineering from helping drive robust designs via systems engineering. This situation is further aggravated when we consider the silos of expertise and data among mechanical, electrical, and software disciplines. The product development tools used in these disciplines’ silos are very different and most often not connected with each other, making it very difficult to carry out systems engineering. A major challenge arises due to improper channeling of prior experience and knowledge about reliability into design, leading to repeated dependability issues of complex products.

No single software solutions provider effectively covers all three critical areas required for developing and delivering dependable smart connected products, namely, reliability engineering, systems engineering, and knowledge management. Further, given the complexity of the development of the large number of specialty tools required to do this, it is perhaps unrealistic to expect that a completely integrated suite of solutions will be available from a single software solutions provider.

The most practical solution for developing dependable, connected, and intelligent products could come through the application of specialty software tools that conform to applicable interoperability standards so that the enterprise level system integrators can seamlessly connect the tools needed for reliability engineering, systems engineering, and knowledge management.

This paper mainly presents a potential map of the commonly used reliability engineering tools overlaid on the systems engineering technical processes. The paper adheres to the technical process of systems engineering described by INCOSE [4] and the commonly known tools and processes used in design-for-six-sigma and reliability engineering [5, 6]. The paper recommends including a formal knowledge storage and retrieval system in the closed-loop between systems engineering and reliability engineering so that the details observed in past failures are not missed in future design iterations.

RELIABILITY ENGINEERING MEETS SYSTEMS ENGINEERING

To enable fast learning cycles that will help identify potential failure modes of complex systems, a seamless enterprise level connection between the systems engineering technical processes, the reliability engineering tools, and a knowledge management system is needed, so that efficient storage and retrieval of failure modes information is possible.

To accomplish the development of the enterprise level connection mentioned above, the activities related to the thirteen technical processes of systems engineering [4] have been considered in this work as a higher-level product lifecycle structure. Those thirteen systems engineering technical processes are:
• Stakeholders’ Requirements Identification
• System Requirements Definition
• System Architectural Design
• System Elements Definition
• System Analysis
• System Elements Realization
• System Elements Integration
• System Design Verification
• Verified System Transition
• System Performance Validation
• System Operation
• System Maintenance
• System Disposal

In the following section the diverse tools used in design-for-six-sigma and reliability engineering that should support the systems engineering technical processes are briefly described.

Beyond the next section, each activity associated with the thirteen technical processes of systems engineering is associated with a set of design-for-six-sigma and reliability engineering tools, wherever those tools are likely to have a beneficial impact. The purpose is to present a connection
between the tools used in reliability engineering and design-for-six-sigma, and the main technical process activities of systems engineering that needs to be accomplished by enterprise level software systems integration so as to adequately deal with the complexity of designing and delivering smart connected products.

Reliability Engineering Tools

A wide variety of tools are used in reliability engineering and design-for-six-sigma. These help prevent failures and increase the operational life of systems, subsystems, and components.
1. Affinity Diagrams (KJ Analysis): for clustering similar items to get a higher level view of a large number of entities being analyzed, e.g., VOC.
2. Quality Function Deployment (QFD) or House of Quality (HOQ): meant to develop the interpretation matrix for creating functional requirements that will satisfy the stakeholder requirements; also serves as a traceability matrix.
3. Kano Analysis: way of classifying requirements into “delighters,” “performance needs,” and “basic needs” for rank ordering the stakeholder requirements based on their importance.
4. Functional Flow Block Diagram (FFBD): multi-tier, time-sequenced, step-by-step flow diagram of a system’s functional flow.
5. SysML Diagrams: representation of different aspects of systems through diagrams, namely, requirements, activity, block definition, internal block definition, parametric representation, use cases, system state, and sequence.
6. Integrated Definition for Functional Modeling Diagrams (IDEF0): designed to model the decisions, actions, and activities of a system.
7. N2 Charts: matrix representing functional or physical interfaces between system elements, also applicable to hardware and/or software interfaces.
8. Specification Tree: specifications of a technical system under development in a hierarchical order going from system requirements, to system design specifications, to subsystem specifications, to assembly specifications, and to component specifications.
9. Failure Modes Effects & Criticality Analysis (FMECA): bottom-up, inductive analysis performed at the functional or component level, enhanced by the relationship between the probabilities of occurrence of failure modes and the severity of their consequences.
10. TRIZ (Theory of Inventive Problem Solving): problem solving, analysis, and forecasting derived from the study of patterns of invention in the global patent literature.
11. Physics of Failure Analysis: modeling and simulation based understanding of processes and mechanisms that lead to failure, to predict reliability and improve product performance.
12. Robust Optimization: optimization of system performance while increasing the stability of the system against product, process, and usage variability.
13. Design of Experiments (DOE): appropriate choice of discrete values of independent variables for finding the corresponding values of the chosen dependent variables to understand the system response.
14. Response Surface Analysis: expressing the response of complex systems as a continuous function of chosen independent variables to cost-effectively generate insights about systems’ behavior.
15. Monte Carlo Simulations: random sampling of independent variables and the generation of the corresponding dependent variable values to understand the behavior of the dependent variables.
16. Conjoint Analysis: ranking of stakeholders’ requirements based on combinations of attributes and weights representing the relative importance of the attributes.
17. Kepner-Tregoe Analysis (KTA): systematic way of decision making based on wants and must haves, possible alternative solutions, probability of adverse effects, and significance.
18. Analytic Hierarchy Process models (AHP): modeling the problem as a hierarchy containing the decision goal, the alternatives for reaching it, and the criteria for evaluating the alternatives.
19. Multi-Attribute Utility Analysis (MAUA): a scalar utility function over the domain of the attribute values to assess tradeoffs between different attributes.
20. Fault Tree Analysis (FTA): top down, deductive failure analysis in which system failure is analyzed using Boolean logic to combine a set of lower-level events.
21. Event Tree Analysis (ETA): bottom up modeling technique for success and failure which explores responses through a single initiating event and helps assess overall probabilistic system response.
22. Reliability Block Diagrams (RBD): for showing how components and subsystems contribute to the success or failure of a complex system.
23. Failure Reporting Analysis & Corrective Action System (FRACAS): for reporting, classifying, and analyzing failures, and planning corrective actions in response to failures.
24. Corrective Action & Preventive Action (CAPA): systematic investigation of the root causes of system problems to correct the situation or prevent it from occurring.
25. Markov Analysis: breaking the final (or failed) system state into a number of intermediate states, connected
with each other by transition matrices, under the assumption that each state is memory-less.
26. Weibull Analysis: makes predictions about the life of products by fitting statistical distributions to product life data (performance over product lifetime).
27. System Maintainability Analysis: determines the ease and speed with which a system can be restored to operational status after a failure occurs. It is calculated with time-to-repair as the random variable, instead of the time-to-failure used for estimating system reliability.
28. System Availability Analysis: probability of a unit being available in a fully functional state, calculated based on mission duration and observed or simulated mission downtime.
29. Asset Performance Management (APM): condition monitoring, predictive forecasting, and reliability-centered maintenance of systems based on data capture and analytics.
30. Accelerated Life Testing (ALT): subject systems to stress, strain, temperatures, voltage, vibration rate, pressure, etc., in excess of their service levels for quickly uncovering potential modes of failure.

In Table I, each of the thirty tools described above has been mapped to the thirteen technical processes of systems engineering where it is likely to be most beneficial. The purpose is to present a connection between the tools used in reliability engineering and design-for-six-sigma, and the main technical processes of systems engineering, that needs to be accomplished through enterprise level software systems integration.

FAILURE KNOWLEDGE CAPTURE AND REUSE

Systems engineering through its technical processes has traditionally been applied for realizing highly complex systems where chances for unreliability of designs abound. Systems engineering begins with the discovery of the real problems that need to be solved and identification of highest impact failures that can occur, and then, through an interdisciplinary approach to engineering, attempts to find solutions to those problems and potential failures. However, many of the systems dependability issues that are likely to arise have their roots at the intersection of different disciplines of engineering and at the interfaces between different subsystems, where engineering intuition tends to be low and rapid learning is imperative.

The rapid learning, which needs to happen over verification and validation cycles of systems engineering or over newer instances of products as they evolve from past designs and configurations, deals with implicit knowledge which is not immediately accessible and, in particular, cannot be acquired from conventional databases. Further, the useful knowledge is obfuscated by different subject matter experts using different terms when addressing the same issue. For instance, implicit knowledge about failure modes exists in natural language, and consequently, the meaning depends on interpretations by the team which is reusing that knowledge.

The problem of reusing pre-existing knowledge about failure modes could be solved effectively through the definition of an ontology, which enables a common understanding of the domain specific concepts without need for interpretation, while making the ontology-held knowledge explicit and machine-readable. An ontology helps to integrate the elements of task-relevant knowledge by uniformly structuring the domain knowledge.

An ontology, which consists of definitions of concepts, relationships, and rules, is used in knowledge-based systems where formalized knowledge is represented in a language that supports reasoning and inference. The past knowledge about system failures, which is implicitly contained in documents, can be made explicit for use in information systems by an inference engine. Using non-deductive inference rules, the scope of making implicit knowledge explicit can be expanded significantly. By using an ontology-based information system as part of the closed-loop between system failure and performance issue occurrences and the upstream design activity, new and fast learning can be enabled to ensure that the risk of overlooking failure modes is mitigated.

Although an ontology, as a way of converting implicit system failure knowledge into machine-readable explicit knowledge for reuse, has often been mentioned in technical literature, it is not currently offered commercially by software providers either as part of systems engineering or reliability engineering tool suites. In view of the increasingly complex smart, connected products that are emerging, the lack of understanding about their failure modes will only increase. In addition, the market pressures on time-to-launch and product costs will require much faster learning cycles than ever before regarding product performance and product failure. Consequently, ontology-based or similar knowledge reuse tools are needed at the enterprise level, in earnest, to deliver complex products that are dependable, affordable, and available on time.

LEARNING SYSTEMS BASED DESIGN-FOR-RELIABILITY

The technical processes of systems engineering ensure robust capture of stakeholders’ requirements in the beginning, followed by a hierarchical flow of those requirements into system, subsystem, and component design. Verification and validation activities occur at all three levels of the system development hierarchy, and they offer learning opportunities about the system’s performance at different levels.
[Table I body: the columns are the thirty reliability engineering tools, numbered 1 to 30; the rows are the thirteen systems engineering technical processes listed below. The cell entries relating tools to processes are not recoverable from this text.]
1. Stakeholders' Requirements Definition
2. System Requirements Definition
3. System Architectural Design
4. System Elements Definition
5. System Analysis
6. System Elements Realization
7. System Elements Integration
8. System Design Verification
9. Verified System Transition
10. System Performance Validation
11. System Operation
12. System Maintenance
13. System Disposal
Table I Relating Reliability Engineering Tools with Systems Engineering Technical Processes
(Copyright © 2016 by CIMdata, Inc., used with permission)
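
Several of the tools related in Table I are quantitative. As a minimal sketch of Fault Tree Analysis, which combines lower-level events through Boolean logic, the following Python evaluates the probability of a top event through AND and OR gates. It assumes that the basic events are statistically independent and that no basic event appears in more than one branch; the event names and probabilities are hypothetical and are not taken from this paper.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Union


@dataclass
class BasicEvent:
    name: str
    probability: float  # hypothetical value; events are assumed independent


@dataclass
class Gate:
    kind: str  # "AND" or "OR"
    inputs: List[Union["Gate", BasicEvent]]


def failure_probability(node: Union[Gate, BasicEvent]) -> float:
    """Probability that the event at this node occurs, evaluated bottom-up."""
    if isinstance(node, BasicEvent):
        return node.probability
    probs = [failure_probability(child) for child in node.inputs]
    if node.kind == "AND":
        # All inputs must occur: multiply the probabilities.
        return prod(probs)
    if node.kind == "OR":
        # At least one input occurs: complement of "none occurs".
        return 1.0 - prod(1.0 - p for p in probs)
    raise ValueError(f"unknown gate kind: {node.kind}")


# Hypothetical tree: the function is lost if both redundant sensors fail,
# or if the controller software fails.
top_event = Gate("OR", [
    Gate("AND", [BasicEvent("sensor_a_fails", 0.01),
                 BasicEvent("sensor_b_fails", 0.01)]),
    BasicEvent("controller_software_fails", 0.001),
])

print(f"P(top event) = {failure_probability(top_event):.7f}")
```

When a basic event is shared between branches, the simple product rule above no longer holds, and a cut-set based evaluation would be needed instead.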
The reliability engineering tools depend on the technical expertise of subject matter experts to develop robust and optimum designs that meet the stakeholders’ requirements and do not run the risk of customer annoyance or unsafe outcomes. To design against potential performance issues and failure modes, the subject matter experts need to be able to identify those failure modes and be knowledgeable about the potential for their occurrence.

Given the runaway complexity of products today, which often straddle two or more traditional areas such as automotive, entertainment, information, banking, etc., major problems can occur when the intuition of the subject matter experts at the intersections of these domains is lacking. In fact, we can say that the systems could become so complex and interrelated that the so-called subject matter experts are not really subject matter experts any more. The speed of evolution of products demands a “learning system based design-for-reliability” capability in which systems engineering, as we know it, is seamlessly coupled with reliability engineering and design-for-six-sigma, supported by an ontology-based or similar knowledge management system that can manage implicit knowledge by converting it into machine-readable explicit knowledge.

Software solutions providers today offer enterprise solutions to enable large parts of systems engineering as described by INCOSE [4]. However, those systems engineering solutions do not communicate well with reliability engineering and design-for-six-sigma tools. Too much manual transfer of information is needed between the systems engineering and the reliability engineering solutions, leaving the door open for many quality and reliability problems to be missed, allowing them to reemerge during service and operations. In many situations, product developers and manufacturers have had to custom-develop the connection between reliability engineering and systems engineering.

In order to compensate for the limited intuition of subject matter experts regarding potential failure modes of complex, software-intensive products, while remaining competitive in cost and time-to-market, we need a seamless coupling between reliability engineering tools and systems engineering processes that leverages ontology-based or similar failure knowledge capture and reuse in order to realize rapid, enterprise level learning.
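
As a rough illustration of how an ontology can turn implicit failure knowledge into machine-readable explicit knowledge, the sketch below stores concepts and relationships as triples, maps the different terms used by different experts onto one canonical concept, and applies a single deductive rule (failure modes are inherited along "is_a" links). All vocabulary, facts, and function names here are hypothetical and chosen only for illustration; a production system would more likely use an established ontology language and reasoner.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical synonyms: terms from different experts mapped to one concept.
SYNONYMS = {
    "ecu lockup": "controller_hang",
    "software freeze": "controller_hang",
    "watchdog timeout": "controller_hang",
}

# Hypothetical facts captured from past failure reports.
FACTS: Set[Triple] = {
    ("generator_control_unit", "is_a", "embedded_controller"),
    ("radar_processor", "is_a", "embedded_controller"),
    ("embedded_controller", "has_failure_mode", "controller_hang"),
    ("embedded_controller", "has_failure_mode", "counter_overflow"),
}


def canonical(term: str) -> str:
    """Map a free-text term to its canonical concept name."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key.replace(" ", "_"))


def infer(facts: Set[Triple]) -> Set[Triple]:
    """Repeat the rule until no new fact appears: if A is_a B and
    B is_a / has_failure_mode C, then A is_a / has_failure_mode C."""
    closed = set(facts)
    changed = True
    while changed:
        changed = False
        derived = set()
        for (a, rel1, b) in closed:
            if rel1 != "is_a":
                continue
            for (c, rel2, d) in closed:
                if c == b and rel2 in ("is_a", "has_failure_mode"):
                    derived.add((a, rel2, d))
        if not derived <= closed:
            closed |= derived
            changed = True
    return closed


def components_with_failure_mode(term: str, facts: Set[Triple]) -> List[str]:
    """All concepts known or inferred to exhibit the given failure mode."""
    mode = canonical(term)
    return sorted(s for (s, rel, o) in infer(facts)
                  if rel == "has_failure_mode" and o == mode)


print(components_with_failure_mode("Software freeze", FACTS))
```

A query phrased in one expert's vocabulary ("Software freeze") retrieves components whose failure records were written in another's, including components for which the failure mode was never recorded directly but follows from the class they belong to.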
CONCLUSION

The need for seamlessly connecting systems engineering solutions and reliability engineering solutions through a knowledge management system has received very little attention. With the growing complexity of software-intensive products, whose failure modes are difficult to guess a priori, it is imperative to establish such a connection in order to realize the “learning system based design-for-reliability” capability. Given the expansiveness of the solution that could satisfy such a need, it appears that the global information technology system integrators could consider this as a strategic business opportunity. They could drive the interoperability standards between diverse systems engineering, reliability engineering, and knowledge management tools and develop integration offerings that can be tailored to the specific needs of diverse industry verticals and businesses of different sizes within those verticals.

One opportunity for driving the interoperability standards between different systems engineering, reliability engineering, and knowledge management tools could be through a user group under the Open Services for Lifecycle Collaboration (OSLC) [7]. Currently a user group for Quality Management [8] is working under OSLC towards defining a common set of resources, formats, and RESTful services for quality management tools to interact with other application lifecycle management (ALM) tools.

No user groups for reliability engineering and/or knowledge management exist under OSLC today. However, that can be remedied through CIMdata’s leadership, once experts from industry, software solutions providers, and system integrators agree with CIMdata that the “learning system based design-for-reliability” capability must be developed in the near future.

For socializing the contents of this whitepaper with the experts from industry, software solutions providers, and systems integrators, CIMdata will organize a series of webinars in 2016 which will go into the details of several aspects of this topic. CIMdata also plans to organize a workshop in 2017 where different viewpoints will be presented on this subject by the experts from industry, software solutions providers, and system integrators, culminating in the strategic direction of proceeding further within the framework of OSLC or some other framework.

REFERENCES

[1] Simone, Lisa K., “Software-Related Recalls: An Analysis of Records”, Biomedical Instrumentation & Technology, pp. 514-522, November/December 2013.
[2] Edwards, Steve, “The Tech Effect: Examining the Trend of Software-Related Recalls”, Stericycle Thought Leadership Blog. http://www.stericycleexpertsolutions.com/the-tech-effect-examining-the-trend-of-software-related-recalls/
[3] Reed, Jake, “The “Softer” Side of Things – The Impact of the Increasing Software and Complexity on Vehicle Defects”, SRR Automotive Warranty & Recall Blog, July 29, 2015. http://blog.srr.com/automotive-warranty-and-recall/the-softer-side-of-things-the-impact-of-the-increasing-software-and-complexity-on-vehicle-defects/
[4] INCOSE, “Systems Engineering Handbook – A Guide for System Life Cycle Processes and Activities”, 4th Ed., Wiley, INCOSE-TP-2003-002-04, 2015.
[5] Maass, E., and McNair, P. D., “Applying Design for Six Sigma to Software and Hardware Systems”, Prentice Hall, 1st Ed., 2009.
[6] Creveling, C. M., Slutsky, J. L., and Antis, D., Jr., “Design for Six Sigma in Technology and Product Development”, Prentice Hall, 2003.
[7] Open Services for Lifecycle Collaboration (OSLC). http://open-services.net/
[8] Quality Management User Group at OSLC. http://open-services.net/workgroups/quality-management/