1. Introduction
Manufacturing enterprises must stay competitive in order to survive in the global
market. Quality, cost and cycle time are considered the decisive factors when a
manufacturing enterprise competes against its peers. Among them, quality is viewed as
the most critical for gaining long-term competitive advantage. The development of
information technology and sensor technology has enabled large-scale data collection
when monitoring manufacturing processes. Those data are potentially useful for
learning patterns and knowledge for the purpose of quality improvement in manufacturing
processes. However, due to the large amount of data, it can be difficult to discover the
knowledge hidden in the data without proper tools.
Data mining provides a set of techniques to study patterns in data “that can be sought
automatically, identified, validated, and used for prediction” (Witten and Frank 2005).
Typical data mining techniques include clustering, association rule mining, classification,
and regression. In recent years data mining has been applied to quality diagnosis and
quality improvement in complicated manufacturing processes, such as semiconductor
manufacturing and steel making, and it has become an emerging topic in the field of quality
engineering. Andrew Kusiak (2001) used a decision tree algorithm to identify the causes of
soldering defects on circuit boards. The rules derived from the decision tree greatly
simplified the process of quality diagnosis. Shao-Chuang Hsu (2007) and Chen-Fu Chien
(2006 and 2007) demonstrated the use of data mining for semiconductor yield improvement.
Data mining has also been applied to the product development process (Rakesh Menon, 2004)
and to assembly lines (Sébastien Gebus, 2007). Some researchers have combined data mining
with traditional statistical methods and applied them to quality improvement. Examples are
the use of MSPC (multivariate statistical control charts) and neural networks in a
detergent-making company (Seyed Taghi Akhavan Niaki, 2005; Tai-Yue Wang, 2002), the
combination of an automated decision system and Six Sigma in the General Electric Financial
Assurance businesses (Angie Patterson, 2005), the combined use of decision trees and SPC
with data from Holmes and Mergen (Ruey-Shiang Guh, 2008), the use of SVR (support vector
regression) and control charts (Ben Khediri Issam, 2008), the use of ANN (artificial neural
Source: Data Mining and Knowledge Discovery in Real Life Applications, Book edited by: Julio Ponce and Adem Karahoca,
ISBN 978-3-902613-53-0, pp. 438, February 2009, I-Tech, Vienna, Austria
Quality Improvement using Data Mining in Manufacturing Processes 359
the knowledge has been verified, opportunities for quality improvement can be identified
using the knowledge and patterns learned by data mining techniques. The scope of the
problem can be broad, spanning different phases of a manufacturing process. In the following
sections, we explain how to apply the model to parameter optimization, quality diagnosis
and service data analysis.
The mold temperature is 142.3 °C, the warm-up temperature is 65.7 °C, the screw pressure is
software. And the optimized manufacturing parameters are found using statistical methods.
conditions. Other uncontrollable factors may also have an effect on the quality of final
products. Thus the process should be monitored and optimized continuously using real
quality-related data. Data mining can serve that purpose by automatically
obtaining knowledge and patterns about the manufacturing process.
We randomly collected 1,000 records from the molding processes. Each record consists of
the values of the four factors. The records were classified based on their defect rates.
Records whose defect rates were higher than 3.0% were assigned to a negative class and
labeled L. All other records were assigned to a positive class and labeled H. We
used the C5.0 decision tree algorithm to analyze the data. Fig. 3 shows the resulting
decision tree. We can observe that the right branch of the tree achieves better performance
than the left branch. The path indicated by the circled nodes provides guidance for
parameter optimization.
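The labeling step and the way a tree picks a numeric split can be sketched in a few lines. The sketch below is only illustrative: it uses plain information gain on a single attribute, whereas C5.0 itself uses a gain-ratio criterion and further refinements, and the function names and toy values in the usage lines are hypothetical rather than taken from the study.

```python
import math
from collections import Counter

DEFECT_THRESHOLD = 3.0  # percent, the cut-off stated in the text


def label(defect_rate_pct):
    """Return 'L' when the defect rate exceeds 3.0%, otherwise 'H'."""
    return "L" if defect_rate_pct > DEFECT_THRESHOLD else "H"


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Scan midpoints between distinct values and return (threshold, gain)
    for the binary split 'value <= threshold' with the highest information gain.
    Simplification: plain information gain, not C5.0's gain ratio."""
    base = entropy(labels)
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best[1]:
            best = (t, gain)
    return best


# Toy, made-up values for one attribute (e.g. a warm-up temperature) and defect rates.
toy_values = [60.0, 62.0, 64.0, 68.0, 70.0, 72.0]
toy_defect_rates = [4.1, 3.5, 3.2, 2.0, 1.5, 3.0]
toy_labels = [label(r) for r in toy_defect_rates]
threshold, gain = best_threshold(toy_values, toy_labels)
```

On the toy data the three low values are labeled L and the three high values H, so the split falls midway between 64.0 and 68.0. A full tree learner repeats this search over every attribute and recurses on each branch.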
The decision tree shown in Fig. 3 can also be represented as a set of rules. We have identified
five rules. Two of them predict the positive class (H), while the other
three predict the negative class (L).
1) Rule 1 for H (144; 0.973)
If Mold_Tem > 135.345 and
Warm_Tem > 66.762 and
Pressure > 21.235
Then H.
2) Rule 2 for H (127; 0.961)
If Mold_Tem <= 135.746 and
Warm_Tem > 62.106 and
Warm_Tem <= 66.762 and
Pressure > 20.733
Then H.
3) Rule 1 for L (44; 0.848)
If Mold_Tem <= 135.345 and
Warm_Tem > 66.761
Then L.
4) Rule 2 for L (390; 0.625)
If Pressure <= 21.235
Then L.
5) Rule 3 for L (790; 0.598)
If Warm_Tem <= 66.762
Then L.
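The five rules can be written down as data and applied to a new parameter setting. The sketch below assumes that the pair in parentheses after each rule is (number of records covered; confidence), and it resolves overlapping rules by taking the matching rule with the highest confidence. That tie-breaking policy, the `Rule` class and the `classify` function are assumptions made for illustration, not the behavior of the C5.0 software. The thresholds are copied exactly as printed above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    name: str
    label: str          # 'H' (positive class) or 'L' (negative class)
    cover: int          # assumed: number of records the rule covers
    confidence: float   # assumed: rule confidence
    test: Callable[[float, float, float], bool]  # (Mold_Tem, Warm_Tem, Pressure)


RULES = [
    Rule("Rule 1 for H", "H", 144, 0.973,
         lambda m, w, p: m > 135.345 and w > 66.762 and p > 21.235),
    Rule("Rule 2 for H", "H", 127, 0.961,
         lambda m, w, p: m <= 135.746 and 62.106 < w <= 66.762 and p > 20.733),
    Rule("Rule 1 for L", "L", 44, 0.848,
         lambda m, w, p: m <= 135.345 and w > 66.761),
    Rule("Rule 2 for L", "L", 390, 0.625,
         lambda m, w, p: p <= 21.235),
    Rule("Rule 3 for L", "L", 790, 0.598,
         lambda m, w, p: w <= 66.762),
]


def classify(mold_tem: float, warm_tem: float, pressure: float) -> Optional[Rule]:
    """Return the matching rule with the highest confidence, or None if no rule fires."""
    matches = [r for r in RULES if r.test(mold_tem, warm_tem, pressure)]
    return max(matches, key=lambda r: r.confidence, default=None)


# Hypothetical settings, chosen only to exercise the rules.
hit = classify(140.0, 68.0, 22.0)
overlap = classify(130.0, 64.0, 21.0)
```

The second call shows why a conflict policy is needed: that setting satisfies Rule 2 for H as well as Rules 2 and 3 for L, and the highest-confidence policy selects Rule 2 for H.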
Although a decision tree does not provide precisely optimized parameters as the
RSM method does, it can analyze a very large amount of noisy quality-related data,
which remains a limitation of DOE. Combining the two methods can
provide more feasible results than using DOE alone.
detect assignable variations in manufacturing processes, they give no clue to identifying the
root causes of the assignable variations. Data mining techniques can again be employed in
this case to provide insights for quality diagnosis.
can be collected from the test instrument and stored in a database. We can then analyze the
data using data mining in order to find ways to improve the quality of the manufacturing
process. A mobile phone assembly line model is presented in Fig. 5. There are m assembly
lines, and the products assembled on these lines are randomly distributed to n testers. The
test results are classified into two classes: Pass and Fail.
than 0.27%. Thus we can set CL = 5 as the control limit of the SPC chart for the NBF data. If
any points fall beyond the control limit, data mining methods are then used
for quality diagnosis.
Next, an association rule mining tool is used to find the root cause of the assignable
variations of the process. Attributes such as tester ID, assembly line number, and test result
are supplied to the association rule mining tool, with the test result as the consequent
variable. We used the Apriori algorithm to mine the association rules. The
minimum antecedent support was set to 0.65% and the minimum rule confidence to
80%. The obtained association rules are presented in Table 7.
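The two thresholds can be made concrete with a small sketch. Antecedent support is the fraction of all records that match the antecedent, and confidence is the fraction of those matching records whose test result is Fail. Because only two antecedent attributes are involved here, the sketch enumerates every antecedent directly instead of using Apriori's level-wise candidate pruning. The record layout (`line`, `tester`, `result` keys) and the function name are assumptions for illustration, and the usage data are made up.

```python
from collections import Counter
from itertools import combinations


def mine_fail_rules(records, min_antecedent_support=0.0065, min_confidence=0.80):
    """Return rules 'antecedent -> Fail' as (antecedent, support, confidence) tuples.

    records: list of dicts with hypothetical keys 'line', 'tester', 'result'.
    Antecedents are single attribute values or (line, tester) pairs.
    """
    n = len(records)
    antecedent_count = Counter()
    fail_count = Counter()
    for rec in records:
        items = [("line", rec["line"]), ("tester", rec["tester"])]
        for size in (1, 2):
            for antecedent in combinations(items, size):
                antecedent_count[antecedent] += 1
                if rec["result"] == "Fail":
                    fail_count[antecedent] += 1

    rules = []
    for antecedent, count in antecedent_count.items():
        support = count / n                          # antecedent support
        confidence = fail_count[antecedent] / count  # P(Fail | antecedent)
        if support >= min_antecedent_support and confidence >= min_confidence:
            rules.append((antecedent, support, confidence))
    # Strongest rules first: by confidence, then by support.
    return sorted(rules, key=lambda r: (-r[2], -r[1]))


# Made-up toy data: line 20 fails 9 times out of 10, line 1 always passes.
toy_records = (
    [{"line": 20, "tester": 1, "result": "Fail"}] * 9
    + [{"line": 20, "tester": 1, "result": "Pass"}]
    + [{"line": 1, "tester": 1, "result": "Pass"}] * 90
)
toy_rules = mine_fail_rules(toy_records)
```

With the toy data, only the antecedents involving line 20 pass both thresholds. Tester 1 alone is rejected because its confidence is far below 80%, even though its support is 100%.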
2. There were 4,209 mobile phones assembled by line 20, 99.81% of which failed the test
(Rule 2). Rules 3- extended Rule 2 by considering each different tester.
3. There were 3,216 mobile phones assembled by line 33, all of which failed the test (Rules
10 and 11).
4. There were 4,297 mobile phones assembled by line 36, all of which failed the test (Rules
5. There were 585 products tested by tester 104, 83.077% of which failed the test (Rule 21).
Of all 62,592 products, 16,905 failed the test, a failure rate of 27%. The failed
products assembled by lines 15, 20, 33 or 36, or tested by tester 104, sum to 16,117.
That means about 95% of the products that failed the test can be traced to the four
assembly lines and/or tester 104. This example shows how
data mining techniques can be used to identify the root causes of quality problems in a
manufacturing process. That kind of knowledge is valuable for quality diagnosis and
quality improvement.
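The two percentages in this summary follow directly from the counts quoted above, as the short check below shows. The variable names are chosen here for illustration only.

```python
# Counts quoted in the text.
total_products = 62_592
failed_products = 16_905
failed_from_flagged_sources = 16_117  # lines 15, 20, 33, 36 or tester 104

# Overall failure rate and the share of failures tied to the flagged sources.
failure_rate = failed_products / total_products
flagged_share = failed_from_flagged_sources / failed_products

print(f"Overall failure rate: {failure_rate:.1%}")
print(f"Share of failures from flagged lines/tester: {flagged_share:.1%}")
```

The first ratio rounds to 27.0% and the second to 95.3%, consistent with the figures given in the text.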
4. System framework
Based on the above discussion, we propose a system infrastructure for quality improvement
using data mining techniques (Fig. 9). The model has three layers: a data
collection layer, a data analysis layer and a data view layer. The function of each layer is
described below.
Fig. 9. A system infrastructure for quality improvement using data mining techniques
preprocess raw data for data mining algorithms. Secondly, patterns and knowledge learned
by data mining techniques are not always usable. How to identify the usable knowledge among
a large number of rules and patterns is also a problem that deserves attention. Finally, the
learned rules and patterns have to be analyzed by domain experts using their domain
knowledge. How to represent that domain knowledge and build an automated system for
ascertaining knowledge is also a challenging issue in this field.
6. Acknowledgement
The authors would like to thank the National Natural Science Foundation of
China (NSFC), which sponsored this research (grant no. 70572044). The authors would also
like to thank the anonymous reviewers for their constructive comments on this work.
7. References
Andrew Kusiak, Christian Kurasek. Data mining of printed-circuit board defects, IEEE
transactions on robotics and automation, vol.17, No.2, 2001.
Angie Patterson, Piero Bonissone, Marc Pavese. Six sigma applied throughout the lifecycle
of an automated decision system, quality and reliability engineering international,
2005.21: 275-292
Ben Khediri Issam, Limam Mohamed. Support vector regression based residual MCUSUM
control chart for autocorrelated process, Applied mathematics and computation,
2008 (In press)
Chen-Fu Chien, Wen-Chih Wang, Jen-Chieh Cheng. Data mining for yield enhancement in
semiconductor manufacturing and an empirical study, Expert Systems with
Applications, 2007.33: 192-198
Chen-Fu Chien, Huan-Chung Li and Angus Jeang. Data mining for improving the solder
bumping process in the semiconductor packaging industry, Intelligent systems in
accounting, finance and management, 2006.14: 43-57
Edgar F. Codd. A Relational Model of Data for Large Shared Data Banks, Communications
of the ACM 13(6): 377-387.
Giovanni C Porzio, and Giancarlo Ragozini. Visually mining off-line data for quality
improvement, Quality and reliability engineering international, 2003.19:273-283
Hsu-Hwa Chang. A data mining approach to dynamic multiple responses in Taguchi
experimental design, Expert systems with applications, 2008.35: 1095-1103
Ian H Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and
Techniques, Morgan Kaufmann Publishers, 2005.
J A McCarty, Manoj Hastak. Segmentation approaches in data-mining: a comparison of
RFM, CHAID, and logistic regression, Journal of Business Research, 2007.60: 656-
Kaidi Zhao, Bing Liu, Tomas M Tirpak and Weimin Xiao. A visual data mining framework
for convenient identification of useful knowledge, Proceedings of the Fifth IEEE
International Conference on Data Mining, 2005.
Myers R H, Montgomery D C. Response surface methodology. John Wiley & Sons, New
York, 1985.
Mu-Chen Chen. Ranking discovered rules from data mining with multiple criteria by data
envelopment analysis, Expert systems with applications, 2007.33:1110-1116
Rakesh Menon, Loh Han Tong, S Sathiyakeerthi, Aarnout Brombacher and Christopher
Leong. The needs and benefits of applying textual data mining within the product
development process, Quality and reliability engineering international, 2004.21:1-15
Ruey-Shiang Guh, Yeou-Ren Shiue. An effective application of decision tree learning for on-
line detection of mean shifts in multivariate control charts, Computers & Industrial
Engineering, 2008 (In Press)
Seyed Taghi Akhavan Niaki, Babak Abbasi. Fault diagnosis in multivariate control charts
using artificial neural networks, Quality and reliability engineering international,
2005.21: 825-840
Sébastien Gebus, Kauko Leiviskä. Knowledge acquisition for decision support systems on
an electronic assembly line, Expert systems with applications, 2007 (In press).
Shao-Chuang Hsu, Chen-Fu Chien. Hybrid data mining approach for pattern extraction
from wafer bin map to improve yield in semiconductor manufacturing, Int J
Production Economics, 2007.107: 88-103
Tai-Yue Wang, Long-hui Chen. Mean shifts detection and classification in multivariate
process: a neural-fuzzy approach, Journal of intelligent manufacturing, 2002.12:
Data Mining and Knowledge Discovery in Real Life Applications
Edited by Julio Ponce and Adem Karahoca
ISBN 978-3-902613-53-0
Hard cover, 436 pages
Publisher I-Tech Education and Publishing
Published online 01, January, 2009
Published in print edition January, 2009
This book presents four different ways of theoretical and practical advances and applications of data mining in
different promising areas like Industrialist, Biological, and Social. Twenty six chapters cover different special
topics with proposed novel ideas. Each chapter gives an overview of the subjects and some of the chapters
have cases with offered data mining solutions. We hope that this book will be a useful aid in showing a right
way for the students, researchers and practitioners in their studies.
How to reference
In order to correctly reference this scholarly work, feel free to copy and paste the following:
Shu-guang He, Zhen He, G. Alan Wang and Li Li (2009). Quality Improvement using Data Mining in
Manufacturing Processes, Data Mining and Knowledge Discovery in Real Life Applications, Julio Ponce and
Adem Karahoca (Ed.), ISBN: 978-3-902613-53-0, InTech, Available from: