Improving The Forecasted Accuracy of Model Based On Fuzzy Time Series and K-Means Clustering
Improving The Forecasted Accuracy of Model Based On Fuzzy Time Series and K-Means Clustering
Improving The Forecasted Accuracy of Model Based On Fuzzy Time Series and K-Means Clustering
2, DECEMBER 2017
Abstract—There are many approaches to improve the forecasted accuracy of model based on fuzzy time series such as:
determining the optimal interval length, establishing fuzzy logic relationship groups, similarity measures, wherein, the length of
intervals is a factor that greatly affects forecasting results in fuzzy time series model. In this paper, a new forecasting model
based on combining the fuzzy time series (FTS) and K-mean clustering algorithm with three computational methods, K-means
clustering technique, the time - variant fuzzy logical relationship groups and defuzzification forecasting rules, is presented.
Firstly, we apply the K-mean clustering algorithm to divide the historical data into clusters and tune them into intervals with
proper lengths. Then, based on the new intervals obtained, the proposed method is used to fuzzify all the historical data and
create the time -variant fuzzy logical relationship groups based on the new concept of time variant fuzzy logical relationship
group. Finally, Calculate the forecasted output value by the improved defuzzification technique in the stage of defuzzification.
To evaluate performance of the proposed model, two numerical data sets are utilized to illustrate the proposed method and
compare the forecasting accuracy with existing methods. The results show that the proposed model gets a higher average
forecasting accuracy rate to forecast the Taiwan futures exchange (TAIFEX) and enrollments of the University of Alabama than
the existing methods based on the first order and high-order fuzzy time series.
Index Terms—Forecasting, fuzzy time series, fuzzy logical relationship groups, K-mean clustering, defuzzification rules
enrollments, TAIFEX.
1. Introduction
with PSO algorithm for forecasting in fuzzy time series using clustering algorithms [20], [21] and [22] in the
model. N. Van Tinh and N.C Dieu [18] extended our way where the fuzzy logical relationship groups and
previous work [17] to a high-order fuzzy time series forecasted rules are created. In case study, the proposed
model to forecast stock market indices of TAIFEX. Some model was applied to forecast the enrollments of the
other techniques for determining best intervals and University of Alabama and the TAIFEX. The experi-
interval lengths based on clustering techniques such mental results showed that the proposed method gets
as; automatic clustering techniques are found [19], the a higher average forecasting accuracy compared to the
K-means clustering combining the FTS in [20] and the existing methods. In addition, the empirical results also
fuzzy c-means clustering in [21]. Another way, a high- showed that the high-order FTS model outperformed
order algorithm for Multi-Variable FTS [22] based on the first-order FTS model with a lower forecast error.
fuzzy clustering is presented to deal various forecasting
problems such as: enrollment forecasting, Gas forecast-
2. Fuzzy Time Series and Algorithms
ing, Rice produce prediction. As already mentioned in
researches above, the forecasting performance of the 2.1. Fuzzy Time Series
fuzzy time series model is influenced by interval length, In 1993, Song and Chissom proposed the
and determination of the appropriate interval partition- definitions of fuzzy time series, where the
ing method is supposedly a challenging task as shown values of fuzzy time series are represented
in literature [13] and some others. In spite of significant by fuzzy sets.Let U = {u1 , u2 , ..., un } be an
achievements in using the length of each interval, this universal set; a fuzzy set A of U is defined as
problem still raises researchers attention. Up to now, A = {fA (u1 )/u1 + fA (u2 )/u2 + ... + fA (un )/un },
there are still rather many ways to determine the lengths where fA is a membership function of a given set A:
of intervals in the universe of discourse. For Example, In U → [0, 1], fA (ui ) indicates the grade of membership
[27], Lizhu Wang et al. had taken the temporal informa- of ui in the fuzzy set A. fA (ui ) ∈ [0, 1], and 1 ≤ i ≤ n.
tion into account to partition the universe of discourse General definitions of fuzzy time series are given
into intervals with unequal length, Lu et al. in [28] as follows:
used information granules to partition the universe of Definition 1: Fuzzy time series
discourse into intervals by continually adjusting width Let Y (t)(t = .., 0, 1, 2..), a subset of R, be the universe
of these intervals and achieved forecasting accuracy of discourse on which fuzzy sets fi (t)(i = 1, 2) are
with different interval lengths. Next, combining the defined and if F(t) be a collection of f1 (t), f2 (t),· · ·, then
support vector machines with the PSO techniques to F(t) is called a fuzzy time series on Y (t)(t..., 0, 1, 2...).
determine optimal intervals in the universe of discourse Definition 2: Fuzzy logical relationship (FLR) [2]
and classify the training data set. Following, in [29] If there exists a fuzzy relationship R(t-1,t), such that
Chen and Kao presented a model for forecasting the F (t) = F (t − 1) ∗ R(t − 1, t), where ∗ is an maxmin
TAIEX based on FTS combining the support vector composition operator, then F(t) is said to be caused by
machines for classifying the training data set and the F(t-1). The relationship between F(t) and F(t-1) can be
PSO techniques for determining optimal intervals in the denoted by F (t − 1) → F (t). Let Ai = F(t) and Aj
universe of discourse. Based on the benefit of using PSO = F(t-1), the relationship between F(t) and F(t -1) is
techniques simultaneously, Chen et al. in [30] propose a denoted by fuzzy logical relationship Ai → Aj where
new FTS forecasting model based on optimal partitions Ai and Aj refer to the current state or the left - hand
of intervals in the universe of discourse and optimal side and the next state or the right-hand side of fuzzy
weighting vectors of two-factors second-order fuzzy- relations.
trend logical relationship groups to forecast the TAIEX Definition 3: Fuzzy logical relationship groups
and the NTD/USD exchange rates. Another approach (FLRGs) [3]
trend, in [32] presented a novel partitioning interval Fuzzy logical relationships, which have the same fuzzy
method based on hedge algebras. According to this set located in the left-hand side of the fuzzy logical
method, the number of intervals are equal to the number relationships, can be grouped into a FLRG. Suppose
of linguistic terms used to qualitatively describe the his- there are exists fuzzy logical relationships as follows:
torical values of fuzzy time series. In this paper, a new Ai → Ak1 ; Ai → Ak2 ;...; Ai → Akm ; they can be
hybrid forecasting model based on combining the K- grouped into an FLRG as : Ai → Ak1 , Ak2 , · · · , Akm .
mean clustering algorithm for partitioning the universe The repeated FLRs in the FLRGs are discarded by Chen
of discourse and the time variant fuzzy logical relation- [3], [14] and counted only once, but according to Yu
ship groups (FLRGs) is presented. Although the idea model [7], this repeated FLRs can be accepted.
of using K-means clustering algorithm for partitioning Definition 4: The - order fuzzy logical relationships [4]
historical dataset into intervals of different lengths is Let F(t)be a fuzzy time series. If F(t) is caused by
not novel as can be seen in [20], combining with the F(t-1), F(t-2),, F(t-+1) F(t-) then this fuzzy relationship is
time variant FLRGs in the determining of fuzzy logical represented by F (t − ), , F (t − 2), F (t − 1) → F (t) and
relationships stage and novel forecasted rules in the is called an - order fuzzy time series.
defuzzification stage can help to improve the forecasting Definition 5: Time-variant fuzzy relationship
result significantly. From this view point, the proposed groups [17]
method is different from the approaches which also The relationship between F(t) and F(t-1) is determined
48 JOURNAL OF SCIENCE AND TECHNOLOGY: ISSUE ON INFORMATION AND COMMUNICATIONS TECHNOLOGY, VOL. 3, NO. 2, DECEMBER 2017
by F (t − 1) → F (t). Let F(t)= Ai (t) and F(t- [xmin , xmax ]. The outline of proposed method is pre-
1)= Aj (t − 1), we will have the relationship sented in Fig. 1, which consists of two stages; the first
Aj (t − 1) → Ai (t). At the time t, we have the stage is to partition the historical data into intervals
following fuzzy logical relationships:Aj (t − 1) → Ai (t); based on algorithm 1 and the second stage is to build
Aj (t1 − 1) → Ai1 (t1);...; Aj (tp − 1) → Aip (tp) with the forecasted model to perform prediction output. The
t1,t2,..,tp ≤ t. It is noted that Ai(t1) and Ai(t2) have two stages of forecasting model are described as follows:
the same linguistic value as Ai, but appear at different
times t1 and t2, respectively. It means that if the
fuzzy relations occurred before Aj (t − 1) → Ai (t),
we can group the fuzzy logic relationship to be
Aj (t − 1) → Ai1 (t1), Ai2 (t2), Aip (tp), Ai (t). It is called
first order time-variant fuzzy logical relationship
group.
Fig. 1: The flow chart of the proposed forecasting model
2.2. Algorithm 1: K-Means clustering algorithm
K-means clustering is one of the simplest unsuper-
vised learning algorithms introduced by MacQueen [33] 3. Forecasting model based on K -means clustering
that can solve the clustering problem [20]. K-means and FTS
clustering method groups the collected data into clus- In this section, a novel method based on combining
ters based on their closeness to each other according to the FTS and K-means clustering algorithm for forecast-
Euclidean distance. The result depends on the number ing the enrolments of University of Alabama, is pre-
of cluster. The algorithm is consists of the following sented. Firstly, K-means clustering algorithm is applied
major steps Step 1: Choose k centroids {z1 , z2 , · · · zk } to classify the collected data of enrolments into clusters
Step 2: Assign each object x to the clusters Ci : x ∈ Ci and adjusted these clusters into contiguous intervals
if d(x, zi ) ≤ d(x, zj ), j 6= i P for generating intervals from the enrolment data in
Step 3: Update {zi } to minimize Ji = x∈Ci |x − Subsection 3.1. Then, based on the defined intervals,
zi |2 , i = 1..k we fuzzify all historical enrolments data into fuzzy sets
1 P and establish time - variant FLRGs. Finally, based on
zi = (x) = mi
N ci the obtained time - variant FLRGs, we calculate the
Step 4: Reassign the objects using the new centroids
Step 5: Repeat Steps 2, 3 and 4 until the centroids no forecasting results using the proposed defuzzification
longer move. rules, shown in Subsection 3.2. To verify the effective-
ness of the proposed model, all historical enrollments
[3] (the enrollment data at the University of Alabama
2.3. Algorithm 2: The time variant FLRGs from 1971s to 1992s) are used to illustrate the first - order
Assume there are fuzzy time series F(t), t =1, 2 ,, fuzzy time series forecasting process.
q ,wherein it is presented by fuzzy sets as follows:
Ai1 , Ai2 , , Aiq . Based on the Definition 5 of the time - 3.1. The K-Mean algorithm for generating intervals from
variant FLRGs, an algorithm is proposed as follows: The historical dataset
λ - order time variant fuzzy logical relationship groups
algorithm The algorithm composed of two steps is introduced
1: initialize the λ-order time variant FLRGs t= λ ; step-by-step with the same dataset [3]:
F(1),F(2),..,F(λ -1) → F(λ) or Aj2 , , Ajλ → Ak1 (λ) Step 1: Apply the K-means clustering algorithm to
2: for t: = λ do q do partition the historical time series data into c clusters
for h: = λ down to 1 do and sort the data in clusters in an ascending sequence.
Create all λ- order FLRs Aj2 (t−λ), · · · , Ajλ (t−1) → In this paper, suppose c =14 clusters, the results of
Ak1 (t) clusters are as follows:
end for {13055}, {13563, 13867}, {14696, 15145, 15163},
3: for v: = 1 to t-1 do {15311}, {15433, 15460, 15497},{15603},
for h = 1 do v do {15861, 15984}, {16388}, {16807}, {16859}, {16919},
if there is fuzzy logical relation Aj2 · · · , Ajm → {18150}, {18970, 18876}, {19328, 19337}
Ak2 (h) at the same left - hand side, then add Ak2 into Furthermore, the number of clusters is selected by an
FLRGs as follows: Aj2 , · · · , Ajλ → AkAk1 , Ak2 any way that do not exceed the total amount of data in
end for the time series, such as c is 7, 8,9 11, ..., 22.
end for Step 2: Adjust the clusters into intervals In this step,
we use automatic clustering techniques [19] to generate
cluster center (Center k) from clusters and Adjust the
2.4. The proposed forecasting method based on com- clusters into intervals according to the following rules:
bining K-Means clustering algorithm and FTS
n
Suppose that X = {x1 , x2 , · · · , xn } is a historical 1X
Centerk = di (1)
time series data on the universe of discourse U = n i=1
Nghiem Van Tinh et al.: IMPROVING THE FORECASTED ACCURACY OF MODEL BASED ON FUZZY TIME SERIES AND K-MEANS CLUSTERING 49
where di is a datum in clusterk , n denotes the number Step 3: Define the fuzzy setsAi for observations
of data in clusterk and 1 ≤ k ≤ c. Suppose that (historical data).
Centerk and Centerk+1 are adjacent cluster centers, Each interval in Step 1 represents a linguistic variable
then the upper bound Cluster U Bk of clusterk and of enrollments. For 14 intervals, there are 14 linguistic
the lower bound cluster LBk+1 of clusterk+1 can be variables. Each linguistic variable represents a fuzzy set
calculated as follows: Ai (1 ≤ i ≤ 14). All possible values of these linguistic
variables are A1 {very very very very f ew},
Centerk + Center(k+1) A2 {very very very f ew},
Cluster U Bk = (2)
2 A3 {very very f ew} , A4 {very f ew} , A5 {f ew},
Cluster LBk+1 = Cluster U Bk (3) A6 {moderate} , A7 {many},
A8 {many many} , A9 {very many} , A10 {too many},
where k =1,.., c - 1. Because there is no previous cluster A11 {too many many} , A12 {too many many many},
before the first cluster and there is no next cluster after A13 {too many many many many} and
the last cluster, the lower bound Cluster LB1 of the A14 {too many many many many many} which
first cluster and the upper bound Cluster UBc of the can represent different intervals in the universe of
last cluster can be calculated as follows: discourse, and its definition is described according to
Cluster LB1 = Centerr1 − (Center1 − Cluster U B1 ) (7)
ai1 ai2 ai3 ai14
(4) Ai = + + + ... + (7)
u1 u2 u3 u14
Cluster U Bc = Centerc + (Centerc − Cluster LBc )
(5) Here the symbol + denotes the set union operator and
Then, assign each cluster Clusterk form an inter- the symbol / denotes the membership of uj (1 ≤ j ≤ 14)
val intervalk , which means that the upper bound which belongs to Ai , respectively. For simplicity, the
Cluster U Bk and the lower bound Cluster LBk the different membership values aij of fuzzy set Ai of
cluster clusterk are also the upper bound interval U Bk {0, 0.5 and 1} are selected to indicate the grade of
and the lower bound interval LBk of the interval membership of uj in the fuzzy set Ai . According to (7),
intervalk , respectively. Calculate the middle value a fuzzy set contains 14 intervals. Contrarily, an interval
M id valuek of the interval intervalk as follows: belongs to all fuzzy sets with different membership
degrees. For example, u1 belongs to A1 and A2 with
interval LBk + interval U Bk membership degrees of 1 and 0.5 respectively, and other
M id valuek = (6)
2 fuzzy sets with membership degree is 0.
where iinterval LBk and interval U Bk are the lower Step 4: Fuzzy all historical enrollments data In order
bound and the upper bound of the interval intervalk , to fuzzify all historical data, its necessary to assign a
respectively, with k = 1, · · · , c. Based on the rules of corresponding linguistic value to each interval first.
this step, we obtain 14 intervals corresponding to the The simplest way is to assign the linguistic value with
clusters in step 1 and calculate the middle value of the respect to the corresponding fuzzy set that each interval
intervals are listed in Table 1 belongs to with the highest membership degree. For
example, the historical enrollments of year 1972 is
TABLE 1: The intervals and their midpoints obtained by K-means
clustering technique 13563, and it belongs to interval u2 because 13563
is within u2 = [13385, 14358]. So, we then assign the
No Interval MidPoint linguistic value or the fuzzy set A2 corresponding to
1 [12725, 13385) 13055
2 [13385, 14358) 13871.5
interval u2 . In the same way, the results of fuzzification
3 [14358, 15156) 14757 on enrollments of the University of Alabama are listed
4 [15156, 15387) 15271.5 are listed in Table 2.
5 [15387, 15533) 15460
6 [15533, 15762.5) 15647.75 TABLE 2: The intervals and their midpoints obtained by K-means
7 [15762.5, 16155) 15958.75 clustering technique
8 [16155, 16597.5) 16376.25
9 [16597.5, 16833) 16715.25 Year Actual Fuzzy set
10 [16833, 16889) 16861 1971 13055 A1
11 [16889, 17534.5) 17211.75 1972 13563 A2
12 [17534.5, 18536.5) 18035.5 1973 13867 A2
13 [18536.5, 19127.5) 18832 1974 14696 A3
14 [19127.5, 19536.5) 19332 1975 15460 A5
1976 15311 A4
1977 15603 A6
1978 15861 A7
1979 16807 A9
3.2. Forecasting model based on the first - order time - 1980 16919 A11
variant FLRGs 1981 16388 A8
In this section, we present a hybrid method for —- —- —-
1990 19328 A14
forecasting enrollments based on the K-mean clustering 1991 19337 A14
algorithm and time - variant fuzzy logical relationship 1992 18876 A13
groups. The proposed method is now presented from
step 3 to step 7 as follows: Step 5: Create all λ order fuzzy logical relationships
50 JOURNAL OF SCIENCE AND TECHNOLOGY: ISSUE ON INFORMATION AND COMMUNICATIONS TECHNOLOGY, VOL. 3, NO. 2, DECEMBER 2017
Year Actual data H01 CC06a HPSO Model [28] Model [31] Model [32] Our model
1971 13055
1972 13563 14000 13714 13555 13678 13544 13582 13547
1973 13867 14000 13714 13994 13678 13906 13582 13872
1974 14696 14000 14880 14711 14602 14683 14457 14314
1975 15460 15500 15467 15344 15498 15443 15443 15460
1976 15311 15500 15172 15411 15192 15395 15447 15348
1977 15603 16000 15467 15411 15641 15620 15447 15571
1978 15861 16000 15861 15411 15827 15919 15371 15828
1979 16807 16000 16831 16816 16744 16827 16752 16794
1980 16919 17500 17106 17140 17618 17559 17031 16997
1981 16388 16000 16380 16464 16392 16406 16517 16376
1982 15433 16000 15464 15505 15410 15433 15433 15411
1983 15497 16000 15172 15411 15498 15395 15447 15429
1984 15145 15500 15172 15411 15192 15160 15371 15293
1985 15163 16000 15467 15344 15567 15540 15470 15327
1986 15984 16000 15467 16018 15567 15540 15470 15765
1987 16859 16000 16831 16816 16744 16827 16810 16827
1988 18150 17500 18055 18060 17618 17559 18156 18036
1989 18970 19000 18998 19014 19036 19060 18973 19029
1990 19328 19000 19300 19340 19574 19167 19297 19332
1991 19337 19500 19149 19340 19146 19167 19059 19332
1992 18876 19149 19014 19014 19146 18878 19059 19082
1993 N/A 18832
MSE 226611 35324 22965 65689.7 15139
RMSE 256.3 237.7 216.1 123.04
TABLE 8: A comparison of the forecasted enrollments under various high-order FTS models with seven intervals
Order C02[4] CC06b[11] HPSO [14] AFPSO [16] HMV-FTS[22] Our model
2 N/A N/A N/A N/A 22722 16356
3 86694 31123 31644 31189 N/A 8168
4 89376 32009 23271 20155 N/A 6853
5 94539 24948 23534 20366 N/A 6767
6 98215 26980 23671 22276 N/A 6785
7 104056 26969 20651 18482 N/A 3951
8 102179 22387 17106 14778 N/A 3781
9 102789 18734 17971 15251 N/A 6459
CC06b [11], HPSO [14], AFPSO [16] models and the 4.1.2. Experimental results in the testing phase
C02 model [4] are used to compare with the proposed To verify the forecasting accuracy for future en-
model. A comparison of the forecasted results is listed rollments, the historical enrollments are separated two
in 8 where the number of intervals is seven for all parts for independent testing. The first part is used
Nghiem Van Tinh et al.: IMPROVING THE FORECASTED ACCURACY OF MODEL BASED ON FUZZY TIME SERIES AND K-MEANS CLUSTERING 53
Date Actual C96 [3] H01[10] L06 [24] L08 [25] HPSO [14] MTPSO Our model
data [26]
8/3/1998 7552
8/4/1998 7560 7450 7450
8/5/1998 7487 7450 7450
8/6/1998 7462 7500 7500 7450
8/7/1998 7515 7500 7500 7550
8/10/1998 7365 7450 7450 7350
8/11/1998 7360 7300 7300 7350
8/12/1998 7330 7300 7300 7350 7329 7289.56 7325.28 7326.69
8/13/1998 7291 7300 7300 7250 7289.5 7320.77 7287.48 7291.19
———– —– —– —– —– —— —— ——- ——
9/29/1998 6806 6850 6850 6850 6796 6800.07 6781.01 6811.38
9/30/1998 6787 6850 6750 6750 6796 7289.56 6781.01 6784.88
10/1/1998 N/A 6811.01
MSE 9668.94 5437.58 1364.56 105.02 103.61 92.17 50.2
TABLE 11: A comparison of the MSE of the proposed model with that of L08 and HPSO model for the training phase based on high
order FLRGs.
Models 3rd- order 4th- order 5th- order 6th- order 7th- order 8th- order
L08 208.79 142.26 143.31 147.14 105.02 124.48
HPSO 152.47 148.14 112.24 122.68 103.61 108.37
Our model 70 59.4 57.4 52.2 50.2 57.6
[28] Wei Lu, XueyanChen, WitoldPedrycz, XiaodongLiua, Jian- Nghiem Van Tinh received the B.S. de-
huaYang. Using interval information granules to improve fore- gree in applied mathematics and informa-
casting in fuzzy time series, International Journal of Approxi- tion from HaNoi University of Science and
mate Reasoning 57, 118, 2015. Technology (HUST), VietNam, in 2002. the
[29] S.M. Chen , P.Y. Kao , TAIEX forecasting based on fuzzy time M.S. degree in Information of Technology
series, particle swarm optimization techniques and support from ThaiNguyen University of Information
vector machines , Inf. Sci. 247, 6271, 2013. and Communication Technology, in 2007.
[30] S.M. Chen , Bui Dang Ha Phuong, Fuzzy time series fore- He is currently a Ph.D. student at Institute of
casting based on optimal partitions of intervals and optimal Information Technology, Vietnam Academy
weighting vectors, Knowledge-Based Systems .118, 204 216, of Science and Technology, Hanoi, Viet-
2017 nam and working as a Lecturer in the Thai
[31] Hoang Tung, Nguyen Dinh Thuan, Vu Minh Loc. The parti- Nguyen University of Technology. His current research interests
tioning method based on hedge algebras for fuzzy time series include clustering, optimal technique, software engineering, time
forecasting, Journal of Science and Technology 54 (5), 571-583, series analysis, fuzzy time series and forecasting. He has published
2016. several papers on these topics.
[32] Vu Minh Loc, Nghia Huynh Pham Thanh. Context-aware
approach to improve result of forecasting enrollment in fuzzy
time series, International Journal of Emerging Technologies in
Engineering Research (IJETER), Volume 5, Issue 2, 28-33, 2017
[33] .B. MacQueen, Some methods for classification and analy- Nguyen Cong Dieu received the Ph.D.
sis of multivariate observations, in: Proceedings of the Fifth degree in applied mathematics from Insti-
Symposium on Mathematical Statistics and Probability, vol. 1, tute of Information Technology (IOIT), Viet-
University of California Press, Berkeley, CA, pp. 281-297, 1967. namese Academy of Science and Technol-
ogy (1995). He is currently a lecturer at the
Faculty of Information Technology, Thang
Long University and a research associate
of the institute of Information Technology.
His research interests include Data mining,
Computational Intelligence method in data
analysis, fuzzy time series forecasting.