Improving The Forecasted Accuracy of Model Based On Fuzzy Time Series and K-Means Clustering


JOURNAL OF SCIENCE AND TECHNOLOGY: ISSUE ON INFORMATION AND COMMUNICATIONS TECHNOLOGY, VOL. 3, NO. 2, DECEMBER 2017

Nghiem Van Tinh and Nguyen Cong Dieu

Abstract—There are many approaches to improving the forecasting accuracy of models based on fuzzy time series, such as determining the optimal interval length, establishing fuzzy logical relationship groups, and using similarity measures. Among these, the length of the intervals is a factor that greatly affects the forecasting results of a fuzzy time series model. In this paper, a new forecasting model is presented that combines fuzzy time series (FTS) with the K-means clustering algorithm and relies on three computational methods: the K-means clustering technique, time-variant fuzzy logical relationship groups, and defuzzification forecasting rules. First, we apply the K-means clustering algorithm to divide the historical data into clusters and adjust them into intervals of proper lengths. Then, based on the new intervals, the proposed method fuzzifies all the historical data and creates the time-variant fuzzy logical relationship groups, following the new concept of the time-variant fuzzy logical relationship group. Finally, the forecasted output value is calculated by the improved defuzzification technique in the defuzzification stage. To evaluate the performance of the proposed model, two numerical data sets are used to illustrate the method and to compare its forecasting accuracy with that of existing methods. The results show that, in forecasting the Taiwan Futures Exchange (TAIFEX) index and the enrollments of the University of Alabama, the proposed model achieves a higher average forecasting accuracy than existing methods based on first-order and high-order fuzzy time series.

Index Terms—Forecasting, fuzzy time series, fuzzy logical relationship groups, K-means clustering, defuzzification rules, enrollments, TAIFEX.

Nghiem Van Tinh is with Thai Nguyen University of Technology, Thai Nguyen University, Thai Nguyen, Vietnam (email: nghiem-[email protected]).
Nguyen Cong Dieu is with Thang Long University, HaNoi, Vietnam (email: ncdieu@yahoo).
Manuscript received August 2, 2017; revised October 4, 2017 and November 7, 2017; accepted December 11, 2017.
Digital Object Identifier 10.31130/jst.2017.47
ISSN 1859-1531

1. Introduction

Many forecasting models based on the concept of fuzzy time series have been proposed over the past decades to solve problems in various domains, such as enrollment forecasting [1]–[4], crop forecasting [5], [6], stock markets [7], [8], and temperature prediction [8], [9]. Traditional forecasting models such as regression analysis, moving average, exponential moving average, autoregressive moving average, and ARIMA cannot deal with forecasting problems in which the historical data are represented by linguistic values. These approaches also require a linearity assumption and a large amount of historical data. In contrast, fuzzy time series forecasting models place no restriction on the number of observations or the length of the historical data, and they do not need the linearity assumption. Song and Chissom proposed the time-invariant FTS model [1] and the time-variant FTS model [2], which use max-min operations to forecast the enrollments of the University of Alabama. However, the main drawback of these methods is that they require a lot of computation time when the fuzzy logical relationship matrix is large. Chen [3] then used simplified arithmetic operations, avoiding the complicated max-min operations, and his method produced better results. Since then, fuzzy time series have been widely studied to improve forecasting accuracy in many applications. Huarng [10] presented a new method for forecasting the enrollments of the University of Alabama and the TAIFEX by adding a heuristic function to get better forecasting results. Chen also extended his previous work [3] to present several forecasting models based on high-order FTS to deal with the enrollment forecasting problem [4], [11]. Yu presented models based on a refinement relation [12] and a weighting scheme [7] for improving forecasting accuracy; both the stock index and enrollments were used as targets in the empirical analysis. Huang [13] showed that different interval lengths may affect forecasting accuracy and modified the previous method [10] by using ratio-based lengths to get better forecasting accuracy. Recently, [14]–[16] presented new hybrid forecasting models that combine particle swarm optimization (PSO) with fuzzy time series to find a proper length for each interval. In addition, Dieu N.C et al. [17] introduced the concept of the time-variant fuzzy logical relationship group and combined it
with the PSO algorithm for forecasting in the fuzzy time series model. N. Van Tinh and N.C Dieu [18] extended that work [17] to a high-order fuzzy time series model to forecast the TAIFEX stock market index. Other techniques for determining the best intervals and interval lengths are based on clustering, such as automatic clustering [19], K-means clustering combined with FTS [20], and fuzzy c-means clustering [21]. In another direction, a high-order algorithm for multi-variable FTS based on fuzzy clustering [22] was presented to deal with various forecasting problems such as enrollment forecasting, gas forecasting, and rice production prediction. As mentioned in the studies above, the forecasting performance of a fuzzy time series model is influenced by the interval length, and determining an appropriate interval partitioning method is a challenging task, as shown in [13] and elsewhere. Despite significant achievements in the choice of interval lengths, this problem still attracts researchers' attention, and there are still many ways to determine the lengths of intervals in the universe of discourse. For example, in [27], Lizhu Wang et al. took temporal information into account to partition the universe of discourse into intervals of unequal length. Lu et al. [28] used information granules to partition the universe of discourse into intervals by continually adjusting the widths of these intervals, and obtained forecasting results with different interval lengths. In [29], Chen and Kao presented a model for forecasting the TAIEX based on FTS, combining support vector machines for classifying the training data set with PSO techniques for determining optimal intervals in the universe of discourse. Building on the benefits of PSO techniques, Chen et al. [30] proposed a new FTS forecasting model based on optimal partitions of intervals in the universe of discourse and optimal weighting vectors of two-factor second-order fuzzy-trend logical relationship groups to forecast the TAIEX and the NTD/USD exchange rates. Following another approach, [32] presented a novel interval partitioning method based on hedge algebras. According to this method, the number of intervals is equal to the number of linguistic terms used to qualitatively describe the historical values of the fuzzy time series.

In this paper, a new hybrid forecasting model is presented that combines the K-means clustering algorithm for partitioning the universe of discourse with time-variant fuzzy logical relationship groups (FLRGs). Although the idea of using the K-means clustering algorithm to partition a historical data set into intervals of different lengths is not novel, as can be seen in [20], combining it with the time-variant FLRGs in the stage of determining fuzzy logical relationships and with novel forecasting rules in the defuzzification stage can help to improve the forecasting results significantly. From this viewpoint, the proposed method differs from the approaches that also use clustering algorithms [20], [21], [22] in the way the fuzzy logical relationship groups and forecasting rules are created. In the case study, the proposed model was applied to forecast the enrollments of the University of Alabama and the TAIFEX. The experimental results showed that the proposed method achieves a higher average forecasting accuracy than the existing methods. In addition, the empirical results showed that the high-order FTS model outperformed the first-order FTS model, with a lower forecast error.

2. Fuzzy Time Series and Algorithms

2.1. Fuzzy Time Series

In 1993, Song and Chissom proposed the definitions of fuzzy time series, where the values of a fuzzy time series are represented by fuzzy sets. Let $U = \{u_1, u_2, \dots, u_n\}$ be a universal set; a fuzzy set $A$ of $U$ is defined as $A = \{f_A(u_1)/u_1 + f_A(u_2)/u_2 + \dots + f_A(u_n)/u_n\}$, where $f_A: U \to [0, 1]$ is the membership function of $A$, and $f_A(u_i) \in [0, 1]$ indicates the grade of membership of $u_i$ in the fuzzy set $A$, with $1 \le i \le n$.

General definitions of fuzzy time series are given as follows:

Definition 1: Fuzzy time series
Let $Y(t)$ $(t = \dots, 0, 1, 2, \dots)$, a subset of $R$, be the universe of discourse on which fuzzy sets $f_i(t)$ $(i = 1, 2, \dots)$ are defined, and let $F(t)$ be the collection of $f_1(t), f_2(t), \dots$. Then $F(t)$ is called a fuzzy time series on $Y(t)$ $(t = \dots, 0, 1, 2, \dots)$.

Definition 2: Fuzzy logical relationship (FLR) [2]
If there exists a fuzzy relationship $R(t-1, t)$ such that $F(t) = F(t-1) * R(t-1, t)$, where $*$ is the max-min composition operator, then $F(t)$ is said to be caused by $F(t-1)$. The relationship between $F(t)$ and $F(t-1)$ can be denoted by $F(t-1) \to F(t)$. Let $A_i = F(t-1)$ and $A_j = F(t)$; the relationship between $F(t-1)$ and $F(t)$ is then denoted by the fuzzy logical relationship $A_i \to A_j$, where $A_i$ is the current state (the left-hand side) and $A_j$ is the next state (the right-hand side) of the fuzzy relationship.

Definition 3: Fuzzy logical relationship groups (FLRGs) [3]
Fuzzy logical relationships that have the same fuzzy set on the left-hand side can be grouped into an FLRG. Suppose there exist fuzzy logical relationships $A_i \to A_{k1}$; $A_i \to A_{k2}$; ...; $A_i \to A_{km}$; they can be grouped into an FLRG as $A_i \to A_{k1}, A_{k2}, \dots, A_{km}$. Repeated FLRs in an FLRG are discarded by Chen [3], [14] and counted only once, whereas in Yu's model [7] repeated FLRs are retained.

Definition 4: The λ-order fuzzy logical relationships [4]
Let $F(t)$ be a fuzzy time series. If $F(t)$ is caused by $F(t-1), F(t-2), \dots, F(t-\lambda+1), F(t-\lambda)$, then this fuzzy relationship is represented by $F(t-\lambda), \dots, F(t-2), F(t-1) \to F(t)$ and is called a λ-order fuzzy time series.

Definition 5: Time-variant fuzzy relationship groups [17]
The relationship between $F(t)$ and $F(t-1)$ is determined
by $F(t-1) \to F(t)$. Let $F(t) = A_i(t)$ and $F(t-1) = A_j(t-1)$; we then have the relationship $A_j(t-1) \to A_i(t)$. At time $t$, we have the following fuzzy logical relationships: $A_j(t-1) \to A_i(t)$; $A_j(t_1-1) \to A_{i1}(t_1)$; ...; $A_j(t_p-1) \to A_{ip}(t_p)$, with $t_1, t_2, \dots, t_p \le t$. Note that $A_i(t_1)$ and $A_i(t_2)$ have the same linguistic value as $A_i$ but appear at different times $t_1$ and $t_2$, respectively. This means that if these fuzzy relationships occurred before $A_j(t-1) \to A_i(t)$, we can group the fuzzy logical relationships as $A_j(t-1) \to A_{i1}(t_1), A_{i2}(t_2), \dots, A_{ip}(t_p), A_i(t)$. This is called a first-order time-variant fuzzy logical relationship group.

2.2. Algorithm 1: K-Means clustering algorithm

K-means clustering, introduced by MacQueen [33], is one of the simplest unsupervised learning algorithms that can solve the clustering problem [20]. The K-means method groups the collected data into clusters based on their closeness to each other according to the Euclidean distance. The result depends on the number of clusters. The algorithm consists of the following major steps:

Step 1: Choose $k$ centroids $\{z_1, z_2, \dots, z_k\}$.
Step 2: Assign each object $x$ to a cluster $C_i$: $x \in C_i$ if $d(x, z_i) \le d(x, z_j)$ for all $j \ne i$.
Step 3: Update $\{z_i\}$ to minimize $J_i = \sum_{x \in C_i} |x - z_i|^2$, $i = 1, \dots, k$, which gives

$$z_i = \frac{1}{N_{C_i}} \sum_{x \in C_i} x = m_i$$

Step 4: Reassign the objects using the new centroids.
Step 5: Repeat Steps 2, 3 and 4 until the centroids no longer move.

2.3. Algorithm 2: The time-variant FLRGs

Assume there is a fuzzy time series $F(t)$, $t = 1, 2, \dots, q$, represented by the fuzzy sets $A_{i1}, A_{i2}, \dots, A_{iq}$. Based on Definition 5 of the time-variant FLRGs, an algorithm is proposed as follows:

The λ-order time-variant fuzzy logical relationship groups algorithm
1: Initialize the λ-order time-variant FLRGs at $t = \lambda$: $F(1), F(2), \dots, F(\lambda-1) \to F(\lambda)$, or $A_{j2}, \dots, A_{j\lambda} \to A_{k1}(\lambda)$
2: for $t := \lambda$ to $q$ do
     for $h := \lambda$ down to 1 do
       Create all λ-order FLRs $A_{j2}(t-\lambda), \dots, A_{j\lambda}(t-1) \to A_{k1}(t)$
     end for
3:   for $v := 1$ to $t-1$ do
       for $h := 1$ to $v$ do
         if there is an FLR $A_{j2}, \dots, A_{j\lambda} \to A_{k2}(h)$ with the same left-hand side, then add $A_{k2}$ to the FLRG as follows: $A_{j2}, \dots, A_{j\lambda} \to A_{k1}, A_{k2}$
       end for
     end for

2.4. The proposed forecasting method based on combining the K-means clustering algorithm and FTS

Suppose that $X = \{x_1, x_2, \dots, x_n\}$ is a historical time series on the universe of discourse $U = [x_{min}, x_{max}]$. The outline of the proposed method is presented in Fig. 1. It consists of two stages: the first stage partitions the historical data into intervals based on Algorithm 1, and the second stage builds the forecasting model to produce the prediction output. The two stages of the forecasting model are described as follows:

Fig. 1: The flow chart of the proposed forecasting model

3. Forecasting model based on K-means clustering and FTS

In this section, a novel method based on combining FTS and the K-means clustering algorithm is presented for forecasting the enrollments of the University of Alabama. First, the K-means clustering algorithm is applied to classify the collected enrollment data into clusters, and these clusters are adjusted into contiguous intervals (Subsection 3.1). Then, based on the defined intervals, we fuzzify all historical enrollment data into fuzzy sets and establish the time-variant FLRGs. Finally, based on the obtained time-variant FLRGs, we calculate the forecasting results using the proposed defuzzification rules (Subsection 3.2). To verify the effectiveness of the proposed model, all historical enrollments [3] (the enrollment data of the University of Alabama from 1971 to 1992) are used to illustrate the first-order fuzzy time series forecasting process.

3.1. The K-means algorithm for generating intervals from the historical dataset

The algorithm, composed of two steps, is introduced step by step with the same dataset [3]:

Step 1: Apply the K-means clustering algorithm to partition the historical time series data into $c$ clusters and sort the data in the clusters in ascending order. In this paper, we take $c = 14$ clusters; the resulting clusters are as follows:
{13055}, {13563, 13867}, {14696, 15145, 15163}, {15311}, {15433, 15460, 15497}, {15603}, {15861, 15984}, {16388}, {16807}, {16859}, {16919}, {18150}, {18970, 18876}, {19328, 19337}
The number of clusters can be chosen freely as long as it does not exceed the number of data points in the time series, for example $c$ = 7, 8, 9, 11, ..., 22.

Step 2: Adjust the clusters into intervals. In this step, we use the automatic clustering technique [19] to generate a cluster center ($Center_k$) for each cluster and adjust the clusters into intervals according to the following rules:

$$Center_k = \frac{1}{n} \sum_{i=1}^{n} d_i \quad (1)$$
where $d_i$ is a datum in $cluster_k$, $n$ denotes the number of data in $cluster_k$, and $1 \le k \le c$. Suppose that $Center_k$ and $Center_{k+1}$ are adjacent cluster centers; then the upper bound $Cluster\_UB_k$ of $cluster_k$ and the lower bound $Cluster\_LB_{k+1}$ of $cluster_{k+1}$ are calculated as follows:

$$Cluster\_UB_k = \frac{Center_k + Center_{k+1}}{2} \quad (2)$$

$$Cluster\_LB_{k+1} = Cluster\_UB_k \quad (3)$$

where $k = 1, \dots, c-1$. Because there is no cluster before the first cluster and no cluster after the last cluster, the lower bound $Cluster\_LB_1$ of the first cluster and the upper bound $Cluster\_UB_c$ of the last cluster are calculated as follows:

$$Cluster\_LB_1 = Center_1 - (Cluster\_UB_1 - Center_1) \quad (4)$$

$$Cluster\_UB_c = Center_c + (Center_c - Cluster\_LB_c) \quad (5)$$

Then, let each cluster $cluster_k$ form an interval $interval_k$, which means that the upper bound $Cluster\_UB_k$ and the lower bound $Cluster\_LB_k$ of the cluster $cluster_k$ are also the upper bound $interval\_UB_k$ and the lower bound $interval\_LB_k$ of the interval $interval_k$, respectively. Calculate the middle value $Mid\_value_k$ of the interval $interval_k$ as follows:

$$Mid\_value_k = \frac{interval\_LB_k + interval\_UB_k}{2} \quad (6)$$

where $interval\_LB_k$ and $interval\_UB_k$ are the lower bound and the upper bound of the interval $interval_k$, respectively, with $k = 1, \dots, c$. Based on the rules of this step, we obtain 14 intervals corresponding to the clusters in Step 1; the intervals and their middle values are listed in Table 1.

TABLE 1: The intervals and their midpoints obtained by K-means clustering technique

No   Interval             MidPoint
1    [12725, 13385)       13055
2    [13385, 14358)       13871.5
3    [14358, 15156)       14757
4    [15156, 15387)       15271.5
5    [15387, 15533)       15460
6    [15533, 15762.5)     15647.75
7    [15762.5, 16155)     15958.75
8    [16155, 16597.5)     16376.25
9    [16597.5, 16833)     16715.25
10   [16833, 16889)       16861
11   [16889, 17534.5)     17211.75
12   [17534.5, 18536.5)   18035.5
13   [18536.5, 19127.5)   18832
14   [19127.5, 19536.5)   19332

3.2. Forecasting model based on the first-order time-variant FLRGs

In this section, we present a hybrid method for forecasting enrollments based on the K-means clustering algorithm and time-variant fuzzy logical relationship groups. The proposed method is presented in Steps 3 to 7 as follows:

Step 3: Define the fuzzy sets $A_i$ for the observations (historical data).
Each interval obtained in Subsection 3.1 represents a linguistic variable of enrollments. For 14 intervals, there are 14 linguistic variables. Each linguistic variable represents a fuzzy set $A_i$ $(1 \le i \le 14)$. The possible values of these linguistic variables are $A_1$ {very very very very few}, $A_2$ {very very very few}, $A_3$ {very very few}, $A_4$ {very few}, $A_5$ {few}, $A_6$ {moderate}, $A_7$ {many}, $A_8$ {many many}, $A_9$ {very many}, $A_{10}$ {too many}, $A_{11}$ {too many many}, $A_{12}$ {too many many many}, $A_{13}$ {too many many many many} and $A_{14}$ {too many many many many many}, which represent different intervals in the universe of discourse. Each fuzzy set is defined according to (7):

$$A_i = \frac{a_{i1}}{u_1} + \frac{a_{i2}}{u_2} + \frac{a_{i3}}{u_3} + \dots + \frac{a_{i14}}{u_{14}} \quad (7)$$

Here the symbol + denotes the set union operator, and the symbol / denotes the membership of $u_j$ $(1 \le j \le 14)$ in $A_i$. For simplicity, the membership values $a_{ij}$ are chosen from {0, 0.5, 1} to indicate the grade of membership of $u_j$ in the fuzzy set $A_i$. According to (7), each fuzzy set is defined over all 14 intervals; conversely, each interval belongs to every fuzzy set with some membership degree. For example, $u_1$ belongs to $A_1$ and $A_2$ with membership degrees 1 and 0.5, respectively, and to all other fuzzy sets with membership degree 0.

Step 4: Fuzzify all historical enrollment data.
In order to fuzzify all historical data, it is necessary to assign a corresponding linguistic value to each interval first. The simplest way is to assign to each interval the linguistic value of the fuzzy set in which it has the highest membership degree. For example, the historical enrollment of year 1972 is 13563, which belongs to interval $u_2$ because 13563 lies within $u_2$ = [13385, 14358). We therefore assign the linguistic value, that is, the fuzzy set $A_2$, corresponding to interval $u_2$. In the same way, the fuzzification results for the enrollments of the University of Alabama are listed in Table 2.

TABLE 2: The fuzzified enrollments of the University of Alabama

Year   Actual   Fuzzy set
1971   13055    A1
1972   13563    A2
1973   13867    A2
1974   14696    A3
1975   15460    A5
1976   15311    A4
1977   15603    A6
1978   15861    A7
1979   16807    A9
1980   16919    A11
1981   16388    A8
...    ...      ...
1990   19328    A14
1991   19337    A14
1992   18876    A13
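Steps 1, 2 and 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`kmeans_1d`, `clusters_to_intervals`, `fuzzify`) and the evenly spaced centroid seeding are introduced here for illustration only. Because K-means results depend on the initialisation, the interval step is fed the 14 clusters reported in Step 1 rather than the output of a fresh K-means run. The lower bound of the first interval is computed symmetrically about the first center, which is the reading that matches the value 12725 in Table 1.

```python
from statistics import mean


def kmeans_1d(data, k, max_iter=100):
    """Plain K-means on scalars (Algorithm 1) with a deterministic seeding."""
    xs = sorted(data)
    n = len(xs)
    # Assumption: seed the centroids at evenly spaced positions of the sorted data.
    centroids = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return [c for c in clusters if c]


def clusters_to_intervals(clusters):
    """Turn clusters into contiguous intervals following Eqs. (1)-(6)."""
    centers = sorted(mean(c) for c in clusters)                    # Eq. (1)
    inner = [(a + b) / 2 for a, b in zip(centers, centers[1:])]    # Eqs. (2)-(3)
    lower_first = centers[0] - (inner[0] - centers[0])             # Eq. (4)
    upper_last = centers[-1] + (centers[-1] - inner[-1])           # Eq. (5)
    bounds = [lower_first] + inner + [upper_last]
    # Each entry is (lower bound, upper bound, midpoint); the midpoint is Eq. (6).
    return [(lo, hi, (lo + hi) / 2) for lo, hi in zip(bounds, bounds[1:])]


def fuzzify(value, intervals):
    """Return the 1-based index i of the fuzzy set A_i whose interval holds value."""
    for i, (lo, hi, _) in enumerate(intervals, start=1):
        if lo <= value < hi:
            return i
    raise ValueError(f"{value} lies outside the universe of discourse")


# The 14 clusters reported in Step 1.
CLUSTERS = [[13055], [13563, 13867], [14696, 15145, 15163], [15311],
            [15433, 15460, 15497], [15603], [15861, 15984], [16388], [16807],
            [16859], [16919], [18150], [18970, 18876], [19328, 19337]]

intervals = clusters_to_intervals(CLUSTERS)
print(intervals[0])                # (12725.0, 13385.0, 13055.0)
print(fuzzify(13563, intervals))   # 2, i.e. the enrollment of 1972 maps to A2
```

Some bounds computed this way differ from Table 1 by a fraction of a unit (for example 14358.17 against 14358), which is consistent with rounding in the table; the fuzzy set assigned to each enrollment value in Table 2 is unaffected.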
Step 5: Create all λ-order fuzzy logical relationships.
Based on Definitions 2 and 3, to establish a λ-order fuzzy logical relationship, we should find any relationship of the type $F(t-\lambda), F(t-\lambda+1), \dots, F(t-1) \to F(t)$, where $F(t-\lambda), F(t-\lambda+1), \dots, F(t-1)$ and $F(t)$ are called the current state and the next state, respectively. A λ-order fuzzy logical relationship in the training phase is then obtained by replacing the states with the corresponding linguistic values. For example, with λ = 1, from Table 2 the fuzzy logical relationship $A_1 \to A_2$ is obtained from $F(1971) \to F(1972)$. In the same way, all first-order fuzzy logical relationships from year 1972 to 1992 are shown in column 3 of Table 3, where there are 22 fuzzy logical relationships; the first 21 relationships are called the trained patterns, and the last one is called the untrained pattern (in the testing phase). For the untrained pattern, relation 22 has the fuzzy relation $A_{13} \to \#$, created from the relation $F(1992) \to F(1993)$; since the linguistic value of $F(1993)$ is unknown within the historical data, this unknown next state is denoted by the symbol #.

TABLE 3: The first-order fuzzy logical relationships

No   FLRs F(t)           Fuzzy relations
1    F(1971) → F(1972)   A1 → A2
2    F(1972) → F(1973)   A2 → A2
3    F(1973) → F(1974)   A2 → A3
4    F(1974) → F(1975)   A3 → A5
5    F(1975) → F(1976)   A5 → A4
6    F(1976) → F(1977)   A4 → A6
7    F(1977) → F(1978)   A6 → A7
8    F(1978) → F(1979)   A7 → A9
9    F(1979) → F(1980)   A9 → A11
10   F(1980) → F(1981)   A11 → A8
11   F(1981) → F(1982)   A8 → A5
12   F(1982) → F(1983)   A5 → A5
13   F(1983) → F(1984)   A5 → A3
14   F(1984) → F(1985)   A3 → A4
15   F(1985) → F(1986)   A4 → A7
16   F(1986) → F(1987)   A7 → A10
17   F(1987) → F(1988)   A10 → A12
18   F(1988) → F(1989)   A12 → A13
19   F(1989) → F(1990)   A13 → A14
20   F(1990) → F(1991)   A14 → A14
21   F(1991) → F(1992)   A14 → A13
22   F(1992) → F(1993)   A13 → #

Step 6: Establish all time-variant fuzzy logical relationship groups.
In this step, the method differs from the approaches in [3] and [14] in the way the fuzzy logical relationship groups are created. In the previous approaches, all fuzzy logical relationships having the same fuzzy set on the left-hand side (the same current state) are grouped into the same fuzzy relationship group. According to Definition 5 and Algorithm 2, however, the appearance history of the fuzzy sets on the right-hand side of the fuzzy logical relationships also needs to be considered. That is, only the fuzzy sets on the right-hand side that appear before the forecasting time, in fuzzy logical relationships with the same left-hand side, are grouped into a fuzzy logical relationship group, called a time-variant FLRG. From this viewpoint and based on Table 3, we establish all time-variant fuzzy logical relationship groups, shown in column 2 of Table 4, which consists of 21 groups in the training phase and one group for the testing phase.

TABLE 4: The complete first-order time-variant fuzzy logical relationship groups

No   Time-variant FLRGs   Relations between F(t)
1    A1 → A2              F(1971) → F(1972)
2    A2 → A2              F(1972) → F(1973)
3    A2 → A2, A3          F(1972) → F(1973), F(1974)
4    A3 → A5              F(1974) → F(1975)
5    A5 → A4              F(1975) → F(1976)
6    A4 → A6              F(1976) → F(1977)
7    A6 → A7              F(1977) → F(1978)
8    A7 → A9              F(1978) → F(1979)
9    A9 → A11             F(1979) → F(1980)
10   A11 → A8             F(1980) → F(1981)
11   A8 → A5              F(1981) → F(1982)
12   A5 → A4, A5          F(1982) → F(1976), F(1983)
...  ...                  ...
20   A14 → A14            F(1990) → F(1991)
21   A14 → A14, A13       F(1991) → F(1991), F(1992)
22   A13 → #              F(1992) → F(1993)

Step 7: Calculate and defuzzify the forecasting output values for all time-variant FLRGs.
In this step, to obtain the forecasted results, a new defuzzification technique is presented to calculate the forecasted values for all time-variant FLRGs in the training phase. For the time-variant FLRG in the testing phase, we use the defuzzification rule proposed in [14].

For the training phase, we estimate the forecasted values for all time-variant FLRGs based on the fuzzy sets on the right-hand side within the same group. For each group in column 2 of Table 4, we divide the interval corresponding to each next state into $p$ sub-intervals of equal length and calculate a forecasted value for the group according to (8):

$$Forecasted\_output = \frac{1}{n} \sum_{j=1}^{n} subm_{kj} \quad (8)$$

where $1 \le j \le n$, $1 \le k \le p$, and
- $n$ is the total number of next states, that is, the total number of fuzzy sets on the right-hand side within the same group;
- $subm_{kj}$ is the midpoint of one of the $p$ sub-intervals (the $k$-th sub-interval) of the interval corresponding to the $j$-th fuzzy set $A_{kj}$ on the right-hand side, where the highest membership level of $A_{kj}$ takes place in this interval.

For instance, in column 2 of Table 4, there is a first-order time-variant FLRG $A_1 \to A_2$ in Group 1, which has only one fuzzy set, $A_2$, on the right-hand side; the highest membership level of $A_2$ belongs to interval $u_2$ = [13385, 14358). In this paper, we divide the interval $u_2$ into four sub-intervals: $u_{2.1}$ = [13385, 13628.25), $u_{2.2}$ = [13628.25, 13871.5), $u_{2.3}$ = [13871.5, 14114.75) and $u_{2.4}$ = [14114.75, 14358). In Table 4, the first-order time-variant FLRG $A_1 \to A_2$ is obtained from $F(1971) \to F(1972)$; the historical datum of year 1972 is 13563, which lies within sub-interval $u_{2.1}$ = [13385, 13628.25), and the midpoint $subm_{2.1}$ of sub-interval $u_{2.1}$ is 13506.63. Therefore, the forecasted value for Group 1 according to (8) is 13506.63. The forecasted values of the remaining first-order time-variant FLRGs are calculated in a similar manner.
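Steps 5 to 7 for the first-order case can be sketched in Python as follows. This is a hedged illustration, not the authors' code: the function names are hypothetical, repeated right-hand sides are kept in a group (Definition 5 does not say whether duplicates are dropped, so this is an assumption), and the sub-interval used in Eq. (8) is taken to be the one containing the observed value, as in the worked example for Group 1. The sketch only claims to reproduce that worked example (13506.63); it is not claimed to reproduce the tabulated forecasts.

```python
def first_order_flrs(labels):
    """Step 5: consecutive pairs (current state, next state)."""
    return list(zip(labels, labels[1:]))


def time_variant_flrgs(labels):
    """Step 6: for the FLR formed at each time t, collect the next states of all
    FLRs seen so far that share its left-hand side, in order of appearance.
    Each group is (left-hand side, positions of its next states in labels)."""
    groups = []
    seen = {}  # left-hand side -> positions of the next states observed so far
    for t in range(1, len(labels)):
        lhs = labels[t - 1]
        seen.setdefault(lhs, []).append(t)
        groups.append((lhs, list(seen[lhs])))
    return groups


def sub_interval_midpoint(value, lo, hi, p=4):
    """Midpoint of the one of p equal sub-intervals of [lo, hi) that holds value."""
    width = (hi - lo) / p
    k = min(int((value - lo) / width), p - 1)
    return lo + (k + 0.5) * width


def forecast_group(positions, values, labels, intervals, p=4):
    """Eq. (8): average the selected sub-interval midpoints over the next states."""
    mids = []
    for pos in positions:
        lo, hi = intervals[labels[pos]]
        mids.append(sub_interval_midpoint(values[pos], lo, hi, p))
    return sum(mids) / len(mids)


# Intervals u2..u5 from Table 1 and the years 1971-1976 from Table 2.
INTERVALS = {2: (13385, 14358), 3: (14358, 15156),
             4: (15156, 15387), 5: (15387, 15533)}
VALUES = [13055, 13563, 13867, 14696, 15460, 15311]
LABELS = [1, 2, 2, 3, 5, 4]  # fuzzy set indices A1, A2, A2, A3, A5, A4

flrs = first_order_flrs(LABELS)
groups = time_variant_flrgs(LABELS)
for lhs, positions in groups:
    rhs = ", ".join(f"A{LABELS[pos]}" for pos in positions)
    print(f"A{lhs} -> {rhs}")

group1_forecast = forecast_group(groups[0][1], VALUES, LABELS, INTERVALS)
print(group1_forecast)  # 13506.625
```

The printed groups are A1 -> A2, A2 -> A2, A2 -> A2, A3, A3 -> A5 and A5 -> A4, matching the first five rows of Table 4.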

For the testing phase, we calculate the forecasted value based on the master voting (MV) scheme [14] to deal with the untrained pattern, as follows:

$$Forecasted\_for\_\# = \frac{(m_{t1} \times w_h) + m_{t2} + \dots + m_{t\lambda}}{w_h + \lambda - 1} \quad (9)$$

where $w_h$ is the highest vote, predefined by the user, λ is the order of the fuzzy relationship, and $m_{ti}$ denote the midpoints of the corresponding intervals. Based on the forecasting rules in (8) and (9), the forecasted results for all first-order time-variant FLRGs are listed in Table 5.

TABLE 5: The complete forecasted values for all first-order time-variant fuzzy logical relationship groups (FLRGs)

No   Time-variant FLRGs   Value
1    A1 → A2              13547
2    A2 → A2              13872
3    A2 → A2, A3          14314
4    A3 → A5              15460
...  ...                  ...
16   A7 → A9, A10         16827
17   A10 → A12            18036
18   A12 → A13            19029
19   A13 → A14            19332
20   A14 → A14            19332
21   A14 → A14, A13       19082
22   A13 → #              18832

Based on Table 5 and the data in Table 2, the forecasted enrollments from 1971 to 1992, based on the first-order fuzzy time series model with 14 intervals, are listed in Table 6.

TABLE 6: The complete forecasted output values based on the first-order FTS with 14 intervals

Year   Actual data   Fuzzy set   Forecasted value   Forecasted − actual
1971   13055         A1          -                  -
1972   13563         A2          13547              -16
1973   13867         A2          13872              5
1974   14696         A3          14314              -382
...    ...           ...         ...                ...
1991   19337         A14         19332              -5
1992   18876         A13         19082              206
1993   N/A           #           18832              -

To evaluate the performance of the proposed model, the mean square error (MSE) or the root mean square error (RMSE) is employed as the evaluation criterion for forecasting accuracy. The MSE and RMSE values are calculated as follows:

$$MSE = \frac{1}{n} \sum_{i=\lambda}^{n} (F_i - R_i)^2 \quad (10)$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=\lambda}^{n} (F_i - R_i)^2} \quad (11)$$

where $R_i$ denotes the actual data at year $i$, $F_i$ is the forecasted value at year $i$, $n$ is the number of forecasted data, and λ is the order of the fuzzy logical relationships.

4. Experimental Results

In this paper, the proposed method is used to forecast the enrollments of the University of Alabama with the whole historical data [3], covering the period from 1971 to 1992, and to handle other forecasting problems, such as the empirical data for the TAIFEX [25] from 8/3/1998 to 9/30/1998, which are used for a comparative study in the training phase.

4.1. Experimental results for forecasting enrollments

The actual enrollments of the University of Alabama [3] are used for a comparative study in the training and testing phases. To verify its forecasting effectiveness, the proposed model is compared with corresponding models for various orders and different numbers of intervals. The forecasting accuracy of the proposed method is estimated according to (10).

4.1.1. Experimental results from the training phase

To verify the forecasting effectiveness of the proposed model for first-order FLRGs under different numbers of intervals, six forecasting models are examined and compared: the H01 model [10], the CC06a model [23], the HPSO model [14], the model in [28], and the models in [31], [32]. A comparison of the forecasting results among these models is shown in Table 7. The proposed model obtains the smallest MSE value, 15139 (an RMSE of 123.04), among all the compared models with different numbers of intervals. The major difference between the CC06a model, the HPSO model and our model lies in the defuzzification stage and in the optimization method used. The CC06a [23] and HPSO [14] models use the genetic algorithm and the particle swarm optimization algorithm, respectively, to obtain appropriate intervals, while the proposed model uses the K-means algorithm to obtain the interval lengths. The models in [31], [32] use various techniques based on hedge algebras for partitioning the universe of discourse into intervals. Namely, these forecasting models use 17 intervals based on the first-order FTS, while our model applies the K-means algorithm for partitioning the intervals and uses the new defuzzification rules to get the forecasting results. As shown in Table 7, the proposed model has an RMSE value of 123.04, which is the lowest among the three forecasting models compared by the evaluation criterion in (11). That is, the proposed model is more precise than these three recent forecasting models.

The trend of the enrollments forecasted by the first-order fuzzy time series model, in comparison with the actual enrollments and with other existing models, can be visualized in Fig. 2 and Fig. 3. From Fig. 2, it can be seen that the forecasted values of the proposed model are closer to the actual enrollments each year, from 1972 to 1992, than those of the compared models. Fig. 3 shows that the forecasting accuracy of the proposed model is higher than that of four recent models for the first-order FTS with different numbers of intervals.

To verify the forecasting effectiveness for high-order fuzzy time series, four existing forecasting models, the
TABLE 7: A comparison of the forecasted results for the first-order FLRGs with 14 intervals

Year Actual data H01 CC06a HPSO Model [28] Model [31] Model [32] Our model
1971 13055
1972 13563 14000 13714 13555 13678 13544 13582 13547
1973 13867 14000 13714 13994 13678 13906 13582 13872
1974 14696 14000 14880 14711 14602 14683 14457 14314
1975 15460 15500 15467 15344 15498 15443 15443 15460
1976 15311 15500 15172 15411 15192 15395 15447 15348
1977 15603 16000 15467 15411 15641 15620 15447 15571
1978 15861 16000 15861 15411 15827 15919 15371 15828
1979 16807 16000 16831 16816 16744 16827 16752 16794
1980 16919 17500 17106 17140 17618 17559 17031 16997
1981 16388 16000 16380 16464 16392 16406 16517 16376
1982 15433 16000 15464 15505 15410 15433 15433 15411
1983 15497 16000 15172 15411 15498 15395 15447 15429
1984 15145 15500 15172 15411 15192 15160 15371 15293
1985 15163 16000 15467 15344 15567 15540 15470 15327
1986 15984 16000 15467 16018 15567 15540 15470 15765
1987 16859 16000 16831 16816 16744 16827 16810 16827
1988 18150 17500 18055 18060 17618 17559 18156 18036
1989 18970 19000 18998 19014 19036 19060 18973 19029
1990 19328 19000 19300 19340 19574 19167 19297 19332
1991 19337 19500 19149 19340 19146 19167 19059 19332
1992 18876 19149 19014 19014 19146 18878 19059 19082
1993 N/A - - - - - - 18832
MSE - 226611 35324 22965 65689.7 - - 15139
RMSE - - - - 256.3 237.7 216.1 123.04
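
As a check on the criteria in (10) and (11), the following sketch recomputes the MSE and RMSE of the proposed model from the "Actual data" and "Our model" columns of Table 7 for the years 1972-1992. It is only an illustration: it assumes a first-order model, so that the sum runs over all 21 forecasted years, and the function names `mse` and `rmse` are hypothetical helpers, not part of the paper.

```python
import math

# Actual enrollments and "Our model" forecasts for 1972-1992, copied from Table 7.
actual = [13563, 13867, 14696, 15460, 15311, 15603, 15861, 16807, 16919,
          16388, 15433, 15497, 15145, 15163, 15984, 16859, 18150, 18970,
          19328, 19337, 18876]
forecast = [13547, 13872, 14314, 15460, 15348, 15571, 15828, 16794, 16997,
            16376, 15411, 15429, 15293, 15327, 15765, 16827, 18036, 19029,
            19332, 19332, 19082]


def mse(forecasted, observed):
    """Mean square error, Eq. (10): average of (F_i - R_i)^2."""
    if len(forecasted) != len(observed):
        raise ValueError("series must have the same length")
    n = len(forecasted)
    return sum((f - r) ** 2 for f, r in zip(forecasted, observed)) / n


def rmse(forecasted, observed):
    """Root mean square error, Eq. (11): square root of the MSE."""
    return math.sqrt(mse(forecasted, observed))


mse_value = mse(forecast, actual)
rmse_value = rmse(forecast, actual)
print(f"MSE = {mse_value:.2f}, RMSE = {rmse_value:.2f}")
```

Under these assumptions the values should agree, up to rounding, with the MSE and RMSE entries reported for the proposed model in Table 7.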

TABLE 8: A comparison of the forecasted enrollments under various high-order FTS models with seven intervals

Order C02[4] CC06b[11] HPSO [14] AFPSO [16] HMV-FTS[22] Our model
2 N/A N/A N/A N/A 22722 16356
3 86694 31123 31644 31189 N/A 8168
4 89376 32009 23271 20155 N/A 6853
5 94539 24948 23534 20366 N/A 6767
6 98215 26980 23671 22276 N/A 6785
7 104056 26969 20651 18482 N/A 3951
8 102179 22387 17106 14778 N/A 3781
9 102789 18734 17971 15251 N/A 6459
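
The comparison of best and average accuracy across orders can be reproduced directly from Table 8. The sketch below is a small illustration only: it covers the orders 3 to 9 for which all five models report an MSE, leaves out the HMV-FTS column (which has a single entry), and uses variable names chosen here for convenience.

```python
from statistics import mean

# MSE values for orders 3..9, copied from Table 8.
orders = [3, 4, 5, 6, 7, 8, 9]
mse_by_model = {
    "C02 [4]": [86694, 89376, 94539, 98215, 104056, 102179, 102789],
    "CC06b [11]": [31123, 32009, 24948, 26980, 26969, 22387, 18734],
    "HPSO [14]": [31644, 23271, 23534, 23671, 20651, 17106, 17971],
    "AFPSO [16]": [31189, 20155, 20366, 22276, 18482, 14778, 15251],
    "Our model": [8168, 6853, 6767, 6785, 3951, 3781, 6459],
}

# Best (minimum) and average MSE of each model over the listed orders.
summary = {name: (min(values), mean(values))
           for name, values in mse_by_model.items()}

best_overall = min(summary, key=lambda name: summary[name][0])
best_on_average = min(summary, key=lambda name: summary[name][1])

# Order at which the proposed model reaches its smallest MSE.
ours = mse_by_model["Our model"]
best_order = orders[ours.index(min(ours))]

for name, (best, average) in summary.items():
    print(f"{name:11s} best = {best:6d}  average = {average:10.1f}")
print(best_overall, best_on_average, best_order)
```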

Fig. 2: The curves of the actual data and the H01, CC06a, HPSO models and our proposed model for forecasting enrollments of University of Alabama

Fig. 3: A comparison of the accuracy of the proposed model with the recent forecasted models by the RMSE value

CC06b [11], HPSO [14], AFPSO [16] models and the C02 model [4] are compared with the proposed model. A comparison of the forecasted results is listed in Table 8, where the number of intervals is seven for all forecasting models. From Table 8, it is clear that the proposed model is more precise than the four compared forecasting models, since its best and average fitted accuracies are the best among the five models. In particular, with the same number of intervals, the proposed method obtains the lowest MSE values, which are 8168, 6853, 6767, 6785, 3951, 3781 and 6459 for the 3rd-order, 4th-order, 5th-order, 6th-order, 7th-order, 8th-order and 9th-order fuzzy logical relationships, respectively. In addition, the performance of the proposed model is also compared with the HMV-FTS algorithm [22] on the enrollments dataset, based on the second-order FTS with the same number of intervals (seven). Although the proposed model and the HMV-FTS model both use a clustering algorithm to obtain the best interval lengths, the proposed model gets a lower MSE value of 16356 for the second-order fuzzy logical relationships; the major difference between the HMV-FTS model and the proposed model lies in the defuzzification rules and in the establishment of the fuzzy logical relationship groups. Among all orders of the proposed forecasting model, the smallest MSE value, 3781, is obtained for the 8th-order fuzzy logical relationships.

4.1.2. Experimental results in the testing phase
To verify the forecasting accuracy for future enrollments, the historical enrollments are separated into two parts for independent testing. The first part is used

as the training data set and the second part is used as the testing data set. In this paper, the historical enrollments from 1971 to 1989 are used as the training data set and the historical enrollments from 1990 to 1992 are used as the testing data set. For example, to forecast the enrollment of 1990, the enrollments of 1971-1989 are used as the training data. Similarly, the enrollment of 1991 can be forecasted based on the enrollments of 1971-1990. After the model has been trained on the training data, future enrollments can be forecasted and compared with the testing data. Some experimental results of the forecasting models for the testing phase are listed in Table 9.

Fig. 4: A comparison of the MSE values for 16 intervals with different high-order FLRGs
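
The testing procedure for the enrollments amounts to an expanding-window split: each test year is forecasted from all years before it. The sketch below only generates the training and target years. It assumes that 1992 is handled in the same way as 1990 and 1991 (that is, from 1971-1991), which the text does not state explicitly, and the function name `expanding_window_splits` is hypothetical.

```python
def expanding_window_splits(years, first_test_year, last_test_year):
    """Yield (training_years, target_year) pairs for walk-forward testing.

    Each target year is paired with every earlier year in `years`.
    """
    for target in range(first_test_year, last_test_year + 1):
        training = [y for y in years if y < target]
        yield training, target


# Enrollment years 1971..1992; the testing years are 1990..1992.
years = list(range(1971, 1993))
splits = list(expanding_window_splits(years, 1990, 1992))

for training, target in splits:
    print(f"forecast {target} from {training[0]}-{training[-1]} "
          f"({len(training)} training points)")
```

A forecasting model would then be fitted on each training window and its one-step forecast compared with the actual value of the target year.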
TABLE 9: A comparison of actual data and forecasted results for 14 intervals in the testing phase

Year Actual value Forecasted value (1st-order, 2nd-order, 3rd-order, 4th-order, 5th-order)
1990 19328 18560 18560 18502 18563 18455
1991 19337 19149 19149 19087 19082 19058
1992 18876 18946 18946 18946 19030 19012

4.2. Experimental results for the TAIFEX forecasting

In this paper, we also apply the proposed method to forecast the TAIFEX index, using the whole historical data set [25]. To verify the superiority in forecasting accuracy of the proposed model with high-order FLRGs and 16 intervals, six FTS models, namely C96 [3], H01b [10], L06 [24], L08 [25], HPSO [14] and MTPSO [26], are selected for comparison. A comparison of the forecasted results is listed in Table 10, where all forecasting models use high-order fuzzy logical relationships under different numbers of intervals. In addition, to demonstrate the effectiveness of the proposed model, two forecasting models based on high-order FTS are selected for comparison with the proposed model: the L08 model [25] and the HPSO model [14]. The forecasting errors of all models, measured by the MSE, are listed in Table 11. From Table 11, the experimental results show that our proposed model obtains the smallest MSE in all ten testing times. From these results, it is clear that our model significantly outperforms the L08 model [25] and the HPSO model [14], and obtains its smallest MSE value of 50.2 for the 7th-order FLRGs. From Fig. 4, it can be seen that the forecasted values of the proposed model are closer to the actual data than those of the compared models.

5. Conclusion
There are many factors affecting the forecasting accuracy of models based on fuzzy time series. In the fuzzification stage, the determination of interval lengths is an important factor affecting the accuracy of a fuzzy time series model. In recent years, many optimization algorithms, such as the Artificial Bee Colony (ABC) algorithm, Particle Swarm Optimization, K-means clustering and so on, have been used for the determination of interval lengths in the universe of discourse. In this paper, a hybrid forecasting model based on fuzzy time series and the K-means clustering algorithm was presented. Unlike the existing fuzzy time series models, we establish the fuzzy logical relationship groups based on the concept of the time-variant fuzzy logical relationship group, in order to obtain more exact information for the defuzzification stage. By combining K-means clustering for the determination of optimal interval lengths with the time-variant fuzzy logical relationship group, the forecasting performance of the proposed model is improved significantly. In addition, by considering more information on the right-hand side of the fuzzy logical relationships in the same time-variant FLRG, the proposed model further improves the forecasting results. The experimental study on forecasting the enrollments of the University of Alabama and the TAIFEX index has shown that the proposed model has higher forecasting accuracy than all of the compared models. In particular, with high-order fuzzy time series, from the 2nd to the 9th order for forecasting enrollments and from the 3rd to the 8th order for TAIFEX prediction, our model is much more effective than the existing models. It also performs best for fuzzy time series with various orders of fuzzy logical relationships in the testing phase. This study has revealed the synergistic effect of the K-means clustering algorithm and the time-variant fuzzy logical relationship groups in the stage of determining the fuzzy logical relationships, and has also proposed a new rule for the defuzzification stage to improve the forecasting accuracy. The results have shown that the proposed model outperforms the compared models in the training phase with various orders and different interval lengths. These results are very promising for future work on the development of fuzzy time series and the K-means clustering algorithm in real-world forecasting applications.

Acknowledgement
This work was supported in part by the science council of Thai Nguyen University of Technology - Thai Nguyen University. The authors also gratefully acknowledge the Editor and anonymous reviewers for their valuable comments and constructive suggestions.
TABLE 10: A comparison of the forecasted results of the proposed method with the existing models based on high-order fuzzy time series with 16 intervals

Date Actual data C96 [3] H01 [10] L06 [24] L08 [25] HPSO [14] MTPSO [26] Our model
8/3/1998 7552
8/4/1998 7560 7450 7450
8/5/1998 7487 7450 7450
8/6/1998 7462 7500 7500 7450
8/7/1998 7515 7500 7500 7550
8/10/1998 7365 7450 7450 7350
8/11/1998 7360 7300 7300 7350
8/12/1998 7330 7300 7300 7350 7329 7289.56 7325.28 7326.69
8/13/1998 7291 7300 7300 7250 7289.5 7320.77 7287.48 7291.19
———– —– —– —– —– —— —— ——- ——
9/29/1998 6806 6850 6850 6850 6796 6800.07 6781.01 6811.38
9/30/1998 6787 6850 6750 6750 6796 7289.56 6781.01 6784.88
10/1/1998 N/A 6811.01
MSE 9668.94 5437.58 1364.56 105.02 103.61 92.17 50.2

TABLE 11: A comparison of the MSE of the proposed model with that of the L08 and HPSO models for the training phase based on high-order FLRGs.

Models 3rd- order 4th- order 5th- order 6th- order 7th- order 8th- order
L08 208.79 142.26 143.31 147.14 105.02 124.48
HPSO 152.47 148.14 112.24 122.68 103.61 108.37
Our model 70 59.4 57.4 52.2 50.2 57.6

References

[1] Q. Song, B.S. Chissom. Forecasting Enrollments with Fuzzy Time Series Part I, Fuzzy Sets and Systems, vol. 54, pp. 1-9, 1993b.
[2] Q. Song, B.S. Chissom. Forecasting Enrollments with Fuzzy Time Series Part II, Fuzzy Sets and Systems, vol. 62, pp. 1-8, 1994.
[3] S.M. Chen. Forecasting Enrollments based on Fuzzy Time Series, Fuzzy Sets and Systems, vol. 81, pp. 311-319, 1996.
[4] S.M. Chen. Forecasting enrollments based on high-order fuzzy time series, Cybernetics and Systems: An International Journal, vol. 33, pp. 1-16, 2002.
[5] Singh, S. R. A simple method of forecasting based on fuzzy time series. Applied Mathematics and Computation, 186, 330-339, 2007a.
[6] Singh, S. R. A robust method of forecasting based on fuzzy time series. Applied Mathematics and Computation, 188, 472-484, 2007b.
[7] H.K. Yu. Weighted fuzzy time series models for TAIEX forecasting, Physica A, 349, pp. 609-624, 2005.
[8] Lee, L.-W., Wang, L.-H., & Chen, S.-M. Temperature prediction and TAIFEX forecasting based on fuzzy logical relationships and genetic algorithms. Expert Systems with Applications, 33, 539-550, 2007.
[9] Wang, N.-Y., & Chen, S.-M. Temperature prediction and TAIFEX forecasting based on automatic clustering techniques and two-factors high-order fuzzy time series. Expert Systems with Applications, 36, 2143-2154, 2009.
[10] Huarng, K. Heuristic models of fuzzy time series for forecasting. Fuzzy Sets and Systems, 123, 369-386, 2001b.
[11] Chen, S.M., Chung, N.Y. Forecasting enrollments using high-order fuzzy time series and genetic algorithms. International Journal of Intelligent Systems, 21, 485-501, 2006b.
[12] H.K. Yu. A refined fuzzy time-series model for forecasting, Phys. A, Stat. Mech. Appl. 346, 657-681, 2004; http://dx.doi.org/10.1016/j.physa.07.024.
[13] Huarng, K.H., Yu, T.H.K. "Ratio-Based Lengths of Intervals to Improve Fuzzy Time Series Forecasting," IEEE Transactions on SMC Part B: Cybernetics, Vol. 36, pp. 328-340, 2006.
[14] Kuo, I. H. et al. An improved method for forecasting enrollments based on fuzzy time series and particle swarm optimization. Expert Systems with Applications, 36, 6108-6117, 2009.
[15] I.H. Kuo et al. Forecasting TAIFEX based on fuzzy time series and particle swarm optimization, Expert Systems with Applications, 37, 1494-1502, 2010.
[16] Huang, Y. L. et al. A hybrid forecasting model for enrollments based on aggregated fuzzy time series and particle swarm optimization. Expert Systems with Applications, 38, 8014-8023, 2011.
[17] Nguyen Cong Dieu, Nghiem Van Tinh. Fuzzy time series forecasting based on time-depending fuzzy relationship groups and particle swarm optimization, in: Proceedings of the 9th National Conference on Fundamental and Applied Information Technology Research (FAIR9), pp. 125-133, 2016.
[18] Nghiem Van Tinh, Nguyen Cong Dieu. An improved method for stock market forecasting combining high-order time-variant fuzzy logical relationship groups and particle swarm optimization, in: Proceedings of the International Conference, Advances in Information and Communication Technology, pp. 153-166, 2016.
[19] S.-M. Chen, K. Tanuwijaya. Fuzzy forecasting based on high-order fuzzy logical relationships and automatic clustering techniques, Expert Systems with Applications, 38, 15425-15437, 2011.
[20] Zhiqiang Zhang, Qiong Zhu. Fuzzy time series forecasting based on k-means clustering, Open Journal of Applied Sciences, 100-103, 2012.
[21] Bulut, E., Duru, O., & Yoshida, S. A fuzzy time series forecasting model for multi-variate forecasting analysis with fuzzy c-means clustering. World Academy of Science, Engineering and Technology, 63, 765-771, 2012.
[22] S. Askari, N. Montazerin. A high-order multi-variable Fuzzy Time Series forecasting algorithm based on fuzzy clustering, Expert Systems with Applications, 42, 2121-2135, 2015.
[23] Chen, S.-M., Chung, N.-Y. Forecasting enrollments of students by using fuzzy time series and genetic algorithms. International Journal of Information and Management Sciences, 17, 1-17, 2006a.
[24] Lee, L. W. et al. Handling forecasting problems based on two-factors high-order fuzzy time series. IEEE Transactions on Fuzzy Systems, 14, 468-477, 2006.
[25] Lee, L.-W., Wang, L.-H., & Chen, S.-M. Temperature prediction and TAIFEX forecasting based on high order fuzzy logical relationship and genetic simulated annealing techniques, Expert Systems with Applications, 34, 328-336, 2008b.
[26] Ling-Yuan Hsu et al. Temperature prediction and TAIFEX forecasting based on fuzzy relationships and MTPSO techniques, Expert Syst. Appl. 37, 2756-2770, 2010.
[27] Lizhu Wang, Xiaodong Liu, Witold Pedrycz, Yongyun Shao. Determination of temporal information granules to improve forecasting in fuzzy time series, Expert Systems with Applications, 41, 3134-3142, 2014.

[28] Wei Lu, Xueyan Chen, Witold Pedrycz, Xiaodong Liu, Jianhua Yang. Using interval information granules to improve forecasting in fuzzy time series, International Journal of Approximate Reasoning, 57, 1-18, 2015.
[29] S.M. Chen, P.Y. Kao. TAIEX forecasting based on fuzzy time series, particle swarm optimization techniques and support vector machines, Inf. Sci. 247, 62-71, 2013.
[30] S.M. Chen, Bui Dang Ha Phuong. Fuzzy time series forecasting based on optimal partitions of intervals and optimal weighting vectors, Knowledge-Based Systems, 118, 204-216, 2017.
[31] Hoang Tung, Nguyen Dinh Thuan, Vu Minh Loc. The partitioning method based on hedge algebras for fuzzy time series forecasting, Journal of Science and Technology, 54 (5), 571-583, 2016.
[32] Vu Minh Loc, Nghia Huynh Pham Thanh. Context-aware approach to improve result of forecasting enrollment in fuzzy time series, International Journal of Emerging Technologies in Engineering Research (IJETER), Volume 5, Issue 2, 28-33, 2017.
[33] J.B. MacQueen. Some methods for classification and analysis of multivariate observations, in: Proceedings of the Fifth Symposium on Mathematical Statistics and Probability, vol. 1, University of California Press, Berkeley, CA, pp. 281-297, 1967.

Nghiem Van Tinh received the B.S. degree in applied mathematics and information from Hanoi University of Science and Technology (HUST), Vietnam, in 2002, and the M.S. degree in Information Technology from Thai Nguyen University of Information and Communication Technology in 2007. He is currently a Ph.D. student at the Institute of Information Technology, Vietnam Academy of Science and Technology, Hanoi, Vietnam, and works as a Lecturer at the Thai Nguyen University of Technology. His current research interests include clustering, optimization techniques, software engineering, time series analysis, fuzzy time series and forecasting. He has published several papers on these topics.

Nguyen Cong Dieu received the Ph.D. degree in applied mathematics from the Institute of Information Technology (IOIT), Vietnamese Academy of Science and Technology, in 1995. He is currently a lecturer at the Faculty of Information Technology, Thang Long University, and a research associate of the Institute of Information Technology. His research interests include data mining, computational intelligence methods in data analysis, and fuzzy time series forecasting.
