Using Machine Learning Classifiers To Predict Stock Exchange Index


International Journal of Machine Learning and Computing, Vol. 7, No. 2, April 2017

Using Machine Learning Classifiers to Predict Stock Exchange Index
Mustansar Ali Ghazanfar, Saad Ali Alahmari, Yasmeen Fahad Aldhafiri, Anam Mustaqeem, Muazzam
Maqsood, and Muhammad Awais Azam


Manuscript received February 15, 2017; revised March 23, 2017.
Mustansar Ali Ghazanfar is with University of Engineering and Technology Taxila, Pakistan (e-mail: [email protected]).
Saad Ali Alahmari is with Department of Computer Science, Shaqra University, Saudi Arabia.
Yasmeen Fahad Aldhafiri is with Department of Business Administration, Jubail University College, Saudi Arabia.
Anam Mustaqeem, Muazzam Maqsood, and Muhammad Awais Azam are with University of Engineering and Technology Taxila, Pakistan.
doi: 10.18178/ijmlc.2017.7.2.614

Abstract—Predicting the stock exchange index is an attractive research topic in the field of machine learning. Numerous studies have been conducted using various techniques to predict stock market volume. This paper presents the first detailed study on data from the Karachi Stock Exchange (KSE) and the Saudi Stock Exchange (SSE) to predict the stock market volume of ten different companies. In this study, we have applied and compared salient machine learning algorithms to predict stock exchange volume. The performance of these algorithms has been compared using accuracy metrics on a dataset collected over a period of six months by crawling the KSE and SSE websites.

Index Terms—Stock exchange prediction, machine learning, SVM, neural networks, Bayesian network, Ada-boost.

I. INTRODUCTION
Forecasting of the stock market has been a vital topic in different fields of computational sciences because of its possible monetary profit. A stock market is a place where high capital is invested and companies trade their shares. Stock market forecasting poses the challenge of disproving the Efficient Market Hypothesis, which states that the market is efficient and cannot be predicted. Researchers have worked hard to show that financial markets are predictable. With the advancement and availability of technology, stock markets are now more accessible to investors. Various models have been proposed, both in industry and academia, for stock market prediction, ranging from machine learning, to data mining, to statistical models.
In this paper, we have predicted the stock exchange volume of the Karachi Stock Exchange (www.kse.com.pk) by crawling the real-time data of ten different companies (of different sectors) and applying salient machine learning classifiers such as SVM, KNN, Ada-boost, Naïve Bayes, Bayesian Networks, Multilayer Perceptron and RBF. The performance of these classifiers has been compared using different metrics that include mean absolute error, root mean square error and accuracy.

II. LITERATURE REVIEW
Stock rates are predicted by R. K. Dase and D. D. Pawar in [1] with the help of neural networks, which can extract useful information from a larger set of data. Halbert White reported some results of research using neural network techniques to model asset price movements [2]. A step-by-step model based on artificial neural networks for classification and prediction is proposed by Jing Tao Yao and Chew Lim Tan in [3].
The stock index (Taiwan) is predicted by Tiffany Hui-Kuang Yu and Kun-Huang Huarng in [4] by using a novel fuzzy time series model. The stock market price (Nigeria) is forecasted by Akinwale et al. in [5] by implementing error back propagation and regression analysis. The predictive relationship of numerous economic and financial variables is evaluated by David Enke and Suraphan Thawornwong in [6].
Abdulsalam Sulaiman et al. in [7] extracted values of variables from a database by using the moving average (MA) method. These values are used to predict the future values of other variables through the use of time series data. Kuang Yu Huang and Chuen-Jiuan Jane in [8] proposed a hybrid model to predict the stock market that uses an autoregressive exogenous (ARX) prediction model, grey system theory and rough set theory. Md. Rafiul Hassan et al. in [9] predicted the behavior of the financial market by using a mixed model of Hidden Markov Model, Artificial Neural Network and Genetic Algorithms.
Yi-Fan Wang et al. in [10] improved accuracy in the prediction of stock indexes by incorporating Markov chain concepts into fuzzy stochastic prediction. Hsien-Lun Wong et al. in [11] claimed that, for short-term periods, fuzzy time series perform better for prediction. They used fuzzy time series employing ARIMA and vector ARMA models for prediction.
Johan Bollen et al. in [12] used public sentiment to improve the accuracy of prediction algorithms by analyzing Twitter posts with tools like GPOMS and OpinionFinder. Shunrong Shen and Tongda Zhang in [13] proposed a new prediction algorithm that used the notion of temporal correlation among global markets and various important products to predict the next-day stock trend. SVM was used as a classifier in this study. Osman Hegazy et al. in [14], [15] proposed a model to predict stock market prices by using Particle Swarm Optimization (PSO) and Least Square Support Vector Machine.

III. CLASSIFICATION APPROACHES EMPLOYED FOR PREDICTING KSE AND SAUDI STOCK DATA
In this section, we briefly describe the various machine learning algorithms used for forecasting. For their detailed implementation (and parameter setup), refer to our previous work [16].

A. Support Vector Machine (SVM)
SVM finds the optimal separating hyperplane between two classes by solving a linearly constrained quadratic optimization problem, and the solution is globally optimal. Researchers have claimed that SVM offers superior performance compared with other approaches [17]. Because stock market data vary dynamically and are non-linear, we assume that the best performance will be achieved by using a non-linear kernel. We have applied a polynomial kernel and used the one-versus-one (1v1) approach for multiclass classification.

B. Naïve Bayes Classifier
The Naïve Bayes classifier is a statistical classifier that predicts class membership based on probabilities. It makes use of class conditional independence, which makes it computationally fast. Class conditional independence means that, given the class, every attribute is independent of the other attributes. The Naïve Bayes classifier works as follows:
Let T represent a training set of samples, and suppose there are k classes with class labels $C_1, C_2, \ldots, C_k$. Each record is represented by an n-dimensional vector $X = \{x_1, x_2, \ldots, x_n\}$, which holds the n measured values of the n attributes $A_1, A_2, \ldots, A_n$, respectively. The classifier predicts the class of X with the highest a posteriori probability. Thus, we find the class that maximizes $P(C_i \mid X)$. By Bayes' theorem, we have:

$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}. \tag{1}$$

As $P(X)$ has the same value for all classes, we can ignore it. Naïve Bayes makes the class conditional independence assumption. Mathematically:

$$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i). \tag{2}$$

The probabilities $P(x_1 \mid C_i), P(x_2 \mid C_i), \ldots, P(x_n \mid C_i)$ are computed from the training set. In Equation 2, the term $x_k$ denotes the value of attribute $A_k$ for the given sample.

C. K-Nearest Neighbors
K-Nearest Neighbor (KNN) is a simple-to-implement machine learning classifier. The problem of stock prediction can be mapped to classification by similarity. The training and test stock data are stored as sets of vectors, and each stock record is represented by an N-dimensional feature vector. The decision is taken on the basis of a similarity measure such as Euclidean distance. The KNN classifier works as follows:
1) Choose the number k of nearest neighbors.
2) Determine the distance between the test sample and the training samples by using a metric such as Euclidean distance.
3) Sort all the training data on the basis of these distances.
4) Take the class labels of the k nearest neighbors and assign the majority label as the prediction for the query record.

D. Ada-Boost
The Ada-Boost algorithm works as follows:
1) Start with weights $w_i = 1/n$, $i = 1, \ldots, n$.
2) For $m = 1, \ldots, M$ do:
a) Fit the machine learning algorithm $\phi_m(\cdot)$ using weights $w_i$ on the training data.
b) Compute $\mathrm{err}_m = E_w[I(y \neq \phi_m(\cdot))]$ and $c_m = \log((1 - \mathrm{err}_m)/\mathrm{err}_m)$.
c) Update $w_i \leftarrow w_i \exp[c_m \cdot I(y_i \neq \phi_m(\cdot))]$, $i = 1, \ldots, n$, and renormalize so that $\sum_i w_i = 1$.
3) Output $\phi(\cdot) = \mathrm{sign}\left[\sum_{m=1}^{M} c_m \phi_m(\cdot)\right]$.

E. Multilayer Perceptron
The approach used to train the multilayer perceptron is explained below.
The open, high, low, current and change values are the inputs to the network, so the number of inputs to the network is 5 (the five input features listed in Section IV). The output is the prediction of the volume of each company. The architecture of the multilayer perceptron is shown in Fig. 1.

Fig. 1. Structure of multilayer perceptron.

The inputs are multiplied by their weights and summed. A transfer function then maps these weighted sums to the output. The hyperbolic tangent and sigmoid functions are most commonly used as transfer functions. In the multilayer perceptron, the weighted input $X_j$ is computed as follows:

$$X_j = \sum_i W_{ij} z_i, \tag{3}$$

where $W_{ij}$ denotes the weight between the ith and the jth unit and $z_i$ denotes the activity level of the ith unit in the previous layer. Typically, the activity $z_j$ is calculated using the sigmoid function as follows:

$$z_j = \frac{1}{1 + e^{-X_j}}. \tag{4}$$

The back-propagation algorithm is most commonly used to adjust the weights so as to reduce the error. The error E is computed as:

$$E = \frac{1}{2} \sum_j (z_j - d_j)^2, \tag{5}$$

where $z_j$ represents the activity level of the jth unit in the top layer and $d_j$ represents the desired output of the jth unit.

F. Bayesian Network
A Bayesian network is a graphical model based on probability. A directed acyclic graph is used in this model to represent a set of random variables and their dependencies.
Let us suppose that there are two events S and R which can cause another event G, and that S is also directly affected by R. A Bayesian network can model such a situation easily, as can be seen in Fig. 2. The three events are represented by three variables, each of which can be either true or false.
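As an illustration of how such a three-variable network (R influences S, and both S and R influence G) can be queried, the following minimal Python sketch computes P(R = T | G = T) by enumeration: each variable is conditioned only on its parents, and the unobserved variable S is summed out. The probability values and the names `P_R`, `P_S_GIVEN_R`, `P_G_GIVEN_SR`, `joint` and `posterior_r_given_g` are illustrative assumptions and are not taken from our experiments.

```python
from itertools import product

# Illustrative conditional probability tables (assumed values, not from the paper).
P_R = {True: 0.2, False: 0.8}            # P(R)
P_S_GIVEN_R = {True: 0.01, False: 0.4}   # P(S = T | R)
P_G_GIVEN_SR = {                         # P(G = T | S, R), keyed by (S, R)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(g, s, r):
    """P(G=g, S=s, R=r) as the product of each variable given its parents."""
    p_r = P_R[r]
    p_s = P_S_GIVEN_R[r] if s else 1.0 - P_S_GIVEN_R[r]
    p_g = P_G_GIVEN_SR[(s, r)] if g else 1.0 - P_G_GIVEN_SR[(s, r)]
    return p_g * p_s * p_r


def posterior_r_given_g(r=True, g=True):
    """P(R=r | G=g), obtained by summing the joint over the unobserved S."""
    numerator = sum(joint(g, s, r) for s in (True, False))
    denominator = sum(
        joint(g, s, rr) for s, rr in product((True, False), repeat=2)
    )
    return numerator / denominator


if __name__ == "__main__":
    print(posterior_r_given_g(r=True, g=True))
```

Enumeration is only practical for small networks such as this one; it is shown here because it maps directly onto the conditional probability formula.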


Fig. 2. Example of Bayesian network.

The joint probability function is:

$$P(G, S, R) = P(G \mid S, R)\,P(S \mid R)\,P(R). \tag{6}$$

The model can give the probability of R, given G, by using the conditional probability formula:

$$P(R = T \mid G = T) = \frac{P(G = T, R = T)}{P(G = T)} = \frac{\sum_{S \in \{T, F\}} P(G = T, S, R = T)}{\sum_{S, R \in \{T, F\}} P(G = T, S, R)}. \tag{7}$$

G. Radial Basis Function (RBF)
RBF is a feed-forward neural network. Its structure is based on three layers, namely the input, hidden, and output layers. The input layer consists of signal source units. The problem requirements determine the number of units in the hidden layer. The hidden layer responds to the input pattern and yields the corresponding output. The activation function of the hidden layer units is a radial basis function, as illustrated in Fig. 3.
In Fig. 3, $X = (x_1, x_2, \ldots, x_m)$ denotes an m-dimensional input vector and $W = (w_1, w_2, \ldots, w_n)$ denotes the weights of the output layer. The activation function is represented as $g_i(X)$, which is a Gaussian function, where $i = 1, 2, \ldots, n$ and n is the number of neurons in the hidden layer. In RBF, the output of the ith neuron of the hidden layer is expressed as:

$$q_i = g_i(\lVert X - C_i \rVert) = \exp\left(-\frac{\lVert X - C_i \rVert^2}{2\sigma_i^2}\right), \tag{8}$$

where $C_i$ is the center of the ith activation function and $\lVert \cdot \rVert$ is the Euclidean norm. The term $\sigma_i$ is the width of the receptive field. The linear combination of the units in the hidden layer defines the activation of the output layer, as shown in Equation 9:

$$y = \sum_{i=1}^{n} w_i q_i, \tag{9}$$

where $w_i$ denotes the weights from the hidden to the output layer.

Fig. 3. Structure of RBF.

IV. EXPERIMENTAL SETUP
We conducted experiments on a historical dataset, collected over a period of 6 months (Apr 2013 to Sep 2013) for ten different well-known companies of Pakistan, by crawling the Karachi Stock Exchange website (www.kse.com.pk). We have also tested our algorithms on the Saudi stock dataset. The selected data has 5 input features in total: Open, High, Low, Current and Change. As the aim is to predict the target volume values for the upcoming n days, the selected input data contain stock market information for a historical period of 6 months.
We have labeled the data according to the variation in the value of the "change" feature. As we are considering data for 6 months, very small fluctuations in these values (which might be considered noise rather than real signal) can be ignored. Hence, we have assigned three different labels, A, B and C, based on the "change" value. As the dataset covers a period of 6 months (which is quite a massive amount of data), we have sampled all 10 companies every 4 hours, because the prediction will not be affected by small changes that occur within a span of 4 hours. After sampling every 4 hours, we have 24,000 samples on average for all companies.
For labeling the data, a threshold has been set on the input feature "change". The "change" column ranges from -3 to 3, showing small fluctuations in the data. Therefore, the threshold has been set to separate positive values from negative values. Values exactly equal to 0 are given class label C, values below 0 are assigned class label B, and values above 0 are assigned class label A. The selected features and their labels are shown in Table I.

TABLE I: INPUT FEATURES ALONG WITH ASSIGNED LABEL
Open     High     Low      Current  Change   Volume   Label
51.55    52.25    51.6     51.7     0.15     51652    A
52       53.99    52.2     52.46    0.46     329560   A
37.6     38.98    36.91    37.6     0        2        C

We sorted the dataset based on the time feature, so that historical data is used for training and the most recent data is used for testing. Specifically, we divided the sorted dataset into an 80% training set and a 20% testing set. We conducted 2-fold cross validation over the 80% training set to learn the optimal parameters.
Three different indices are used as measures of prediction accuracy: Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Accuracy. They have been used in [16], [18], [19]. Mathematically, they are defined as:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |d_i - z_i| \tag{11}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (d_i - z_i)^2} \tag{12}$$

$$\mathrm{Accuracy} = \frac{N_c}{N} \tag{13}$$

where N denotes the total number of samples forecasted, $d_i$ denotes the actual value of a sample, $z_i$ denotes the forecast value of a sample, and $N_c$ denotes the total number of correctly classified samples.
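The labeling rule, the chronological 80/20 split and the three evaluation measures of this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the code used in our experiments: the function names (`label_from_change`, `chronological_split`, `mae`, `rmse`, `accuracy`) are hypothetical, and the rows are assumed to be already sorted by time, oldest first.

```python
import math


def label_from_change(change):
    """Map the 'change' feature to a label: A if > 0, B if < 0, C if exactly 0."""
    if change > 0:
        return "A"
    if change < 0:
        return "B"
    return "C"


def chronological_split(rows, train_fraction=0.8):
    """Split time-sorted rows into (training, testing) without shuffling."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def mae(actual, predicted):
    """Mean absolute error between actual values d_i and forecasts z_i."""
    return sum(abs(d - z) for d, z in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error between actual values d_i and forecasts z_i."""
    return math.sqrt(
        sum((d - z) ** 2 for d, z in zip(actual, predicted)) / len(actual)
    )


def accuracy(actual_labels, predicted_labels):
    """Fraction of samples whose predicted label equals the actual label."""
    correct = sum(1 for d, z in zip(actual_labels, predicted_labels) if d == z)
    return correct / len(actual_labels)


if __name__ == "__main__":
    # 'Change' values of the three example rows in Table I.
    print([label_from_change(c) for c in (0.15, 0.46, 0)])
    train, test = chronological_split(list(range(10)))
    print(len(train), len(test))
```

Splitting by position instead of at random keeps every test sample later in time than every training sample, which matches the setup described above.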


Fig. 4(A): Comparison of RMSE.

Fig. 4(B): Comparison of MAE.

Fig. 4(C): Comparison of accuracies for Multilayer Perceptron, ADABOOST and Bayes_Net.

V. RESULTS AND DISCUSSIONS
The experimental results are shown in Figure 4. We repeated the experiment for 11 different companies, including the Saudi company, by applying various algorithms, and calculated the MAE (and RMSE) for each algorithm. The MAE is plotted against the horizontal axis while the classifiers are shown on the vertical axis. The lowest MAE in each figure indicates the best prediction algorithm for that specific company. Figure 4(a, b, c) depicts the RMSEs, MAEs and accuracies for the best three classifiers (Adaboost, Multilayer Perceptron and Bayes Net) out of the ten classifiers we tested for this research. From the figures, it can be seen that these three classifiers show very good results, and that their results differ across the datasets of the companies.
It is very difficult to pick the best classifier out of these three, but overall ADABOOST gives better results than the other two classifiers for both the KSE and the Saudi data. For both datasets, ADABOOST gives better accuracies and lower MAEs. The detailed results for these three classifiers are given in Appendix A.

VI. CONCLUSION AND FUTURE WORK
Stock market prediction is an area of potential research and of monetary benefit, as its total market capitalization is massive. This paper attempts to be a forerunner in the prediction of the Karachi Stock Exchange (KSE), and we have also tested our algorithms on the Saudi stock dataset for the TASI company. In this study, we crawled data of ten renowned companies from KSE over a period of six months. Different machine learning classifiers have been employed to predict the future volume of these companies. Ada-boost, Multilayer Perceptron and Bayesian Network have shown good results.


As future work, we will focus on using social media analysis, for instance, using the sentiment of Tweets about a specific stock in addition to historical data for stock trend prediction. We reckon that the resulting hybrid framework of these two approaches will further improve the results. A cross-platform mobile application of this work is also in progress. We can also use more Saudi stock data to develop a model that can run on both Pakistani and Saudi stock datasets.

APPENDIX A:
TABLE A1: A COMPARISON OF THE BEST THREE PERFORMING CLASSIFIERS IN TERMS OF ERROR AND ACCURACY FOR DIFFERENT SECTORS

Sector                          MAE      RMSE     Accuracy (%)

ADABOOST
Electricity                     0.5776   0.798    16.67
Industrial Engineering          0.4964   0.6657   37.25
Automobile and Parts            0.2902   0.5302   55.13
Fixed Line Telecommunication    0.0151   0.0211   100
Chemicals                       0.1357   0.6379   78.88
Financial Services              0.1127   0.2821   89.58
Oil and Gas                     NA       NA       NA
Banks                           0.0088   0.939    98.67
Pharma and Bio Tech             0.5951   0.7487   21.11
General Industrials             0.6422   0.7916   16.66
TASI                            0.3545   0.2545   54

Multi-Layer Perceptron
Electricity                     0.4988   0.6021   16.67
Industrial Engineering          0.4345   0.6419   37.25
Automobile and Parts            0.2162   0.4259   59.23
Fixed Line Telecommunication    0.3116   0.4114   58.25
Chemicals                       0.1142   0.22     86.96
Financial Services              0.2538   0.4895   62.81
Oil and Gas                     0.0904   0.2402   91.48
Banks                           0.0762   0.1913   89.4
Pharma and Bio Tech             0.5756   0.7219   21.11
General Industrials             0.6638   0.7994   0
TASI                            0.2845   0.6545   75.46

Bayesian Network
Electricity                     0.5731   0.7353   16.67
Industrial Engineering          0.5858   0.7011   0
Automobile and Parts            0.5301   0.6722   16.67
Fixed Line Telecommunication    0.0814   0.2553   88.08
Chemicals                       0.0342   0.1173   97.52
Financial Services              0.259    0.4961   61.8
Oil and Gas                     0.058    0.204    100
Banks                           0.0931   0.2974   86.09
Pharma and Bio Tech             0.4119   0.6137   42.77
General Industrials             NA       NA       NA
TASI                            0.3545   0.745    67

REFERENCES
[1] R. K. Dase and D. D. Pawar, "Application of artificial neural network for stock market predictions: A review of literature," International Journal of Machine Intelligence, vol. 2, issue 2, pp. 14-17, 2010.
[2] H. White, "Economic prediction using neural networks: The case of IBM daily stock returns," Department of Economics, University of California, San Diego.
[3] J. T. Yao and C. L. Tan, Guidelines for Financial Prediction with Artificial Neural Networks.
[4] T. H.-K. Yu and K.-H. Huarng, "A neural network-based fuzzy time series model to improve forecasting," Elsevier, pp. 3366-3372, 2010.
[5] A. T. Akinwale, O. T. Arogundade, and A. F. Adekoya, "Translated Nigeria stock market price using artificial neural network for effective prediction," Journal of Theoretical and Applied Information Technology, 2009.
[6] D. Enke and S. Thawornwong, "The use of data mining and neural networks for forecasting stock market returns," 2005.
[7] A. S. Olaniyi et al., "Stock trend prediction using regression analysis – A data mining approach," AJSS Journal, 2010.
[8] K. Y. Huang and C.-J. Jane, "A hybrid model for stock market forecasting and portfolio selection based on ARX, grey system and RS theories," Expert Systems with Applications, pp. 5387-5392, 2009.
[9] M. R. Hassan, B. Nath, and M. Kirley, "A fusion model of HMM, ANN and GA for stock market forecasting," Expert Systems with Applications, pp. 171-180, 2007.
[10] Y.-F. Wang, S. M. Cheng, and M.-H. Hsu, "Incorporating the Markov chain concepts into fuzzy stochastic prediction of stock indexes," Applied Soft Computing, pp. 613-617, 2010.
[11] H.-L. Wong, Y.-H. Tu, and C.-C. Wang, "Application of fuzzy time series models for forecasting the amount of Taiwan export," Expert Systems with Applications, pp. 1456-1470, 2010.


[12] J. Bollen, H. Mao, and X. Zeng, "Twitter mood predicts the stock market," Journal of Computational Science, vol. 2, no. 1, pp. 1-8, 2011.
[13] S. R. Shen, H. M. Jiang, and T. D. Zhang, Stock Market Forecasting Using Machine Learning Algorithms, 2012.
[14] O. Hegazy, O. S. Soliman, and M. A. Salam, "A machine learning model for stock market prediction," International Journal of Computer Science and Telecommunications, vol. 4, issue 12, December 2013.
[15] F. Wen et al., "Stock price prediction based on SSA and SVM," Procedia Computer Science, vol. 31, pp. 625-631, 2014.
[16] M. A. Ghazanfar and A. Prügel-Bennett, "The advantage of careful imputation sources in sparse data-environment of recommender systems: Generating improved SVD-based recommendations," Informatica, vol. 37, no. 1, pp. 61-92, 2013.
[17] S. Ying, Z. Fengting, and Z. Tao, "China's stock index futures regression prediction research based on SVM," China Journal of Management Science, vol. 3, pp. 35-39, 2013.
[18] M. A. Ghazanfar and A. Prügel-Bennett, "Exploiting context in kernel-mapping recommender system algorithms," in Proc. Sixth International Conference on Machine Vision (ICMV 13), Nov. 2013, Italy.
[19] M. A. Ghazanfar, A. Prügel-Bennett, and S. Szedmak, "Kernel mapping recommender systems," Information Sciences, vol. 208, pp. 81-104, 2012.

Mustansar Ali Ghazanfar holds a BSc in software engineering (Gold Medalist) from UET-Taxila, Pakistan, and an MSc in software engineering and a PhD in machine learning from the University of Southampton, UK. His areas of research include recommender systems, prediction, stock market, and socio-economical and healthcare modelling.

Saad Ali Alahmari holds a PhD in artificial intelligence and semantic web from the University of Southampton, UK. His areas of research include recommender systems, Web services, Semantic Web, Big data, and data mining.

Yasmeen Fahad Aldhafiri holds an MSc from the University of Illinois at Urbana-Champaign. Her areas of research include classification and prediction, stock exchange prediction and data mining.

Anam Mustaqeem is a PhD scholar in the Department of Software Engineering at UET-Taxila. Her areas of interest are machine learning, medical imaging, software quality assurance, wireless networks and ad hoc networks.

Muhammad Awais Azam holds a BSc in computer engineering (Gold Medalist) from UET-Taxila, Pakistan, and an MSc in computer engineering and a PhD in machine learning from University of Queen Mary, UK. His areas of research include recommender systems, ubiquitous computing, and Internet of things.

Muazzam Maqsood is a PhD scholar in the Software Engineering Department, UET Taxila. His areas of research include speech classification and prediction.
