Stock Market Forecasting Using Time Series Analysis
Abstract— This paper analyzes various aspects of the stock exchange for predicting online share trading. Research on the stock or share market focuses mainly on whether to buy or sell, but it fails to address the dimensions and expectations of a new investor. The common perception of the stock market in society is that it is highly risky and not suitable for ordinary people to trade in. Understanding the seasonal variation and steady trend of an index can help both existing and new investors decide whether to invest in the stock/share market.
Time series analysis is an effective tool for forecasting trends. The trend chart provides guidance for the investor, and this paper uses a machine learning technique to forecast stocks.
Keywords— Stock Market, Shares, IPO, Learning analytics, Data mining, ARIMA, Time Series, KPI
Introduction
The stock market is a market that enables the seamless buying and selling of company stocks. Every stock exchange has its own stock index. An index is an aggregate value calculated by combining the prices of several stocks; it represents the overall market and helps track the market's movement over time. The stock market can have a huge impact on individuals and on the country's economy as a whole. Therefore, predicting stock trends accurately can minimize the risk of loss and maximize profit.
A stock (also called a share or company equity) is a financial instrument that represents ownership in a company or corporation and a proportionate claim on its assets (what it owns) and earnings (the profits it generates).
Time series analysis is an effective tool for forecasting trends, and the resulting trend chart provides useful guidance for the investor. The following sections explain this concept in detail and apply a machine learning technique to forecast stocks.
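As a minimal illustration of how an index combines several stocks into one value, the sketch below computes a simple price-weighted average. The tickers, prices and divisor are hypothetical. Real indices use their own weighting schemes, and many are weighted by market capitalization. Their divisors are also adjusted for events such as stock splits and changes in constituents.

```python
# Minimal sketch of a price-weighted index (hypothetical tickers, prices and divisor).

def price_weighted_index(prices, divisor):
    """Sum the constituent prices and scale the total by the index divisor."""
    return sum(prices.values()) / divisor

# Hypothetical closing prices for three constituent stocks.
closing_prices = {"AAA": 120.0, "BBB": 45.5, "CCC": 80.5}

# With the divisor equal to the number of stocks, the index is a plain average.
print(price_weighted_index(closing_prices, divisor=len(closing_prices)))  # 82.0
```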
STOCK MARKET
DESCRIPTION
The concept behind how the stock market works is simple. Operating much like an auction house, the stock market enables buyers and sellers to negotiate prices and make trades.
The stock market works through a network of exchanges. The New York Stock Exchange, NASDAQ and the Bombay Stock Exchange (whose benchmark index is the Sensex) are examples. Companies list shares of their stock on an exchange through a process called an initial public offering (IPO). Investors purchase these shares, which allows the company to raise money to grow its business. Investors can then buy and sell these stocks among themselves, and the exchange tracks the supply and demand of each listed stock.
That supply and demand help determine the price of each security, that is, the levels at which stock market participants (investors or traders) agree to buy or sell.
Predicting how the stock market will perform is one of the most difficult things to do. Many factors are involved in the prediction: physical vs. psychological factors, political vs. religious factors, rational and irrational behavior, and so on. All these aspects combine to make share prices volatile and very difficult to predict with a high degree of accuracy.
MACHINE LEARNING IN STOCK MARKET
Stock and financial markets tend to be unpredictable and even illogical, much like the roll of a die or the toss of a coin. Because of these characteristics, financial data tend to have a rather turbulent structure, which often makes it hard to find reliable patterns. Modeling turbulent structures requires algorithms that can find hidden structures within the data and predict how those structures will evolve in the future. Machine learning and deep learning are well suited to this task. Deep learning in particular can handle complex structures and extract relationships that may further increase the accuracy of the results.
Machine learning can ease the whole process by analyzing large volumes of data, spotting significant patterns and generating a single output that guides traders toward a particular decision based on predicted asset prices.
PREDICTIVE ANALYSIS
Stock prices are not randomly generated values. Instead, they can be treated as a discrete-time series: a set of well-defined numerical observations collected at successive points in time, at regular intervals. A model is needed to analyze stock price trends with adequate information for decision making. For this purpose, transforming the time series first, as ARIMA does, is recommended over forecasting the raw series directly, because it gives more reliable results.
The Autoregressive Integrated Moving Average (ARIMA) model converts non-stationary data to stationary data (by differencing) before modeling it. It is one of the most popular models for forecasting linear time series.
Figure-1 Stock Market System
The ARIMA model has been used extensively in finance and economics because it is known to be robust and efficient and has strong potential for short-term share market prediction.
Time Series analysis Using ARIMA
DESCRIPTION
The purpose of differencing is to make the time series stationary. Care must be taken not to over-difference the series. An over-differenced series can still look stationary, so the problem is easy to miss, yet it distorts the model parameters. It is therefore essential to determine the right order of differencing.
The right order of differencing is the minimum differencing required to obtain a near-stationary series. Such a series fluctuates around a defined mean, and its autocorrelation function (ACF) plot drops to zero fairly quickly.
If the autocorrelations are positive for many lags (10 or more), the series needs further differencing. If the lag-1 autocorrelation itself is strongly negative, the series is probably over-differenced. If you cannot decide between two orders of differencing, choose the order that gives the lowest standard deviation in the differenced series.
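The rules above can be sketched in Python with numpy. This is a minimal sketch under the following assumptions:

- `acf` and `choose_d` are hypothetical helper names.
- The 10-lag window follows the text.
- The -0.5 cut-off for a "too negative" lag-1 autocorrelation is a common rule of thumb, not a value given here.

The per-order standard deviations are returned in `report`, so the lowest-standard-deviation rule can be applied when two orders are hard to choose between. If no order passes both checks, the sketch simply falls back to the lowest standard deviation.

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation of x at lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(nlags + 1)])

def choose_d(series, max_d=2, nlags=10, over_threshold=-0.5):
    """Return (d, report): the smallest differencing order passing both rules
    of thumb, plus per-order diagnostics.

    needs_more       : autocorrelation positive at every lag 1..nlags
    over_differenced : lag-1 autocorrelation below over_threshold (assumed -0.5)
    """
    report = {}
    for d in range(max_d + 1):
        # np.diff with n=0 returns the input unchanged.
        diffed = np.diff(np.asarray(series, dtype=float), n=d)
        r = acf(diffed, nlags)
        report[d] = {
            "needs_more": bool(np.all(r[1:] > 0)),
            "over_differenced": bool(r[1] < over_threshold),
            "std": float(diffed.std()),
        }
    passing = [d for d, info in report.items()
               if not info["needs_more"] and not info["over_differenced"]]
    if passing:
        return min(passing), report
    # Fallback: no order passes both checks, so take the lowest std.
    return min(report, key=lambda d: report[d]["std"]), report

# Usage on hypothetical random-walk "prices".
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(size=500))
d, report = choose_d(prices)
print(d, report[d]["std"])
```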
Figure- Predictive Analytics
TIME-SERIES & FORECASTING MODELS
Time series forecasting models predict future values based on previously observed values. Time series forecasting is widely used for non-stationary data. These are data whose statistical properties, such as the mean and standard deviation, are not constant but vary over time.
Such non-stationary input data are usually called time series. Examples include temperature over time, a stock price over time and the price of a house over time. The input is therefore a signal (time series) defined by observations taken sequentially in time, as shown in figure 3.
Figure-3 Time Series
A time series is a sequence of observations taken sequentially in time. Forecasting a time series can be broadly divided into two types.
Univariate Time Series Forecasting
Only the previous values of the time series are used to predict its future values.
Multivariate Time Series Forecasting
Predictors other than the series itself, i.e. exogenous variables, are also used to forecast it.
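The difference between the two types shows up in the predictor matrix. The sketch below builds it with numpy; `lagged_design` is a hypothetical helper.

- **Univariate:** only the series' own lags y[t-1], ..., y[t-p] are used.
- **Multivariate:** an exogenous variable is appended. Traded volume is used here purely as an illustration.

Using the exogenous value at time t assumes it is known when the forecast is made, as in ARIMAX-style models.

```python
import numpy as np

def lagged_design(y, p, exog=None):
    """Build predictors for forecasting y[t] (hypothetical helper, p >= 1).

    Univariate  : columns are y[t-1], ..., y[t-p].
    Multivariate: exogenous values at time t are appended as extra columns.
    Returns (X, target), where row i of X is used to predict target[i].
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Column k-1 holds y[t-k] for t = p, ..., n-1.
    X = np.column_stack([y[p - k : n - k] for k in range(1, p + 1)])
    if exog is not None:
        exog = np.asarray(exog, dtype=float).reshape(n, -1)
        X = np.column_stack([X, exog[p:]])
    return X, y[p:]

closes = [1, 2, 3, 4, 5]         # hypothetical closing prices
volumes = [10, 20, 30, 40, 50]   # hypothetical exogenous variable (traded volume)
X_uni, target = lagged_design(closes, p=2)
X_multi, _ = lagged_design(closes, p=2, exog=volumes)
print(X_uni.shape, X_multi.shape)  # (3, 2) (3, 3)
```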
BENEFITS OF PREDICTIVE FORECASTING
The results indicate that the stock price is unpredictable when a traditional classifier is used. The existing system reported highly predictive values by choosing a time period for its experiment that yields high predictive scores. The existing system does not perform well when the operating environment changes.
It does not consider external events in the environment, such as news events or social media. It exploits only one data source and is thus highly biased.
The existing system is unable to handle multivariate instances of the historical data.
LIMITATIONS OF ARIMA
The main limitation is that time series analysis only works with stationary data, so a non-stationary series must first be differenced.
Reduction in capacity due to overheads
A significant difference between the ARIMA methodology and earlier methods is that ARIMA makes no prior assumptions about the number of terms or the relative weights assigned to them. To specify the model, the analyst first selects the appropriate model, including the orders p, d and q. The analyst then estimates the coefficients and refines the parameter estimates using a nonlinear least squares method.
Increase in moving average power
Choosing appropriate values of p, d and q for the ARIMA(p,d,q) model can improve forecast accuracy.
There are two approaches to model selection. One is to select a single appropriate model for the series under consideration. The other is to use a general selection methodology that picks the appropriate model for each series from a group of candidate models. The autoregressive (AR) part of ARIMA is a linear regression model that uses the series' own lags as predictors. Linear regression models work best when the predictors are not correlated and are independent of each other, which is why the series should first be made stationary. So how can a series be made stationary?
The most common approach is to difference it. That is, subtract the previous value from the current value. Sometimes, depending on the complexity of the series, more than one differencing may be needed.
The value of d, therefore, is the minimum number of differencing operations needed to make the series stationary. If the time series is already stationary, then d = 0.
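The second approach to model selection, choosing among a group of candidate models, can be sketched as a small grid search. This is a simplified sketch with the following assumptions:

- Only the AR order p is searched: q = 0, and d is taken as already chosen.
- Coefficients come from ordinary least squares.
- Candidates are ranked by the Akaike information criterion (AIC). This is a common choice assumed here; the text does not name a criterion.
- `ar_aic` and `select_ar_order` are hypothetical names.

```python
import numpy as np

def ar_aic(z, p, max_p):
    """Gaussian AIC of an AR(p) model with intercept, fitted by ordinary least squares.
    Every order uses the same rows (t >= max_p) so the AIC values are comparable."""
    n = len(z) - max_p
    cols = [np.ones(n)] + [z[max_p - k : len(z) - k] for k in range(1, p + 1)]
    X = np.column_stack(cols)
    target = z[max_p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = float(np.sum((target - X @ coef) ** 2))
    return n * np.log(rss / n) + 2 * X.shape[1]

def select_ar_order(series, d, max_p=5):
    """Difference the series d times, then return the AR order with the lowest AIC
    among the candidates 0..max_p, together with all the scores."""
    z = np.diff(np.asarray(series, dtype=float), n=d)
    scores = {p: ar_aic(z, p, max_p) for p in range(max_p + 1)}
    return min(scores, key=scores.get), scores
```

For example, `select_ar_order(close_prices, d=1)` would search AR orders 0 to 5 on the once-differenced prices, where `close_prices` stands for any one-dimensional array of prices.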
IMPLEMENTATION OF ARIMA FORECASTING
DESCRIPTION
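An ARIMA model is one in which the time series has been differenced at least once to make it stationary, and the AR and MA terms are combined. The equation therefore becomes:
$$Y_t = \alpha + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \cdots + \beta_p Y_{t-p} + \epsilon_t + \phi_1 \epsilon_{t-1} + \phi_2 \epsilon_{t-2} + \cdots + \phi_q \epsilon_{t-q}$$
In this equation:
- $Y_t$ is the differenced series at time $t$.
- $\alpha$ is a constant.
- $\beta_1, \ldots, \beta_p$ are the AR coefficients.
- $\epsilon_t$ is the error term at time $t$.
- $\phi_1, \ldots, \phi_q$ are the MA coefficients.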
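A simplified version of this model can be sketched with numpy. It is an ARIMA(p, d, 0) model, so the MA terms are omitted. The AR coefficients are estimated by ordinary least squares rather than the nonlinear least squares used for full ARIMA. `fit_ar` and `arima_p_d_0_forecast` are hypothetical names. A library implementation such as the `ARIMA` class in statsmodels also estimates the MA terms.

Forecasts are first produced on the differenced scale. They are then integrated back by cumulative summation, starting from the last observed value at each differencing level.

```python
import numpy as np

def fit_ar(z, p):
    """OLS estimates [alpha, beta_1, ..., beta_p] for
    z_t = alpha + beta_1 z_{t-1} + ... + beta_p z_{t-p} + eps_t."""
    n = len(z)
    X = np.column_stack([np.ones(n - p)] + [z[p - k : n - k] for k in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, z[p:], rcond=None)
    return coef

def arima_p_d_0_forecast(y, p, d, steps):
    """Forecast `steps` future values of y with an ARIMA(p, d, 0) model (no MA terms)."""
    levels = [np.asarray(y, dtype=float)]
    for _ in range(d):
        levels.append(np.diff(levels[-1]))  # each level differences the previous one
    z = levels[-1]
    coef = fit_ar(z, p)
    history = list(z)
    fc = []
    for _ in range(steps):
        lags = history[-1 : -p - 1 : -1]     # z_{t-1}, ..., z_{t-p} (empty when p = 0)
        nxt = coef[0] + np.dot(coef[1:], lags)
        fc.append(nxt)
        history.append(nxt)                  # recursive multi-step forecasting
    fc = np.array(fc)
    # Undo the differencing, innermost level first.
    for level in reversed(levels[:-1]):
        fc = level[-1] + np.cumsum(fc)
    return fc

# Usage (hypothetical series names):
# forecast = arima_p_d_0_forecast(train_close, p=2, d=1, steps=len(test_close))
```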
ACCURACY METRICS FOR TIME SERIES FORECAST
The commonly used accuracy metrics to judge forecasts are:
Mean Absolute Percentage Error (MAPE)
Mean Error (ME)
Mean Absolute Error (MAE)
Mean Percentage Error (MPE)
Root Mean Squared Error (RMSE)
Lag 1 Autocorrelation of Error (ACF1)
Correlation between Actual and the Forecast
Min-Max Error (MINMAX)
Figure-4 MAPE & RMSE Formulae
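The metrics listed above can be computed with numpy as sketched below; `forecast_accuracy` is a hypothetical helper. Some definitions are assumptions, because the text gives no formula for them:

- ACF1 is taken as the lag-1 autocorrelation of the forecast errors.
- The min-max error is taken as 1 - mean(min(forecast, actual) / max(forecast, actual)).

MAPE and MPE assume that no actual value is zero.

```python
import numpy as np

def forecast_accuracy(forecast, actual):
    """Accuracy metrics for a forecast; assumes no actual value is zero."""
    forecast = np.asarray(forecast, dtype=float)
    actual = np.asarray(actual, dtype=float)
    err = forecast - actual
    centred = err - err.mean()
    return {
        "mape": float(np.mean(np.abs(err) / np.abs(actual))),  # mean absolute percentage error
        "me": float(np.mean(err)),                             # mean error
        "mae": float(np.mean(np.abs(err))),                    # mean absolute error
        "mpe": float(np.mean(err / actual)),                   # mean percentage error
        "rmse": float(np.sqrt(np.mean(err ** 2))),             # root mean squared error
        # Lag-1 autocorrelation of the errors.
        "acf1": float(np.dot(centred[:-1], centred[1:]) / np.dot(centred, centred)),
        "corr": float(np.corrcoef(forecast, actual)[0, 1]),    # correlation
        # Min-max error: 0 for a perfect forecast of positive values.
        "minmax": float(1 - np.mean(np.minimum(forecast, actual) / np.maximum(forecast, actual))),
    }

print(forecast_accuracy([110, 190, 310], [100, 200, 300])["rmse"])  # 10.0
```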
Typically, when comparing forecasts of two different series, MAPE, correlation and min-max error can be used.
The other metrics are not suitable for this, because only these three are scale-independent. MAPE and min-max error are percentage errors (the min-max error lies between 0 and 1), and correlation lies between -1 and 1. They therefore show how good a forecast is irrespective of the scale of the series.
The other error metrics are absolute quantities expressed in the units of the series. For example, an RMSE of 100 for a series whose mean is in the thousands is better than an RMSE of 5 for a series in the tens. So they cannot be used to compare the forecasts of two time series on different scales.
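The RMSE example above can be checked numerically. Dividing RMSE by the mean of the actual series is one common way to make it scale-free. It is an assumption used only for illustration here, not one of the metrics listed in the text. The two series are hypothetical constant series.

```python
import numpy as np

def rmse(forecast, actual):
    """Root mean squared error, in the units of the series."""
    diff = np.asarray(forecast, dtype=float) - np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def relative_rmse(forecast, actual):
    """RMSE divided by the mean actual value, which makes it scale-free."""
    return rmse(forecast, actual) / float(np.mean(actual))

large = np.full(4, 1000.0)   # hypothetical series with values in the thousands
small = np.full(4, 10.0)     # hypothetical series with values in the tens

print(rmse(large + 100, large), relative_rmse(large + 100, large))  # 100.0 0.1
print(rmse(small + 5, small), relative_rmse(small + 5, small))      # 5.0 0.5
```

The larger RMSE (100) corresponds to the smaller relative error (10% versus 50%). This is why raw RMSE values cannot be compared across series on different scales.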
FORECAST KPIs
Measuring forecast accuracy (or error) is not an easy task, as there is no one-size-fits-all indicator. Experimentation will show which Key Performance Indicator (KPI) works best for you. As you will see, each indicator avoids some pitfalls but is prone to others.
The first distinction we have to make is the difference between the precision of a forecast and its bias:
Bias
It represents the historical average error. In other words, are your forecasts, on average, too high (i.e., you overshot the actual value) or too low (i.e., you undershot it)? This gives the overall direction of the error.
Precision
It measures how much spread you will have between the forecast and the actual value. The precision of a forecast gives an idea of the magnitude of the errors but not their overall direction.
Figure-5 Biases and Precisions
Of course, as figure 5 shows, what we want is a forecast that is both precise and unbiased.
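The two properties can be told apart with two numbers:

- **Bias** is measured by the mean error, whose sign gives the direction.
- **Precision** is measured by the mean absolute error, which gives the magnitude but not the direction.

Using these two measures is one common choice, assumed here. The prices below are hypothetical.

```python
import numpy as np

def bias(forecast, actual):
    """Average signed error: positive means over-forecasting, negative under-forecasting."""
    return float(np.mean(np.asarray(forecast, dtype=float) - np.asarray(actual, dtype=float)))

def mean_abs_error(forecast, actual):
    """Average error magnitude, ignoring direction (used here as the precision measure)."""
    return float(np.mean(np.abs(np.asarray(forecast, dtype=float) - np.asarray(actual, dtype=float))))

actual = [100, 102, 101, 103]
precise_but_biased = [102, 104, 103, 105]    # always 2 too high
unbiased_but_imprecise = [90, 112, 91, 113]  # errors of -10, +10, -10, +10

print(bias(precise_but_biased, actual), mean_abs_error(precise_but_biased, actual))          # 2.0 2.0
print(bias(unbiased_but_imprecise, actual), mean_abs_error(unbiased_but_imprecise, actual))  # 0.0 10.0
```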
TIME SERIES USING ARIMA – CALCULATIONS, PREDICTIONS & INFERENCES
When making predictions, most stockbrokers have relied on technical, fundamental or time series analysis. Overall, these techniques could not be trusted completely, so a stronger strategy for stock market prediction was needed. To obtain the most accurate results, the chosen methodology combines machine learning and AI with a supervised classifier, as shown in figure 6.
Figure-6 System Architecture for Machine Learning
CLASSIFICATION
Classification is an instance of supervised learning in which a dataset is analyzed and categorized based on a common attribute. From the given values, classification draws conclusions about the observed data. The attributes used for the stock market might be:
Date: the trading day
Open: price of the stock at the opening of trading
High: highest price of the stock during the trading day
Low: lowest price of the stock during the trading day
Close: price of the stock at the closing of trading
Volume: number of shares traded
DATA COLLECTION AND DATA PREPROCESSING
Data pre-processing is a part of data mining that involves transforming raw data into a more coherent format. Raw data is usually inconsistent or incomplete and often contains many errors. Since the main aim is to predict the closing price of the stock, we focus on the Close value.
Data collection is the most basic module and the first step of the project; it deals with collecting the right dataset. The dataset used for market prediction must be filtered based on various aspects.
IMPLEMENTING STOCK PRICE FORECASTING
The dataset consists of stock market data from Kaggle.com.
The data contains the price of a stock from 1996 to 2017. The goal is to train an ARIMA model with optimal parameters that forecasts the closing price of the stock on the test data.
To perform the time series analysis, the required libraries are loaded in R or Python (for example, pandas). The following steps are then applied to the data before prediction:
- importing libraries
- reading the data
- checking for missing values
- checking for categorical data
- standardizing the data
- PCA transformation
- data splitting
We load the dataset by reading the CSV file, as shown in figure 7, and then visualize the daily closing price of the stock, as shown in figure 8.
Figure-7 Stock Market Dataset
Figure-8 Graph Plotting
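The reading, missing-value check and data-splitting steps above can be sketched with pandas. The sketch makes the following assumptions:

- The inline CSV is a hypothetical stand-in with the columns listed earlier (Date, Open, High, Low, Close, Volume), not the Kaggle data.
- Days without a Close value are dropped.
- The last 20% of observations are held out as the test set.

Standardization and PCA are not shown. The split is chronological rather than random, so the model is never trained on observations that come after the test period. Once loaded, `close.plot()` draws the daily closing price, provided matplotlib is installed.

```python
import io

import pandas as pd

# Hypothetical sample in the layout used in the text; the real file comes from Kaggle.
CSV_TEXT = """Date,Open,High,Low,Close,Volume
1996-01-05,10.2,10.6,10.1,10.4,900
1996-01-02,9.5,10.1,9.4,10.0,1500
1996-01-03,10.0,10.3,9.9,,1100
1996-01-04,10.0,10.5,9.8,10.2,1200
1996-01-08,10.4,10.8,10.3,10.7,1000
1996-01-09,10.7,10.9,10.5,10.6,950
"""

def load_close_series(source):
    """Read OHLCV data, report missing values and return the Close column indexed by date."""
    df = pd.read_csv(source, parse_dates=["Date"])
    print(df.isnull().sum())                      # missing values per column
    df = df.sort_values("Date").set_index("Date")
    return df["Close"].dropna()                   # assumption: drop days without a close

def train_test_split_series(series, test_fraction=0.2):
    """Chronological split: the last test_fraction of observations form the test set."""
    split = int(len(series) * (1 - test_fraction))
    return series.iloc[:split], series.iloc[split:]

close = load_close_series(io.StringIO(CSV_TEXT))
train, test = train_test_split_series(close)
```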
The given time series is thought to consist of three systematic components (level, trend and seasonality) and one non-systematic component called noise.
These components are defined as follows:
Level: The average value in the series.
Trend: The increasing or decreasing value in the series.
Seasonality: The repeating short-term cycle in the series.
Noise: The random variation in the series.
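These four components can be separated with a classical moving-average decomposition. This is a common textbook method assumed here; the text does not say how the components were obtained. In this sketch:

- The level is the overall mean.
- The trend is a centred moving average minus the level.
- The seasonal part is the average detrended value at each position of the cycle.
- The noise is what remains.

`decompose` is a hypothetical name, and the period must be odd in this sketch. For daily closing prices, a period of 5 (trading days per week) would be one hypothetical choice. statsmodels' `seasonal_decompose` provides a fuller implementation.

```python
import numpy as np

def decompose(y, period):
    """Classical additive decomposition: y = level + trend + seasonal + noise.

    level    : overall mean of the series
    trend    : centred moving average of length `period`, minus the level
    seasonal : mean detrended value at each position in the cycle, centred on zero
    noise    : the remainder (NaN at the edges, where the moving average is undefined)
    """
    y = np.asarray(y, dtype=float)
    if period % 2 == 0:
        raise ValueError("this sketch only supports an odd period")
    half = period // 2
    smooth = np.full(len(y), np.nan)
    smooth[half : len(y) - half] = np.convolve(y, np.ones(period) / period, mode="valid")
    level = float(y.mean())
    detrended = y - smooth
    cycle = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    cycle -= cycle.mean()
    seasonal = np.tile(cycle, len(y) // period + 1)[: len(y)]
    return {
        "level": level,
        "trend": smooth - level,
        "seasonal": seasonal,
        "noise": y - smooth - seasonal,
    }
```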
It is important to check whether a series is stationary, because time series analysis only works with stationary data. If both the rolling mean and the rolling standard deviation are flat lines (constant mean and constant variance), the series is stationary.
Figure-9 Stationarity Checking
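The visual check described above can be approximated numerically with rolling statistics. The sketch below is an illustrative heuristic, not a statistical test. The 50-observation window and the drift threshold of 2 are arbitrary assumptions, and `looks_stationary` is a hypothetical name. A formal alternative is the augmented Dickey-Fuller test, for example `adfuller` in statsmodels.

```python
import numpy as np

def rolling_mean_std(y, window):
    """Mean and standard deviation of every window of length `window` (numpy >= 1.20)."""
    windows = np.lib.stride_tricks.sliding_window_view(np.asarray(y, dtype=float), window)
    return windows.mean(axis=1), windows.std(axis=1)

def looks_stationary(y, window=50, max_drift=2.0):
    """Heuristic: flag the series as non-stationary if the rolling mean or the rolling
    std wanders by more than `max_drift` times the typical rolling std.
    The threshold is an arbitrary illustration, not a significance level."""
    mean, std = rolling_mean_std(y, window)
    typical = std.mean()
    mean_drift = (mean.max() - mean.min()) / typical
    std_drift = (std.max() - std.min()) / typical
    return bool(mean_drift < max_drift and std_drift < max_drift)
```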
Figure-10 Stock Price Predictions
The model performed quite well. It was also evaluated with the commonly used accuracy metrics for forecasts:
RMSE: 0.18250811086267923
MAPE: 0.035328833278944705
Time series analysis presents the results in the context of time. A MAPE (mean absolute percentage error) of around 3.5% means that the forecasts deviate from the actual test set values by about 3.5% on average. This implies the model is about 96.5% accurate in predicting the test set observations.
Conclusion
The experimental results demonstrate the potential of the ARIMA model to predict stock price indices in the short term. This could guide investors in making profitable decisions on whether to buy, sell or hold a share. Based on these results, the ARIMA model can compete reasonably well with emerging forecasting techniques in short-term prediction. Examining the data from different viewpoints helps in understanding the sentiment of market participants, and predictions can also be performed over longer durations. Useful insights include the trend, seasonality and autocorrelation of the data. Better results could be obtained by incorporating social media analysis, particularly of public opinion, together with fundamental analysis techniques. In this way, investors can be given improved results to choose better timing for profitable investment decisions.