https://doi.org/10.1007/s13042-019-01041-1
ORIGINAL ARTICLE
Abstract
The stock market has received widespread attention from investors, and grasping its patterns of change and predicting its trend have always been central concerns for investors and investment companies. Currently, there are many methods for stock price prediction, which can be roughly divided into two categories: statistical methods and artificial intelligence methods. Statistical methods include the logistic regression model, the ARCH model, etc. Artificial intelligence methods include the multi-layer perceptron, convolutional neural network, naive Bayes network, back propagation network, single-layer LSTM, support vector machine, recurrent neural network, etc. However, these studies predict only a single value. To predict multiple values with one model, it is necessary to design a model that accepts multiple inputs and produces multiple associated output values at the same time. For this purpose, an associated deep recurrent neural network model with multiple inputs and multiple outputs based on the long short-term memory network is proposed. The associated network model can predict the opening price, the lowest price and the highest price of a stock simultaneously. It was compared with the LSTM network model and the deep recurrent neural network model. The experiments show that the accuracy of the associated model is superior to that of the other two models when predicting multiple values at the same time, and its prediction accuracy is over 95%.
Keywords Deep learning · Machine learning · Long short-term memory (LSTM) · Deep recurrent neural network ·
Associated network
algorithm and experimental parameters. Section 5 introduces the experimental data set, the experimental results and the analysis of the results. Section 6 concludes the paper.

2 Related works

There is much related research on stock price prediction. Support vector machines were applied to build a regression model of historical stock data and to predict the trend of stocks [1]. A particle swarm optimization algorithm was used to optimize the parameters of a support vector machine, which can predict stock values robustly [2]. This study improves the support vector machine method, but the particle swarm optimization algorithm requires a long time to compute. LSTM was combined with a naive Bayesian method to extract market emotion factors and improve prediction performance [3]. This method can be used to predict financial markets on completely different time scales together with other variables. An emotional analysis model was integrated with an LSTM time series learning model to obtain a robust time series model for predicting the opening price of stocks, and the results showed that this model could improve the accuracy of prediction [11]. Jia [12] discussed the effectiveness of LSTM for predicting stock prices, and the study showed that LSTM is an effective method to predict stock profits. Real-time wavelet denoising was combined with an LSTM network to predict East Asian stock indexes, which corrected some logic defects in previous studies [13]. Compared with the original LSTM, this combined model is greatly improved, with high prediction accuracy and small regression error. A bagging method was used to combine multiple neural networks to predict Chinese stock indexes (including the Shanghai composite index and the Shenzhen component index) [4]; each neural network was trained with the back propagation method and the Adam optimization algorithm. The results show that the method has different accuracy for different stock indexes, but its prediction of the closing price is unsatisfactory. An evolutionary method was applied to predict the change trend of stock prices [5]. A deep belief network with intrinsic plasticity was used to predict stock price time series [6]. A convolutional neural network was applied to predict the trend of stock prices [7]. A forward multi-layer neural network model was created for future stock price prediction by using a hybrid method combining technical analysis variables, fundamental analysis variables of stock market indicators and the BP algorithm [8]. The results show that this method has higher accuracy in predicting daily stock prices than the technical analysis method. An effective soft computing technique was designed for the Dhaka Stock Exchange (DSE) to predict its closing price [9]. A comparison experiment with an artificial neural network and an adaptive neuro-fuzzy inference system shows that this method is more effective.

The artificial bee colony algorithm was combined with wavelet transforms and a recurrent neural network for stock price forecasting [10]. Many international stock indices were simulated for evaluation, including the Dow Jones industrial average (DJIA), the London FTSE 100 index (FTSE), the Tokyo Nikkei-225 index (Nikkei) and the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX). The simulation results show that the system has good prediction performance and can be applied to real-time trading systems for stock prediction. A multi-output speaker model based on RNN-LSTM was used in the field of speech recognition [14]. The experimental results show that the model is better than a single speaker model, and that fine-tuning the shared infrastructure when adding new output branches to obtain a new output model not only reduces memory usage but also performs better than training a new speaker model. A multi-input multi-output convolutional neural network model (MIMO-Net) was designed for cell segmentation of fluorescence microscope images [15]. The experimental results show that this method is superior to state-of-the-art deep learning based segmentation methods.

Inspired by the above research, and considering that some parameters and indicators of a stock are associated with one another, it is necessary to design a multi-value associated neural network model that can handle multiple associated prices of the same stock and output these parameters and indicators at the same time. For this purpose, an associated neural network model based on an LSTM deep recurrent network is proposed; it is trained on historical data and predicts the opening price, lowest price and highest price of the stock for the next day.

3 Model design

3.1 Long short-term memory network

A long short-term memory network (LSTM) is a particular form of recurrent neural network (RNN), the general term for a family of neural networks capable of processing sequential data. LSTM is a special network structure with three "gate" structures (shown in Fig. 1). Three gates are placed in an LSTM unit, called the input gate, the forgetting gate and the output gate. While information enters the LSTM network, it is filtered by rules: only the information that conforms to the algorithm is kept, and the information that does not conform is forgotten through the forgetting gate.

The gates allow information to be passed selectively. Equation 1 shows the default activation function of the LSTM network, the sigmoid function. The LSTM can add and delete information for neurons through the gating units. To determine selectively whether information passes or not, each gate consists of a sigmoid layer together with a pointwise multiplication operation:

σ(x) = 1/(1 + e^{-x})  (1)
tanh(x) = (e^x - e^{-x}) / (e^x + e^{-x})  (2)

The forgetting gate of the LSTM neural network determines what information needs to be discarded; it reads h_{t-1} and x_t and assigns the neuron state C_{t-1} a value between 0 and 1. Equation 3 shows the calculation of the forgetting probability:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)  (3)

The neuron state is then updated by combining the part of the old state C_{t-1} retained by the forgetting gate with the new candidate state Ĉ_t weighted by the input gate i_t:

C_t = f_t * C_{t-1} + i_t * Ĉ_t  (4)
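For concreteness, the following is a minimal NumPy sketch of one LSTM time step built from Eqs. (1)-(4). The input-gate and output-gate formulas are not reproduced in the text above, so they follow the standard LSTM formulation, and all weight names are illustrative assumptions rather than the paper's notation.

```python
import numpy as np

def sigmoid(x):
    # Eq. (1): the default gate activation of the LSTM network
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    """One LSTM time step; each W matrix acts on the concatenation [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)       # forgetting gate, Eq. (3)
    i_t = sigmoid(W_i @ z + b_i)       # input gate (standard formulation)
    c_hat = np.tanh(W_c @ z + b_c)     # candidate state Ĉ_t, tanh as in Eq. (2)
    c_t = f_t * c_prev + i_t * c_hat   # cell-state update, Eq. (4)
    o_t = sigmoid(W_o @ z + b_o)       # output gate (standard formulation)
    h_t = o_t * np.tanh(c_t)           # new hidden state
    return h_t, c_t
```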
3.2 Deep recurrent neural network

An LSTM-based deep recurrent neural network (DRNN) is a variant of the recurrent neural network. To enhance the expressive power of the model, the loop body at each moment can be repeated many times. The structure of the deep recurrent neural network is shown in Fig. 2.
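As an illustration of the repeated loop body, a deep recurrent network can be built by stacking several LSTM layers. The sketch below assumes TensorFlow/Keras; the layer sizes and dropout rate are illustrative assumptions, and the defaults for the window length and number of features merely echo the step size of 20 chosen in Section 4.2 and the seven basic input nodes shown in the model figure.

```python
import tensorflow as tf
from tensorflow.keras import Sequential, layers

def build_drnn(step_size=20, n_features=7):
    # Two stacked LSTM layers play the role of the repeated loop body;
    # dropout between them regularises the deep recurrent structure.
    return Sequential([
        layers.LSTM(128, return_sequences=True,
                    input_shape=(step_size, n_features)),
        layers.Dropout(0.2),
        layers.LSTM(64),
        layers.Dense(1),  # one predicted price per network
    ])
```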
4 Design of algorithm and experiments
(Figure label: basic input layer, 7 input nodes)
The mean square error (MSE) reflects the accuracy of the prediction model in describing the experimental data. Therefore, in the training phase, MSE is used as the criterion to measure the quality of a network model:

MSE(y, y') = (1/n) Σ_{i=1}^{n} (y_i - y'_i)^2  (10)
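As a quick check of Eq. (10), the following NumPy snippet computes the mean square error used as the training criterion; the example values are made up.

```python
import numpy as np

def mse(y_true, y_pred):
    # Eq. (10): mean square error over n predictions
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

# Example with made-up normalised prices:
# mse([0.52, 0.48], [0.50, 0.50]) -> 0.0004
```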
4.1 Algorithm

Deep learning often requires a great deal of time and computational resources for training, so an optimization algorithm that needs fewer resources and converges faster is required. The Adam optimization algorithm is an extension of the stochastic gradient descent algorithm and has great advantages in solving non-convex optimization problems.

During the training phase, the Adam optimization algorithm is used in the model, and L_total is used as the evaluation function. The framework of the multi-value associated neural network model algorithm is shown in Fig. 6. The input sequence data are first fed to the Associated Net model, which contains three DRNN networks. Each DRNN network produces a loss, and the sum of the losses of these three DRNN networks is the total loss. The Adam algorithm is then used to optimize the total loss. While the number of iterations has not reached the set number of iterations, training continues to reduce the total loss; otherwise training stops.
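To make this training procedure concrete, the following is a hedged sketch of such an associated model, assuming TensorFlow/Keras: one shared input window feeds three DRNN branches (opening, lowest and highest price), each branch contributes an MSE loss, and Adam minimises the summed total loss. Layer sizes, dropout rate and output names are illustrative assumptions, not the authors' exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_associated_net(step_size=20, n_features=7):
    inputs = layers.Input(shape=(step_size, n_features))

    def drnn_branch(name):
        # each branch is a small deep recurrent network ending in one price
        x = layers.LSTM(128, return_sequences=True)(inputs)
        x = layers.Dropout(0.2)(x)
        x = layers.LSTM(64)(x)
        return layers.Dense(1, name=name)(x)

    outputs = [drnn_branch("open_price"),
               drnn_branch("lowest_price"),
               drnn_branch("highest_price")]
    model = Model(inputs, outputs)
    # one MSE loss per output; Keras sums them into the total loss L_total
    model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")
    return model
```

Calling `model.fit(X, [y_open, y_low, y_high], epochs=n_iterations)` then trains until the set number of iterations is reached, reducing the total loss as described above.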
4.2 Parameter setting

The input of the LSTM neural network has a step-size parameter, which determines how many historical data points are remembered as a reference for predicting the current price. In order to choose a relatively good step size for the multi-value associated model, a comparison experiment is performed with 6112 sample data at step sizes of 5, 10, 20 and 30, with 50 iterations. The loss variation graphs are shown in Figs. 7, 8, 9 and 10, respectively.

According to the loss variation graphs at step sizes of 5, 10, 20 and 30, the loss at step sizes of 10 and 20 decreases the fastest and finally reaches a steady state. By comparing the average losses shown in Table 1, the average loss at a step size of 5 is the lowest, and the average loss at a step size of 20 differs from it by only 0.0014901 (see Table 1). Considering the loss variation graphs and the average losses together, 20 is chosen as the step size of the model.
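The step size corresponds to a sliding window over the historical data. Below is a minimal sketch of how such windows could be built; the array layout is an assumption for illustration, not the paper's preprocessing code.

```python
import numpy as np

def make_windows(series, step_size=20):
    """series: array of shape (n_days, n_features), already normalised.
    Returns samples of `step_size` consecutive days and next-day labels."""
    X, y = [], []
    for t in range(len(series) - step_size):
        X.append(series[t:t + step_size])  # the remembered history window
        y.append(series[t + step_size])    # the following day to predict
    return np.array(X), np.array(y)
```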
Fig. 7 Loss variation graph at the step size of 5
Fig. 10 Loss variation graph at the step size of 30
5.1 Dataset
Table 2 The identifiers used for stock-related technical parameters

Parameter name   Identifier
Open price       OP
Close price      CP
Lowest price     LP
Highest price    HP
Volumes          V
Money            M
Change           C

Through the normalization operation, the data is scaled to [0, 1], which not only speeds up the gradient descent to find the optimal solution, but also improves the accuracy.
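A minimal sketch of the [0, 1] min-max scaling described above; the column layout is assumed to follow Table 2 and this is not the paper's exact preprocessing code.

```python
import numpy as np

def min_max_scale(x):
    # scale each column (OP, CP, LP, HP, V, M, C) into [0, 1]
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    return (x - x_min) / (x_max - x_min + 1e-12)  # epsilon guards constant columns
```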
Table 3 Average variance loss of three models under different training times

Training times   LSTM          DRNN          Associated Net
50               0.0377711     0.0152948     0.037064
100              0.0147428     0.0191181     0.029533
200              0.0132838     0.00721601    0.026126
300              0.00418345    0.0104519     0.019745
500              0.00818345    0.0106546     0.014983
As shown in Table 3, the average mean square error of Associated Net is larger than the average mean square error of LSTM and DRNN, because our model is more complex and requires a larger number of iterations. In order to verify this conjecture, several experiments were conducted on Associated Net, in which the opening price, the highest price and the lowest price of the next day were trained and predicted by the Associated Net model. As shown in Table 4, the experimental results proved our conjecture. The root of this behaviour is that the associated network model is composed of multiple deep recurrent neural networks: the model is complex, the number of neurons is large, and multiple output losses are combined, so the loss of the model decreases slowly. Based on this analysis experiment, the model loss chart of each model for 200 iterations is drawn, as shown in Figs. 11, 12, and 13. The output of the Associated Net is the total loss and its three sub-losses (opening price loss, lowest price loss and highest price loss). From the analysis of the loss charts, it is found that the loss of each model decreases gradually. The LSTM model fluctuates several times during the training process, while DRNN and Associated Net are very stable. Moreover, the individual sub-losses of the associated network model also decrease gradually. As shown in Table 4, although the total loss of Associated Net is higher than that of the other two models, its sub-losses are very low, and by increasing the number of iterations the total loss of Associated Net is gradually reduced.
Table 4 The average square loss of Associated Net

Times   Average square loss of open price   Average square loss of lowest price   Average square loss of highest price   Average loss of three losses
Fig. 13 Loss of associated net
Fig. 15 Loss variation of PetroChina stock training model
In order to verify the universality of the model, the historical data of two stocks, PetroChina and ZTE, are used. The experimental results are shown in Figs. 14 and 15. Combining Fig. 13 with Table 5, it is concluded that the model fits the PetroChina data better. The fitting result for ZTE is relatively poor at the beginning but gradually improves; in the end, their average mean square error losses become similar. Through the experiments, it is found that the more training data there is, the better the model fits. Furthermore, when the number of iterations of model training was increased appropriately, the loss of the model decreased gradually. The above results are due to the following reasons.

• The model is complex and needs a large amount of data to train the parameters of each neuron.
• PetroChina has a large circulation, and its stock price fluctuation is relatively small, so a good fitting effect can be obtained quickly. ZTE's stock price fluctuations are relatively larger, so more training data is needed to obtain a good fitting effect.
5.3 Experimental analysis in the test phase

In order to verify the training of each model in the training phase, the three models were tested separately using a test set of multiple stocks. The mean square error (MSE) is the expected value of the square of the difference between the estimated value and the true value of a parameter, while the mean absolute error (MAE) is the average of the absolute errors and better reflects the real situation of the prediction error. Therefore, in the test, the mean absolute error (MAE, Eq. 12) was used as the evaluation index to calculate the degree of deviation, and the result of 1 − MAE was used as the average accuracy of the prediction.
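A minimal sketch of this test-phase metric, assuming prices normalised to [0, 1] as above; MAE here is the standard mean absolute error, since Eq. 12 itself is not reproduced in this excerpt.

```python
import numpy as np

def mae(y_true, y_pred):
    # mean absolute error over the test predictions
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs(y_true - y_pred))

def average_accuracy(y_true, y_pred):
    # the paper reports 1 - MAE as the average prediction accuracy
    return 1.0 - mae(y_true, y_pred)
```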
Table 5 Average square loss of different data sets in the Associated Net model

Stock            Open price   Lowest price   Highest price   Average of three losses
PetroChina       0.036184     0.032914       0.034226        0.034444
ZTE              0.032247     0.037305       0.031708        0.033753
Shanghai Index   0.023419     0.021976       0.030776        0.02539
6 Conclusion
Fig. 19 The test result of PetroChina in the associated net model
Fig. 20 The test result of ZTE in the associated net model

Acknowledgements This work is partially supported by the Science and Technology Project of Guangxi (Guike AB16380260) and Specialized Scientific Research in Public Welfare Industry (Meteorology) (GYHY201406027).
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
1. Xia Y, Liu Y, Chen Z (2013) Support vector regression for prediction of stock trend. 6th Int Conf Inf Manag Innov Manag Ind Eng (ICIII) 2:123–126
2. Sands TM, Tayal D, Morris ME, Monteiro ST (2015) Robust stock value prediction using support vector machines with particle swarm optimization. Congress on Evolutionary Computation (CEC), pp 3327–3331
3. Li J, Bu H, Wu J (2017) Sentiment-aware stock market prediction: a deep learning method. In: IEEE: 2017 international conference on service systems and service management, pp 1–6
4. Yang B, Gong Z-J, Yang W (2017) Stock market index prediction using deep neural network ensemble. In: 36th Chinese Control Conference (CCC), pp 3882–3887
5. Tsai Y-C, Hong C-Y (2017) The application of evolutionary approach for stock trend awareness. In: IEEE: IEEE 8th international conference on awareness science and technology (iCAST), pp 306–311
6. Li X, Yang L, Xue F, Zhou H (2017) Time series prediction of stock price using deep belief networks with intrinsic plasticity. In: IEEE: 2017 29th Chinese Control and Decision Conference (CCDC), pp 1237–1242
7. Gudelek MU, Boluk SA, Ozbayoglu M (2017) A deep learning based stock trading model with 2-D CNN trend detection. In: IEEE: 2017 IEEE symposium series on computational intelligence (SSCI), pp 1–8
8. Adebiyi AA, Ayo KC, Adebiyi MO, Otokiti SO (2012) Stock price prediction using neural network with hybridized market indicators. J Emerg Trends Comput Inf Sci 1(3):1–9
9. Billah M, Waheed S, Hanifa A (2015) Predicting closing stock price using artificial neural network and adaptive neuro fuzzy inference system: the case of the Dhaka exchange. Int J Comput Appl 11(129):975–8887
10. Hsieh TJ, Hsiao HF, Yeh WC (2011) Forecasting stock markets using wavelet transforms and recurrent neural networks: an integrated system based on artificial bee colony algorithm. Appl Soft Comput J 2(11):2510–2525
11. Zhuge Q, Xu L, Zhang G (2017) LSTM neural network with emotional analysis for prediction of stock price. Eng Lett 2(25):167–175
12. Jia H (2016) Investigation into the effectiveness of long short term memory networks for stock price prediction. arXiv [cs.NE], pp 1–6
13. Li Z, Tam V (2017) Combining the real-time wavelet denoising and long-short-term-memory neural network for predicting stock indexes. In: IEEE: 2017 IEEE symposium series on computational intelligence (SSCI), pp 1–8
14. Pascual S, Bonafonte A (2016) Multi-output RNN-LSTM for multiple speaker speech synthesis and adaptation. In: IEEE: 2016 24th European Signal Processing Conference (EUSIPCO), pp 2325–2329
15. Ahmed Raza SE, Cheung L, Epstein D et al (2017) MIMO-NET: a multi-input multi-output convolutional neural network for cell segmentation in fluorescence microscopy images. In: IEEE: 2017 IEEE 14th international symposium on biomedical imaging (ISBI 2017), pp 337–340

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.