Stock Market Prediction Using Machine Learning Algorithms


International Journal of Engineering and Advanced Technology (IJEAT)

ISSN: 2249 – 8958, Volume-8 Issue-4, April 2019

K. Hiba Sadia, Aditya Sharma, Adarrsh Paul, Sarmistha Padhi, Saurav Sanyal

Revised Manuscript Received on April 25, 2019.
Mrs. K. Hiba Sadia, Asst. Prof., Computer Science and Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, India.
Aditya Sharma, B.Tech Student, Computer Science and Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, India.
Adarrsh Paul, B.Tech Student, Computer Science and Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, India.
Sarmistha Padhi, B.Tech Student, Computer Science and Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, India.
Saurav Sanyal, B.Tech Student, Computer Science and Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, India.

Abstract: The main objective of this paper is to find the best model to predict the value of the stock market. While considering the various techniques and variables that must be taken into account, we found that techniques like random forest and support vector machine have not been fully exploited. In this paper, we present and review a more feasible method to predict stock movement with higher accuracy. The first thing we take into account is the dataset of stock market prices from the previous year. The dataset was pre-processed and tuned for real analysis; hence, our paper also focuses on data pre-processing of the raw dataset. Secondly, after pre-processing the data, we review the use of random forest and support vector machine on the dataset and the outcomes they generate. In addition, the paper examines the use of the prediction system in real-world settings and the issues associated with the accuracy of the overall values given. The paper also presents a machine-learning model to predict the longevity of a stock in a competitive market. Successful prediction of the stock will be a great asset for stock market institutions and will provide real-life solutions to the problems that stock investors face.

Keywords: Machine Learning, Data Pre-processing, Data Mining, Dataset, Stock, Stock Market.

I. INTRODUCTION

The stock market is basically an aggregation of various buyers and sellers of stock. A stock (more commonly known as shares) in general represents ownership claims on a business by a particular individual or a group of people. The attempt to determine the future value of the stock market is known as stock market prediction [3]. The prediction is expected to be robust, accurate and efficient. The system must work according to real-life scenarios and should be well suited to real-world settings. The system is also expected to take into account all the variables that might affect the stock's value and performance. There are various methods and ways of implementing the prediction system, such as Fundamental Analysis, Technical Analysis, Machine Learning, Market Mimicry, and Time series aspect structuring. With the advancement of the digital era, prediction has moved into the technological realm. The most prominent and promising technique involves the use of Artificial Neural Networks and Recurrent Neural Networks, which are basically implementations of machine learning [3]. Machine learning involves artificial intelligence which empowers the system to learn and improve from past experiences without being programmed time and again. Traditional methods of prediction in machine learning use algorithms like backward propagation of errors, also known as backpropagation. Lately, many researchers are making more use of ensemble learning techniques. In one such approach, one network uses low prices and time lags to predict future highs [3], while another network uses lagged highs to predict future highs. These predictions are then used to form stock prices. [1]

Stock market price prediction for short time windows appears to be a random process. The stock price movement over a long period of time usually develops a linear curve. People tend to buy those stocks whose prices are expected to rise in the near future. The uncertainty in the stock market deters people from investing in stocks. Thus, there is a need to accurately predict the stock market in a way that can be used in real-life scenarios. The methods used to predict the stock market include time series forecasting along with technical analysis, machine learning modeling and predicting the variable stock market. The datasets of the stock market prediction model include details like the closing price, the opening price, the date and various other variables that are needed to predict the object variable, which is the price on a given day. The previous model used traditional methods of prediction like multivariate analysis with a prediction time series model. Stock market prediction performs better when it is treated as a regression problem, but it also performs well when treated as a classification problem. The aim is to design a model that learns from market information using machine learning strategies and gauges the future patterns in stock value development.

The Support Vector Machine (SVM) can be used for both classification and regression. It has been observed that SVMs are more often used in classification-based problems like ours. In the SVM technique, we plot every single data item as a point in n-dimensional space (where n is the number of features of the dataset available), with the value of each feature being the value of a particular coordinate. Classification is then performed by finding the hyperplane that explicitly differentiates the two classes.
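As a rough illustration of the hyperplane idea described above, the following sketch fits a linear SVM on a tiny made-up dataset with two features. The data, the labels and the use of scikit-learn's `SVC(kernel="linear")` are assumptions chosen for illustration and are not taken from the paper's experiments.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two features per sample, two well-separated classes.
X = np.array([[1.0, 1.0], [1.5, 0.5], [2.0, 1.5],
              [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel makes the decision boundary a hyperplane w.x + b = 0.
clf = SVC(kernel="linear")
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # offset of the hyperplane

# New points are classified by the side of the hyperplane they fall on.
new_points = np.array([[1.2, 0.8], [6.8, 6.2]])
side = new_points @ w + b
predictions = clf.predict(new_points)
print("w =", w, "b =", b, "side =", side, "predictions =", predictions)
```

With n features instead of two, `w` would simply have n components, one per coordinate of the plotted data point.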

Published By: Blue Eyes Intelligence Engineering & Sciences Publication
Retrieval Number: D6321048419/19©BEIESP

Predictive methods such as the random forest technique are used for the same purpose. The random forest algorithm follows an ensemble learning strategy for classification and regression. The random forest fits decision trees on various subsamples of the dataset and averages their results, which increases the predictive accuracy and reduces over-fitting.

II. PROBLEM DEFINITION

Stock market prediction is basically defined as trying to determine the stock value and to offer people a robust way to understand and predict the market and stock prices. It is generally presented using the quarterly financial ratios from the dataset. However, relying on a single dataset may not be sufficient for the prediction and can give an inaccurate result. Hence, we study machine learning with the integration of various datasets to predict the market and stock trends.

Estimating the stock price will remain a problem unless a better stock market prediction algorithm is proposed. Predicting how the stock market will perform is quite difficult. The movement in the stock market is usually determined by the sentiments of thousands of investors. Stock market prediction calls for an ability to predict the effect of recent events on investors. These can be political events, like a statement by a political leader or a piece of news on a scam. They can also be international events, like sharp movements in currencies and commodities. All these events affect corporate earnings, which in turn affect the sentiment of investors. It is beyond the scope of almost all investors to correctly and consistently predict these factors. All these factors make stock price prediction very difficult. Once the right data is collected, it can then be used to train a machine and to generate a predictive result.

III. LITERATURE SURVEY

During the literature survey, we collected information about the stock market prediction mechanisms currently in use.

1. Survey of Stock Market Prediction Using Machine Learning Approach

Stock market prediction has become an increasingly important issue. One of the methods employed is technical analysis, but such methods do not always yield accurate results, so it is important to develop methods for more accurate prediction. Generally, investments are made using predictions obtained from the stock price after considering all the factors that might affect it. The technique employed in this instance was regression. Since financial stock markets generate enormous amounts of data at any given time, a great volume of data needs to be analyzed before a prediction can be made. Each of the techniques listed under regression has its own advantages and limitations compared with its counterparts. One of the noteworthy techniques mentioned was linear regression. Linear regression models are often fitted using the least squares approach, but they may alternatively be fitted in other ways, such as by minimizing the "lack of fit" in some other norm, or by minimizing a penalized version of the least squares loss function. Conversely, the least squares approach can be used to fit nonlinear models. [1]

2. Impact of Financial Ratios and Technical Analysis on Stock Price Prediction Using Random Forests

The use of machine learning and artificial intelligence techniques to predict stock prices is an increasing trend. More and more researchers invest their time every day in coming up with techniques that can further improve the accuracy of the stock prediction model. Due to the vast number of options available, there are many ways to predict the price of a stock, but not all methods work the same way. The output varies for each technique even if the same data set is applied. In the cited paper, stock price prediction is carried out using the random forest algorithm, which predicts the price of the stock using financial ratios from the previous quarter. This is just one way of looking at the problem: approaching it with a predictive model, using the random forest to predict the future price of the stock from historical data. However, there are always other factors that influence the price of the stock, such as the sentiments of investors, public opinion about the company, news from various outlets, and even events that cause the entire stock market to fluctuate. By using the financial ratios along with a model that can effectively analyze sentiments, the accuracy of the stock price prediction model can be increased. [2]

3. Stock Market Prediction via Multi-Source Multiple Instance Learning

Accurately predicting the stock market is a challenging task, but the modern web has proved to be a very useful tool in making this task easier. Due to the interconnected format of data, it is easy to extract certain sentiments, which makes it easier to establish relationships between various variables and roughly scope out a pattern of investment. Investment patterns from various firms show signs of similarity, and the key to successfully predicting the stock market is to exploit these consistencies between the data sets. Stock market information can be predicted successfully by using more than just technical historical data, for example by using a sentiment analyzer to derive an important connection between people's emotions and how they are influenced by investment in specific stocks. One more important segment of the prediction process was the extraction of important events from web news to see how they affected stock prices. [3]

4. Stock Market Prediction: Using Historical Data Analysis

The stock market prediction process is filled with uncertainty and can be influenced by multiple factors. The stock market therefore plays an important role in business and finance. Technical and fundamental analysis is carried out through a sentiment analysis process. Social media data has a high impact due to its increased usage [6], and it can


be helpful in predicting the trend of the stock market.


Technical analysis is done by applying machine learning algorithms to historical stock price data [6]. The method usually involves gathering social media data and news to extract the sentiments expressed by individuals. Other data, like the previous year's stock prices, is also considered. The relationship between various data points is considered, and a prediction is made on these data points. The model was able to make predictions about future stock values.

5. A Survey on Stock Market Prediction Using SVM

Recent studies provide well-grounded evidence that most predictive regression models are inefficient in out-of-sample predictability tests. The reason for this inefficiency was parameter instability and model uncertainty. The studies also identified traditional strategies that promise to solve this problem. The support vector machine, commonly known as SVM, provides the kernel, the decision function, and sparsity of the solution. It is used to learn polynomial, radial basis function and multi-layer perceptron classifiers. It is a training algorithm for classification and regression, which works on larger datasets. There are many algorithms available, but SVM provides better efficiency and accuracy. The correlation analysis between SVM and the stock market indicates a strong interconnection between the stock prices and the market index.

6. Predicting Stock Price Direction Using Support Vector Machines

Financial institutions and traders have built various proprietary models to try to beat the market for themselves or their clients, yet rarely has anyone achieved consistently higher-than-average returns. Nevertheless, the challenge of stock forecasting remains appealing, because an improvement of only a few percentage points can increase profit by a large amount for these institutions. [6]

7. A Stock Market Prediction Method Based on Support Vector Machines (SVM) and Independent Component Analysis (ICA)

The time series prediction problem has been researched in the work centers of various financial institutions. A prediction model based on SVM and independent component analysis, called SVM-ICA, is proposed for stock market prediction. Various time series analysis models are based on machine learning. The SVM is designed to solve regression problems in non-linear classification and time series analysis. The generalization error is minimized using an approximate function, which is based on the risk minimization principle. The ICA technique extracts various important features from the dataset, and the time series prediction is based on SVM. The result of the SVM model was compared with the results of the ICA technique without using a preprocessing step.

8. Machine Learning Approach In Stock Market Prediction

The vast majority of stockbrokers, while making predictions, used technical, fundamental or time series analysis. Overall, these techniques could not be trusted completely, so there emerged the need for a robust strategy for stock market prediction. To find the most accurate result, the methodology chosen was machine learning and AI along with a supervised classifier. Results were tested on binary classification using an SVM classifier with a different set of features. Most machine learning approaches to business problems had an advantage over statistical techniques that did not include AI, even though there was an optimal procedure for specific problems [2]. The swarm intelligence optimization method named Cuckoo search was the easiest for tuning the parameters of SVM [2]. The proposed hybrid CS-SVM strategy demonstrated the ability to produce more accurate results than ANN. Likewise, the CS-SVM model performed better in forecasting stock values [2]. The stock price prediction system used parsed records to compute the prediction, send it to the user, and autonomously perform tasks like buying and selling shares using automation. The Naïve Bayes algorithm was used. [8]

9. Corporate Communication Network and Stock Price Movements: Insights from Data Mining

This paper tries to show that communication patterns can have a very significant effect on an organization's performance. It proposes a technique to reveal the performance of a company by finding the relationships between the frequencies of email exchange among key employees and the performance of the company as reflected in stock values. In order to detect association and non-association relationships, the paper proposes using a data mining algorithm on a publicly available dataset of Enron Corp. The Enron Corporation was an energy, commodities, and services company based in Houston, Texas, whose stock dataset is available for public use. [9]

IV. DISADVANTAGES OF THE EXISTING SYSTEM

• The existing system fails when there are rare outcomes or predictors, as the algorithm is based on bootstrap sampling.
• Previous results indicate that the stock price is unpredictable when a traditional classifier is used.
• The existing system reported highly predictive values by selecting an appropriate time period for the experiment to obtain highly predictive scores.
• The existing system does not perform well when there is a change in the operating environment.
• It doesn't focus on external events in the environment, like news events or social media.
• It exploits only one data source and is thus highly biased.

• The existing system needs some form of input interpretation, and thus needs scaling.
• It doesn't exploit data pre-processing techniques to remove inconsistency and incompleteness of the data.

V. PROPOSED SYSTEM

In this proposed system, we focus on predicting stock values using machine learning algorithms like Random Forest and Support Vector Machines. In the proposed system, "Stock market price prediction", we predict the stock market price using the random forest algorithm. We were able to train the machine on various data points from the past to make a future prediction. We took data from the previous year's stocks to train the model. We mainly used two machine-learning libraries to solve the problem. The first was numpy, which was used to clean and manipulate the data and get it into a form ready for analysis. The other was scikit, which was used for real analysis and prediction. The data set we used was from the previous years' stock markets, collected from a public database available online; 80% of the data was used to train the machine and the remaining 20% to test it. The basic approach of the supervised learning model is to learn the patterns and relationships in the data from the training set and then reproduce them for the test data. We used the Python pandas library for data processing, which combined different datasets into a data frame. The tuned data frame allowed us to prepare the data for feature extraction. The data frame features were the date and the closing price for a particular day. We used all these features to train the machine on the random forest model and predicted the object variable, which is the price for a given day. We also quantified the accuracy by comparing the predictions for the test set with the actual values. The proposed system touches different areas of research, including data pre-processing, random forest, and so on.

VI. METHODOLOGIES

1. Classification

Classification is an instance of supervised learning where a set is analyzed and categorized based on a common attribute. From the values or data given, classification draws some conclusion from the observed value. If more than one input is given, classification will try to predict one or more outcomes for the same. A few classifiers used here for stock market prediction include the random forest classifier and the SVM classifier.

Random Forest Classifier

The random forest classifier is a type of ensemble classifier and also a supervised algorithm. It basically creates a set of decision trees, each of which yields some result. The basic approach of the random forest classifier is to aggregate the decisions of decision trees built on random subsets and yield a final class or result based on the votes of those trees.

Parameters

The parameters of the random forest classifier include n_estimators, which is the total number of decision trees, and other hyperparameters like oob_score, which determines the generalization accuracy of the random forest, and max_features, which is the number of features considered for the best split. min_weight_fraction_leaf is the minimum weighted fraction of the sum total of weights of all the input samples required to be at a leaf node. Samples have equal weight when sample weight is not provided.

SVM classifier

The SVM classifier is a type of discriminative classifier. The SVM uses supervised learning, i.e. labeled training data. The output is a set of hyperplanes that categorize new data. SVMs are supervised learning models that use an associated learning algorithm for classification as well as regression.

Parameters

The tuning parameters of the SVM classifier are the kernel parameter, the gamma parameter and the regularization parameter.

• Kernels can be categorized as linear and polynomial kernels, and the kernel determines how the prediction line is calculated. With a linear kernel, the prediction for a new input is calculated from the dot product between the input and the support vectors.
• The C parameter is known as the regularization parameter; it determines whether the accuracy of the model increases or decreases. The default value taken here is C = 10. A lower regularization value leads to more misclassification.
• The gamma parameter measures the influence of a single training example on the model. Low values signify 'far' from the plausible margin and high values signify closeness to the plausible margin.

2. Random Forest Algorithm

The random forest algorithm is used for stock market prediction. It is regarded as one of the easiest to use and most flexible machine learning algorithms, and it gives good accuracy in prediction. It is usually used in classification tasks. Because of the high volatility in the stock market, the task of predicting is quite challenging. In stock market prediction we use the random forest classifier, which has the same hyperparameters as a decision tree. The decision tool has a model similar to that of a tree. It takes decisions based on possible consequences, which include variables like event outcome, resource cost, and utility. The random forest algorithm randomly selects different observations and features to build several decision trees and then takes the aggregate of the outcomes of those decision trees. The data is split into partitions based on questions on a label or an attribute. The data set we used was from the previous year's stock markets, collected from a public database available online; 80% of the data was used to train the machine and the remaining 20% to test it. The basic approach of the supervised learning model is to learn the patterns and relationships in the data from the training set and then reproduce them for the test data.

3. Support Vector Machine Algorithm


The main task of the support vector machine algorithm is to identify a hyperplane in an N-dimensional space that distinctly categorizes the data points. Here, N stands for the number of features. Between two classes of data points, there can be multiple possible hyperplanes to choose from. The objective of the algorithm is to find the plane that has the maximum margin, where the margin refers to the distance between the plane and the nearest data points of both classes. The benefit of maximizing the margin is that it provides some reinforcement so that future data points can be classified more easily. Decision boundaries that help classify data points are called hyperplanes. Based on their position relative to the hyperplane, data points are attributed to different classes. The dimension of the hyperplane depends on the number of attributes: if the number of attributes is two, the hyperplane is just a line; if the number of attributes is three, the hyperplane is a two-dimensional plane.

VII. SYSTEM ARCHITECTURE

Kaggle is an online community for data analysis and predictive modeling. It also hosts datasets from different fields, contributed by data miners. Data scientists compete to create the best models for predicting and describing the data. It allows users to use its datasets so that they can build models and work with other data science engineers to solve various real-life data science challenges. The dataset used in the proposed project has been downloaded from Kaggle. However, this data set is in what we call raw format. The data set is a collection of stock market information about a few companies.

The first step is the conversion of this raw data into processed data. This is done using feature extraction, since the raw data collected has multiple attributes but only a few of those attributes are useful for the purpose of prediction. So the first step is feature extraction, where the key attributes are extracted from the whole list of attributes available in the raw dataset. Feature extraction starts from an initial set of measured data and builds derived values or features. These features are intended to be informative and non-redundant, facilitating the subsequent learning and generalization steps. Feature extraction is a dimensionality reduction process, where the initial set of raw variables is reduced to more manageable features for ease of processing, while still accurately and completely describing the original data set.

The feature extraction process is followed by a classification process, wherein the data obtained after feature extraction is split into two distinct segments. Classification is the problem of identifying to which of a set of categories a new observation belongs. The training data set is used to train the model, whereas the test data is used to estimate the accuracy of the model. The splitting is done in such a way that the training data makes up a higher proportion than the test data.

The random forest algorithm uses a collection of random decision trees to analyze the data. In layman's terms, from the total number of decision trees in the forest, a cluster of the decision trees look for specific attributes in the data. This is known as data splitting. In this case, the end goal of our proposed system is to predict the price of the stock by analyzing its historical data.

Fig 1 System Architecture

VIII. MODULE IDENTIFICATION

The various modules of the project are divided into the segments described below.

I. Data Collection

Data collection is a very basic module and the initial step of the project. It generally deals with the collection of the right dataset. The dataset that is to be used in the market prediction has to be filtered based on various aspects. Data collection also helps to enhance the dataset by adding more external data. Our data mainly consists of the previous year's stock prices. Initially, we analyze the Kaggle dataset and, according to the accuracy, we use the model with the data to analyze the predictions accurately.

II. Pre Processing

Data pre-processing is a part of data mining, which involves transforming raw data into a more coherent format. Raw data is usually inconsistent or incomplete and usually contains many errors. Data pre-processing involves checking for missing values, looking for categorical values, splitting the data set into training and test sets, and finally feature scaling to limit the range of variables so that they can be compared on common ground.

III. Training the Machine

Training the machine means feeding the training data to the algorithm so that it can later be applied to the test data. The training sets are used to tune and fit the models. The test sets are left untouched during training, as a model should be judged on data it has not seen. The training of the model includes cross-validation, where we get a well-grounded approximate performance of the model using the training data. Tuning models are meant to specifically tune the hyperparameters, like the number of trees in a random forest. We perform the entire cross-validation loop on each set of hyperparameter values.
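The cross-validation loop over hyperparameter values can be sketched as follows. This is a minimal example that assumes scikit-learn's `GridSearchCV` and `RandomForestRegressor`; the synthetic data, the candidate values for `n_estimators` and the 5-fold setting are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the processed stock data (hypothetical):
# one feature (a day index) and a noisy, upward-drifting "closing price".
rng = np.random.default_rng(0)
X = np.arange(200, dtype=float).reshape(-1, 1)
y = 100.0 + 0.5 * X.ravel() + rng.normal(scale=2.0, size=200)

# Hold out 20% as the test set; it is not touched during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Candidate hyperparameter values, e.g. the number of trees in the forest.
param_grid = {"n_estimators": [10, 50, 100]}

# For each candidate, run 5-fold cross-validation on the training set only.
search = GridSearchCV(RandomForestRegressor(random_state=0),
                      param_grid, cv=5)
search.fit(X_train, y_train)

best_params = search.best_params_
cv_scores = search.cv_results_["mean_test_score"]  # one score per candidate
test_score = search.score(X_test, y_test)          # R^2 on the held-out 20%
print(best_params, cv_scores, test_score)
```

A regressor is used here because the target in this sketch is a price; a classifier could be tuned with the same loop if the target were a rise/fall label.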
Finally, we will calculate a cross-validated score, for
individual sets of hyperparameters. Then, we select the best
hyperparameters. The idea behind the training of the model Fig 3 head()
is that we some initial values with the dataset and then
optimize the parameters which we want to in the model. This is the result of using the head(). Since we are using the
This is kept on repetition until we get the optimal values. pandas library to analyse the data, it returns the first five
Thus, we take the predictions from the trained model on the rows. Here five is the default value of the number of rows it
inputs from the test dataset. Hence, it is divided in the ratio returns unless stated otherwise. The trading code in the
of 80:20 where 80% is for the training set and the rest 20% processed data set is not relevant so we use the strip() to
for a testing set of the data. remove it and replace all of the trading codes with a value
„GP‟
IV. Data Scoring

The process of applying a predictive model to a set of data is referred to as scoring the data. The technique used to process the dataset is the Random Forest algorithm. Random forest is an ensemble method that is commonly used for classification as well as regression. Based on the learning models, we achieve interesting results. The last module thus describes how the result of the model can help to predict the probability of a stock rising or sinking based on certain parameters. It also shows the vulnerabilities of a particular stock or entity. A user authentication system is implemented to make sure that only authorized entities can access the results.

IX. EXPERIMENTAL RESULTS

The xlsx file contains the raw data on which we base our findings. There are eleven columns, or attributes, that describe the rise and fall in stock prices. Some of these attributes are (1) HIGH, the highest value the stock had in the previous year; (2) LOW, the contrary of HIGH, the lowest value the stock had in the previous year; (3) OPENP, the value of the stock at the very beginning of the trading day; and (4) CLOSEP, the price at which the stock is valued before the trading day closes. There are other attributes such as YCP, LTP, TRADE, VOLUME and VALUE, but the four mentioned above play a crucial role in our findings.

Fig 2 Raw Data

This is a pictorial representation of the data present in our xlsx file. This particular file contains 121608 such records. There are more than ten different trading codes in the dataset, and some of the records do not have relevant information that can help us train the machine, so the logical step is to process the raw data. Thus we obtain a more refined dataset which can now be used to train the machine.

Fig 4 Time series plot of GP

This is a time series plot generated using the “matplotlib.pyplot” library. The plot is of the attribute “CLOSEP” against “DATE”. It shows the trend of the closing price of the stock as time varies over a span of two years. The figure provided below is the candlestick plot, which was generated using the library “mpl_finance”. The candlestick plot was generated using the attributes 'DATE', 'OPENP', 'HIGH', 'LOW', 'CLOSEP'.

Fig 5 Candlestick plot

Fig 6 Histogram of CLOSEP-OPENP
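The time series plot of CLOSEP against DATE described above could be produced roughly as follows. This is a minimal sketch on synthetic prices, not the authors' script: the generated values and the figure size are assumptions, while the column names follow the attributes named in the text.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic closing prices over roughly two years of business days (assumption).
rng = np.random.default_rng(0)
dates = pd.date_range("2017-01-01", periods=500, freq="B")
closep = 300 + np.cumsum(rng.normal(0, 2, size=len(dates)))
gp = pd.DataFrame({"DATE": dates, "CLOSEP": closep})

# Plot the closing price against the date.
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(gp["DATE"], gp["CLOSEP"])
ax.set_xlabel("DATE")
ax.set_ylabel("CLOSEP")
ax.set_title("Time series plot of GP")
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

In an interactive session, the Agg line can be dropped and `plt.show()` called to display the figure.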


Fig 7 Histogram of HIGH-LOW

The above two figures are histograms of the difference between “CLOSEP” and “OPENP” and of the difference between the attributes “HIGH” and “LOW”. This is done because we believe today's closing price and opening price, along with the highest and lowest price of the stock during the last year, will affect the price of the stock at a later date. Based on such reasoning we devised the following logic: if today's CLOSEP is greater than yesterday's CLOSEP, we assign the value 1 to DEX; otherwise we assign the value -1 to DEX. The whole data set is processed on this basis, and using head() we get a glimpse of the data obtained thus far.

The next step entailed setting the feature and target variables, along with the train size. Using the sklearn libraries, we import the SVC classifier and fit it with the training data. After training the model and running the test data through the trained model, the confusion matrix obtained is shown below.

Fig 9 Confusion Matrix

Along with this, we use the same dataset to train another model. This model utilises the Random Forest Classifier, which belongs to the ensemble technique. The decision trees have the default values, which leaves the “n_estimators” value at 10 since this is version 0.20. However, the default value of “n_estimators” will change to 100 in version 0.22. After fitting the model with the data and evaluating its predictions on the test data, we find that it has an accuracy score of 0.808.

To sum up, the accuracy of the SVC model on the test set is 0.787, whereas the accuracy score of the random forest classifier is 0.808.

IX. CONCLUSION

By measuring the accuracy of the different algorithms, we found that the most suitable algorithm for predicting the market price of a stock based on various data points from the historical data is the random forest algorithm. The algorithm will be a great asset for brokers and investors for investing money in the stock market, since it is trained on a huge collection of historical data and has been chosen after being tested on sample data. The project demonstrates a machine learning model that predicts the stock value with more accuracy than previously implemented machine learning models.

X. FUTURE ENHANCEMENT

The future scope of this project will involve adding more parameters and factors, such as financial ratios, multiple instances, etc. The more parameters are taken into account, the higher the accuracy will be. The algorithms can also be applied to analyzing the contents of public comments and thus determining patterns/relationships between the customer and the corporate employee. The use of traditional algorithms and data mining techniques can also help predict the corporation's performance structure as a whole.

REFERENCES

1. Ashish Sharma, Dinesh Bhuriya, Upendra Singh, “Survey of Stock Market Prediction Using Machine Learning Approach”, ICECA 2017.
2. Loke K. S., “Impact of Financial Ratios and Technical Analysis on Stock Price Prediction Using Random Forests”, IEEE 2017.
3. Xi Zhang, Siyu Qu, Jieyun Huang, Binxing Fang, Philip Yu, “Stock Market Prediction via Multi-Source Multiple Instance Learning”, IEEE 2018.
4. Vivek Kanade, Bhausaheb Devikar, Sayali Phadatare, Pranali Munde, Shubhangi Sonone, “Stock Market Prediction: Using Historical Data Analysis”, IJARCSSE 2017.
5. Sachin Sampat Patil, Prof. Kailash Patidar, Asst. Prof. Megha Jain, “A Survey on Stock Market Prediction Using SVM”, IJCTET 2016.
6. https://www.cs.princeton.edu/sites/default/files/uploads/Saahil_magde.pdf
7. Hakob Grigoryan, “A Stock Market Prediction Method Based on Support Vector Machines (SVM) and Independent Component Analysis (ICA)”, DSJ 2016.
8. Raut Sushrut Deepak, Shinde Isha Uday, Dr. D. Malathi, “Machine Learning Approach In Stock Market Prediction”, IJPAM 2017.
9. Pei-Yuan Zhou, Keith C.C. Chan, Member, IEEE, and Carol Xiaojuan Ou, “Corporate Communication Network and Stock Price Movements: Insights From Data Mining”, IEEE 2018.
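The DEX labelling rule described in the experimental results can be illustrated on a few rows. This is a minimal sketch on made-up prices, not the authors' code: the sample values are hypothetical, and dropping the first row (which has no previous day) is an assumption, since the paper does not say how that row is handled.

```python
import numpy as np
import pandas as pd

# Hypothetical prices; the column names follow the paper's attributes.
df = pd.DataFrame({
    "DATE": pd.date_range("2018-01-01", periods=6, freq="D"),
    "OPENP": [100.0, 101.0, 103.0, 102.0, 102.5, 104.0],
    "CLOSEP": [101.0, 103.0, 102.0, 102.0, 104.0, 103.5],
})

# Yesterday's close, aligned with today's row.
prev_close = df["CLOSEP"].shift(1)

# DEX = 1 if today's CLOSEP is greater than yesterday's CLOSEP, else -1.
df["DEX"] = np.where(df["CLOSEP"] > prev_close, 1, -1)

# Assumption: the first row has no previous day, so it is dropped.
df = df.iloc[1:].reset_index(drop=True)

print(df.head())
```

Note that an unchanged close (the third labelled row here) receives -1, because the rule only assigns 1 when the price is strictly greater.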

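The classification step reported in the experimental results (80:20 split, SVC, random forest, confusion matrix and accuracy) might look like the following with scikit-learn. It is a sketch under stated assumptions rather than the authors' code: the prices are synthetic, the two features (CLOSEP-OPENP and HIGH-LOW, suggested by the histograms), the unshuffled split and the `random_state` value are assumptions, and the scores it prints are not the paper's 0.787 and 0.808. `n_estimators=10` is set explicitly to mirror the version 0.20 default mentioned in the text.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic daily prices standing in for the processed GP data (assumption).
rng = np.random.default_rng(42)
n = 400
closep = 300 + np.cumsum(rng.normal(0, 2, size=n))
openp = closep + rng.normal(0, 1, size=n)
high = np.maximum(openp, closep) + rng.uniform(0, 1, size=n)
low = np.minimum(openp, closep) - rng.uniform(0, 1, size=n)
df = pd.DataFrame({"OPENP": openp, "HIGH": high, "LOW": low, "CLOSEP": closep})

# Features suggested by the histograms; the exact feature set is an assumption.
df["CLOSE_OPEN"] = df["CLOSEP"] - df["OPENP"]
df["HIGH_LOW"] = df["HIGH"] - df["LOW"]

# Target: DEX = 1 if today's close is above yesterday's close, else -1.
df["DEX"] = np.where(df["CLOSEP"] > df["CLOSEP"].shift(1), 1, -1)
df = df.iloc[1:]  # the first row has no previous day

X = df[["CLOSE_OPEN", "HIGH_LOW"]]
y = df["DEX"]

# 80:20 split; shuffle=False keeps the time order (an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, shuffle=False
)

# Fit both classifiers on the training set.
svc = SVC().fit(X_train, y_train)
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_train, y_train)

# Evaluate on the held-out test set.
svc_pred = svc.predict(X_test)
rf_pred = rf.predict(X_test)
cm = confusion_matrix(y_test, svc_pred, labels=[-1, 1])
svc_acc = accuracy_score(y_test, svc_pred)
rf_acc = accuracy_score(y_test, rf_pred)

# Class probabilities per test row, with columns ordered as rf.classes_.
rf_proba = rf.predict_proba(X_test)

print(cm)
print(svc_acc, rf_acc)
```

Because the synthetic prices are a random walk, the accuracies here say nothing about real predictive power; the sketch only shows the mechanics of the split, the fit and the evaluation. The `predict_proba` call is one way to obtain the rise/fall probabilities mentioned under Data Scoring.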
