Stock Market Prediction Using Machine Learning Algorithms
Abstract: The main objective of this paper is to find the best model to predict the value of the stock market. While considering the various techniques and variables that must be taken into account, we found that techniques like random forest and support vector machine have not been fully exploited. In this paper we present and review a more feasible method to predict stock movement with higher accuracy. The first thing we have taken into account is the dataset of stock market prices from the previous year. The dataset was pre-processed and tuned up for real analysis; hence, our paper also focuses on pre-processing of the raw dataset. Secondly, after pre-processing the data, we review the use of random forest and support vector machine on the dataset and the outcomes they generate. In addition, the paper examines the use of the prediction system in real-world settings and the issues associated with the accuracy of the values it gives. The paper also presents a machine-learning model to predict the longevity of a stock in a competitive market. Successful prediction of stocks will be a great asset for stock market institutions and will provide real-life solutions to the problems that stock investors face.
Keywords: Machine Learning, Data Pre-processing, Data Mining, Dataset, Stock, Stock Market.

Revised Manuscript Received on April 25, 2019.
Mrs. K. Hiba Sadia, Asst. Prof., Computer Science and Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, India.
Aditya Sharma, B.Tech Student, Computer Science and Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, India.
Adarrsh Paul, B.Tech Student, Computer Science and Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, India.
Sarmistha Padhi, B.Tech Student, Computer Science and Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, India.
Saurav Sanyal, B.Tech Student, Computer Science and Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, India.

I. INTRODUCTION

The stock market is basically an aggregation of various buyers and sellers of stock. A stock (more commonly known as shares) generally represents an ownership claim on a business by a particular individual or group of people. The attempt [3] to determine the future value of the stock market is known as stock market prediction. The prediction is expected to be robust, accurate and efficient. The system must work according to real-life scenarios and should be well suited to real-world settings. The system is also expected to take into account all the variables that might affect the stock's value and performance. There are various ways of implementing the prediction system, such as Fundamental Analysis, Technical Analysis, Machine Learning, Market Mimicry, and Time series aspect structuring. With the advancement of the digital era, prediction has moved into the technological realm. The most prominent and [3] promising technique involves the use of Artificial Neural Networks and Recurrent Neural Networks, which are basically implementations of machine learning. Machine learning is a branch of artificial intelligence that empowers a system to learn and improve from past experience without being explicitly programmed time and again.

Traditional methods of prediction in machine learning use algorithms like backpropagation (backward propagation of errors). Lately, many researchers have been using ensemble learning techniques more often. In one such approach, a network would use low price and time [3] lags to predict future highs, while another network would use lagged highs to predict future highs. These predictions were used to form stock prices. [1]

Stock market price prediction for short time windows appears to be a random process. The stock price movement over a long period of time usually develops a linear curve. People tend to buy those stocks whose prices are expected to rise in the near future. The uncertainty in the stock market deters people from investing in stocks. Thus, there is a need to predict the stock market accurately in a way that can be used in a real-life scenario. The methods used to predict the stock market include time series forecasting along with technical analysis, machine learning modeling and predicting the variable stock market. The datasets of a stock market prediction model include details like the closing price, the opening price, the date and various other variables that are needed to predict the object variable, which is the price on a given day. Previous models used traditional methods of prediction like multivariate analysis with a prediction time series model. Stock market prediction underperforms when it is treated as a regression problem but performs well when treated as a classification problem. The aim is to design a model that learns from market information using machine learning strategies and gauges the future patterns in stock value development. The Support Vector Machine (SVM) can be used for both classification and regression. It has been observed that SVMs are more often used in classification-based problems like ours. In the SVM technique, we plot every data item as a point in n-dimensional space (where n is the number of features in the dataset), with the value of each feature being the value of a particular coordinate. Classification is then performed by finding the hyperplane that clearly separates the two classes.
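The hyperplane idea above can be illustrated with a minimal Python sketch. It assumes a linear SVM whose weight vector `w` and bias `b` are already known. The numbers are made up for illustration and were not learned from stock data. The labels +1 and -1 could stand for, say, "price rises" and "price falls". A point is classified by the side of the hyperplane w·x + b = 0 on which it falls, and its distance from that hyperplane shows how clearly it is separated.

```python
import math
from typing import Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score w.x + b; the hyperplane is the set of points where it is 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Assign +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def distance_to_hyperplane(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Geometric distance |w.x + b| / ||w|| of x from the separating hyperplane."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm_w


# Hypothetical hyperplane in a 2-feature space (n = 2); these numbers are
# illustrative only and were not learned from any stock data.
w = [0.8, -0.5]
b = -0.1

for point in ([1.0, 0.5], [0.2, 1.0], [0.0, 0.0]):
    print(point, classify(w, b, point), round(distance_to_hyperplane(w, b, point), 3))
```

The sketch leaves training to a library such as scikit-learn. Training means choosing `w` and `b` so that the margin between the two classes is as wide as possible; the code only shows how an already-fitted hyperplane makes a prediction.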
Published By: Blue Eyes Intelligence Engineering & Sciences Publication
Retrieval Number: D6321048419/19©BEIESP
Predictive methods like the random forest technique are used for the same purpose. The random forest algorithm follows an ensemble learning strategy for classification and regression. The random forest averages the predictions of trees built on various subsamples of the dataset; this increases predictive accuracy and reduces over-fitting.

II. PROBLEM DEFINITION

Stock market prediction is basically defined as trying to determine the stock value and to give people a robust idea of how the market and stock prices will move. It is generally approached using quarterly financial ratios from a single dataset. Relying on a single dataset may not be sufficient for the prediction and can give an inaccurate result. Hence, we are moving towards the study of machine learning with the integration of various datasets to predict the market and stock trends.

Estimating the stock price will remain a problem unless a better stock market prediction algorithm is proposed. Predicting how the stock market will perform is quite difficult. The movement of the stock market is usually determined by the sentiments of thousands of investors. Stock market prediction calls for an ability to predict the effect of recent events on investors. These events can be political, like a statement by a political leader or a piece of news about a scam. They can also be international events, like sharp movements in currencies and commodities. All these events affect corporate earnings, which in turn affect the sentiment of investors. It is beyond the ability of almost all investors to predict these variables correctly and consistently. All these factors make stock price prediction very difficult. Once the right data is collected, it can be used to train a machine and generate a predictive result.

III. LITERATURE SURVEY

During the literature survey, we collected information about the stock market prediction mechanisms currently being used.

1. Survey of Stock Market Prediction Using Machine Learning Approach

Stock market prediction has become an increasingly important issue in the present time. One of the methods employed is technical analysis, but such methods do not always yield accurate results, so it is important to develop methods for more accurate prediction. Generally, investments are made using predictions obtained from the stock price after considering all the factors that might affect it. The technique employed in this instance was regression. Since financial stock markets generate enormous amounts of data at any given time, a great volume of data needs to be analyzed before a prediction can be made. Each of the techniques listed under regression has its own advantages and limitations compared with its counterparts. One of the noteworthy techniques mentioned was linear regression. Linear regression models are often fitted using the least squares approach, but they may also be fitted in other ways, such as by minimizing the "lack of fit" in some other norm, or by minimizing a penalized version of the least squares loss function. Conversely, the least squares approach can be used to fit nonlinear models. [1]

2. Impact of Financial Ratios and Technical Analysis on Stock Price Prediction Using Random Forests

The use of machine learning and artificial intelligence techniques to predict stock prices is an increasing trend. More and more researchers spend their time every day developing techniques that can further improve the accuracy of stock prediction models. Because so many options are available, there are many ways to predict the price of a stock, but not all methods work the same way: the output varies for each technique even when the same data set is used. In the cited paper, stock price prediction was carried out with the random forest algorithm, which predicts the price of the stock from the financial ratios of the previous quarter. This is just one way of looking at the problem: approaching it with a predictive model that uses random forest to predict the future price of the stock from historical data. However, there are always other factors that influence the price of a stock, such as investor sentiment, public opinion about the company, news from various outlets, and even events that cause the entire stock market to fluctuate. By using financial ratios together with a model that can effectively analyze sentiments, the accuracy of the stock price prediction model can be increased. [2]

3. Stock Market Prediction via Multi-Source Multiple Instance Learning

Accurately predicting the stock market is a challenging task, but the modern web has proved to be a very useful tool in making it easier. Because data on the web is interconnected, it is easy to extract certain sentiments. This makes it easier to establish relationships between various variables and to roughly scope out a pattern of investment. Investment patterns from various firms show signs of similarity, and the key to successfully predicting the stock market is to exploit these consistencies between the data sets. Stock market information can be predicted successfully by using more than just technical historical data. Other methods, such as a sentiment analyzer, can derive an important connection between people's emotions and how those emotions influence investment in specific stocks. One more important part of the prediction process was extracting important events from web news to see how they affected stock prices. [3]

4. Stock Market Prediction: Using Historical Data Analysis

The stock market prediction process is filled with uncertainty and can be influenced by multiple factors. The stock market plays an important role in business and finance. The technical and fundamental analysis is done through a sentiment analysis process. Social media data has a high impact due to its increased usage, and it can [6]
International Journal of Engineering and Advanced Technology (IJEAT)
ISSN: 2249 – 8958, Volume-8 Issue-4, April 2019
The time series prediction problem has been researched in the work centers of various financial institutions. A prediction model based on SVM and independent component analysis (ICA), combined under the name SVM-ICA, is proposed for stock market prediction. Various time series analysis models are based on machine learning. The SVM is designed to solve non-linear classification and regression problems, including time series analysis. The generalization error is minimized using an approximating function based on the structural risk minimization principle. The ICA technique extracts various important features from the dataset, and the time series prediction itself is based on SVM. The result of the SVM model was compared with the results of the ICA technique without using a preprocessing step.

8. Machine Learning Approach In Stock Market Prediction

The existing system fails when there are rare outcomes or predictors, as the algorithm is based on bootstrap sampling.
The previous results indicate that the stock price is unpredictable when a traditional classifier is used.
The existing system reported highly predictive values by selecting an appropriate time period for its experiment to obtain highly predictive scores.
The existing system does not perform well when there is a change in the operating environment.
It doesn't focus on external events in the environment, like news events or social media.
It exploits only one data source and is thus highly biased.
The existing system needs some form of input interpretation, and thus needs scaling.
It doesn't exploit data pre-processing techniques to remove inconsistency and incompleteness from the data.

V. PROPOSED SYSTEM

In this proposed system, we focus on predicting stock values using machine learning algorithms like Random Forest and Support Vector Machines. In the proposed system, "Stock market price prediction", we predicted the stock market price using the random forest algorithm. We trained the machine on various data points from the past to make a future prediction, using data from the previous year's stocks. We mainly used two Python libraries to solve the problem. The first was numpy, which was used to clean and manipulate the data and get it into a form ready for analysis. The other was scikit-learn, which was used for the actual analysis and prediction. The data set we used was from the previous years' stock markets, collected from a public database available online; 80% of the data was used to train the machine and the remaining 20% to test it. The basic approach of the supervised learning model is to learn the patterns and relationships in the data from the training set and then reproduce them for the test data. We used the Python pandas library for data processing, which combined different datasets into a data frame. The tuned-up data frame allowed us to prepare the data for feature extraction. The data frame features were the date and the closing price for a particular day. We used these features to train the random forest model and predicted the object variable, which is the price for a given day. We also quantified the accuracy by comparing the predictions for the test set with the actual values. The proposed system touches different areas of research, including data pre-processing and random forest.

VI. METHODOLOGIES

1. Classification
Classification is an instance of supervised learning where a set is analyzed and categorized based on a common attribute. From the given values or data, classification draws conclusions about the observed values. If more than one input is given, classification tries to predict one or more outcomes for them. The classifiers used here for stock market prediction include the random forest classifier and the SVM classifier.
Random Forest Classifier
The random forest classifier is a type of ensemble classifier and also a supervised algorithm. It creates a set of decision trees, each of which yields a result. The basic approach of the random forest classifier is to aggregate the decisions of trees built on random subsets of the data and yield a final class, or result, based on the votes of those trees.
Parameters
The parameters of the random forest classifier include n_estimators, which is the total number of decision trees. Other hyperparameters include oob_score, which determines whether out-of-bag samples are used to estimate the generalization accuracy of the random forest, and max_features, which is the number of features to consider when looking for the best split. min_weight_fraction_leaf is the minimum weighted fraction of the sum total of weights of all the input samples required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
SVM classifier
The SVM classifier is a type of discriminative classifier. The SVM uses supervised learning, i.e. labeled training data. Its output is a hyperplane that categorizes new data. SVMs are supervised learning models with associated learning algorithms used for classification as well as regression.
Parameters
The tuning parameters of the SVM classifier are the kernel parameter, the gamma parameter and the regularization parameter.
Kernels can be categorized as linear and polynomial kernels; the kernel determines how the prediction line is calculated. With a linear kernel, the prediction for a new input is calculated from the dot product between the input and the support vectors.
The C parameter is known as the regularization parameter. It controls the trade-off between a smooth decision boundary and classifying the training points correctly, and so it affects the accuracy of the model. The default value is C = 10. A lower regularization value tolerates more misclassified training points.
The gamma parameter defines how far the influence of a single training example reaches: low values mean the influence reaches far from the margin, and high values mean it stays close.

2. Random Forest Algorithm

The random forest algorithm is used here for stock market prediction. It is considered one of the easiest to use and most flexible machine learning algorithms, and it gives good accuracy in prediction. It is usually used for classification tasks. Because of the high volatility of the stock market, prediction is quite challenging. For stock market prediction we use a random forest classifier, which has the same hyperparameters as a decision tree. A decision tree is a decision tool with a tree-like model: it makes decisions based on possible consequences, which include variables like event outcome, resource cost, and utility. The random forest algorithm randomly selects different observations and features to build several decision trees and then aggregates their outcomes. The data is split into partitions based on questions about a label or an attribute. The data set we used was from the previous year's stock markets, collected from a public database available online; 80% of the data was used to train the machine and the remaining 20% to test it. The basic approach of the supervised learning model is to learn the patterns and relationships in the data from the training set and then reproduce them for the test data.

3. Support Vector Machine Algorithm
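The kernel and gamma parameters are described only in words in the SVM classifier subsection. A small sketch may help show where they enter an SVM prediction. The kernel formulas are the standard linear, polynomial and RBF kernels. The decision value is f(x) = sum over support vectors s_i of alpha_i * y_i * K(s_i, x), plus the intercept b. The support vectors, coefficients and intercept below are hypothetical numbers, not values from the authors' model. The parameter names `degree`, `gamma` and `coef0` follow scikit-learn's `SVC`, but the defaults here are chosen only for the example. C does not appear because it acts during training, where it caps each alpha_i in the soft-margin formulation.

```python
import math
from functools import partial
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, z: Vector) -> float:
    """Dot product <x, z> between an input and a support vector."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x: Vector, z: Vector, degree: int = 3,
                      gamma: float = 1.0, coef0: float = 1.0) -> float:
    """(gamma * <x, z> + coef0) ** degree."""
    return (gamma * linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x: Vector, z: Vector, gamma: float) -> float:
    """exp(-gamma * ||x - z||^2): gamma sets how far one training point's influence reaches."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_decision(x: Vector, support_vectors: Sequence[Vector],
                 dual_coefs: Sequence[float], intercept: float,
                 kernel: Kernel) -> float:
    """f(x) = sum_i (alpha_i * y_i) * K(s_i, x) + b; the predicted class is sign(f(x))."""
    return sum(c * kernel(s, x) for s, c in zip(support_vectors, dual_coefs)) + intercept


# Hypothetical support vectors and coefficients (alpha_i * y_i), for illustration only.
support_vectors = [[1.0, 2.0], [2.0, 0.5]]
dual_coefs = [0.7, -0.7]
intercept = 0.1
x_new = [1.5, 1.0]

print("linear:", svm_decision(x_new, support_vectors, dual_coefs, intercept, linear_kernel))
for g in (0.1, 10.0):
    rbf = partial(rbf_kernel, gamma=g)
    print(f"rbf gamma={g}:", svm_decision(x_new, support_vectors, dual_coefs, intercept, rbf))

# Influence of one training point at squared distance 4, for a small vs. a large gamma.
print(rbf_kernel([1.0, 2.0], [3.0, 2.0], 0.1), rbf_kernel([1.0, 2.0], [3.0, 2.0], 10.0))
```

The last line shows the gamma behaviour described in the text. With a small gamma, a training point at distance 2 still has a sizeable kernel value. With a large gamma, its influence drops to almost nothing.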
Kaggle is an online community for data analysis and predictive modeling. It also contains datasets from different fields, contributed by data miners. Data scientists compete there to create the best models for predicting and describing information. It allows users to use its datasets so that they can build models and work with other data science engineers to solve various real-life data science challenges. The dataset used in the proposed project has been downloaded from Kaggle. However, this data set is in what we call raw format. The data set is a collection of stock market information about a few companies.
The first step is the conversion of this raw data into processed data. This is done using feature extraction, since the raw data collected has multiple attributes but only a few of them are useful for prediction. So the first step is feature extraction, where the key attributes are extracted from the whole list of attributes available in the raw dataset. Feature extraction starts from an initial set of measured data and builds derived values, or features. These features are intended to be informative and non-redundant, facilitating the subsequent learning and generalization steps. Feature extraction is a dimensionality reduction process, where the initial set of raw variables is reduced to more manageable features for processing, while still accurately and completely describing the original data set.
The feature extraction process is followed by a classification process, wherein the data obtained after feature extraction is split into two distinct segments. Classification is the problem of identifying to which set of categories a new observation belongs. The training data set is used to train the model, whereas the test data is used to measure the accuracy of the model. The split is made so that the training data forms a larger proportion than the test data.
The random forest algorithm uses a collection of random decision trees to analyze the data. In layman's terms, from the total number of decision trees in the forest, a cluster of the decision trees looks for specific attributes in the data. This is known as data splitting. In this case, since the end goal of the

Fig 1 System Architecture

VIII. MODULE IDENTIFICATION

The modules of the project are divided into the segments described below.

I. Data Collection

Data collection is a very basic module and the initial step in the project. It generally deals with collecting the right dataset. The dataset to be used for market prediction has to be filtered based on various aspects. Data collection also helps to enhance the dataset by adding more external data. Our data mainly consists of the previous year's stock prices. Initially, we will analyze the Kaggle dataset and, depending on the accuracy, use the model with the data to analyze the predictions accurately.

II. Pre Processing

Data pre-processing is a part of data mining that involves transforming raw data into a more coherent format. Raw data is usually inconsistent or incomplete and often contains many errors. Data pre-processing involves checking for missing values, looking for categorical values, splitting the data set into training and test sets, and finally applying feature scaling to limit the range of the variables so that they can be compared on a common scale.

III. Training the Machine

Training the machine consists of feeding the data to the algorithm so that the model can be fitted before it is evaluated on the test data. The training sets are used to tune and fit the models. The test sets are left untouched, as a model should be judged on data it has not seen during training. Training the model includes cross-validation, which gives a well-grounded estimate of the model's performance using only the training data. Tuning models are meant to specifically tune the
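The pre-processing steps listed in module II can be sketched in plain Python. This is a minimal sketch under stated assumptions:

- The rows are hypothetical (date, closing price) records, not the Kaggle data.
- A missing price is filled with the previous day's price.
- The 80/20 split keeps chronological order, so the test days come after the training days.
- Min-max scaling is fitted on the training portion only, so no test information leaks into training.

The authors report using pandas and scikit-learn; this standard-library version only mirrors the steps.

```python
from typing import Callable, List, Optional, Sequence, Tuple

RawRow = Tuple[str, Optional[float]]   # (date, closing price); None marks a missing value
Row = Tuple[str, float]


def fill_missing(rows: Sequence[RawRow]) -> List[Row]:
    """Forward-fill missing closing prices; drop leading rows that have no earlier value."""
    filled: List[Row] = []
    last: Optional[float] = None
    for date, price in rows:
        if price is None:
            price = last
        if price is not None:
            filled.append((date, price))
            last = price
    return filled


def chronological_split(rows: Sequence[Row], train_fraction: float = 0.8):
    """First 80% of the days for training, the remaining 20% for testing (order kept)."""
    cut = int(len(rows) * train_fraction)
    return list(rows[:cut]), list(rows[cut:])


def min_max_scaler(train_values: Sequence[float]) -> Callable[[float], float]:
    """Scale with the training range only; unseen values may fall outside [0, 1]."""
    lo, hi = min(train_values), max(train_values)
    span = (hi - lo) or 1.0  # guard against a constant series
    return lambda v: (v - lo) / span


# Hypothetical daily closing prices with gaps; not taken from the Kaggle data set.
raw = [
    ("2018-01-01", None), ("2018-01-02", 100.0), ("2018-01-03", None),
    ("2018-01-04", 102.0), ("2018-01-05", 101.0), ("2018-01-08", 104.0),
    ("2018-01-09", 103.0), ("2018-01-10", 105.0), ("2018-01-11", 106.0),
    ("2018-01-12", 108.0), ("2018-01-15", 107.0),
]

clean = fill_missing(raw)
train, test = chronological_split(clean)
scale = min_max_scaler([price for _, price in train])
print(len(clean), len(train), len(test))
print([round(scale(price), 3) for _, price in test])
```

Scaled test prices above 1.0 are expected here, because the test days closed higher than any training day. That is the trade-off of fitting the scaler on training data alone.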
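The cross-validation mentioned in module III can be sketched as index generation. The training data is divided into k folds, and each fold serves once as the validation set while the model is fitted on the others. The fold count k = 5 and the helper names are assumptions for illustration. Ordinary k-fold lets a model be fitted on days that come after the ones it is validated on. The sketch therefore also includes a forward-chaining variant, similar in spirit to scikit-learn's `TimeSeriesSplit`, that trains only on earlier folds.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]  # (training indices, validation indices)


def kfold_indices(n: int, k: int = 5) -> Iterator[Split]:
    """Standard k-fold: each contiguous fold is the validation set exactly once."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def forward_chaining_indices(n: int, k: int = 5) -> Iterator[Split]:
    """Time-ordered variant: train only on the folds that precede the validation fold."""
    folds = [val for _, val in kfold_indices(n, k)]
    for i in range(1, k):
        yield list(range(0, folds[i][0])), folds[i]


for train_idx, val_idx in forward_chaining_indices(10, k=5):
    print("train", train_idx, "-> validate", val_idx)
```

Averaging a validation score over these splits gives the kind of approximate performance estimate that module III describes, without touching the test set.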
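The random forest classifier described in Section VI combines bootstrap samples, a random subset of features per tree, and a majority vote. This scheme can be sketched without any library. To stay short, the sketch swaps full decision trees for decision stumps (trees with a single split); with one split, drawing the feature subset per tree is the same as drawing it per split. `n_estimators` and `max_features` are named after the parameters in Section VI. The two-feature data set and its labels are hypothetical, not the authors' data.

```python
import random
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Stump = Tuple[int, float, int, int]  # (feature, threshold, label if <= threshold, label if >)


def majority(labels: Iterable[int]) -> int:
    """Most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int], features: Sequence[int]) -> Stump:
    """One-split decision tree: the (feature, threshold) with the fewest training errors."""
    best, best_err = None, None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_lab = majority(left)
            right_lab = majority(right) if right else left_lab
            err = sum(lab != left_lab for lab in left) + sum(lab != right_lab for lab in right)
            if best_err is None or err < best_err:
                best, best_err = (f, t, left_lab, right_lab), err
    return best


def predict_stump(stump: Stump, row: Sequence[float]) -> int:
    f, t, left_lab, right_lab = stump
    return left_lab if row[f] <= t else right_lab


def fit_forest(X, y, n_estimators: int = 25, max_features: int = 1, seed: int = 0) -> List[Stump]:
    """Bagging plus random feature selection, as in a random forest."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample, drawn with replacement
        feats = rng.sample(range(d), max_features)   # random subset of features for this tree
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest: Sequence[Stump], row: Sequence[float]) -> int:
    """Final class = majority vote of the individual trees."""
    return majority(predict_stump(s, row) for s in forest)


# Hypothetical toy features, e.g. [previous-day return %, gap to moving average %];
# label 1 = price went up, 0 = price went down. Not real market data.
X = [[-2.1, -1.0], [-1.3, -0.4], [-0.7, -1.6], [-0.2, -0.3],
     [0.4, 0.6], [0.9, 0.2], [1.5, 1.3], [2.3, 0.8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X, y)
print(predict_forest(forest, [3.0, 2.0]), predict_forest(forest, [-3.0, -2.0]))
```

A library implementation such as scikit-learn's `RandomForestClassifier` grows deeper trees and adds options like `oob_score` and `min_weight_fraction_leaf`. The underlying structure of bagging plus voting is the same.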
IX. CONCLUSION
X. FUTURE ENHANCEMENT