1822 B.E Ece Batchno 120
PREDICTION USING
MACHINE LEARNING
Guide : DR. S. LALITHAKUMARI
Batch members:
38130145 – NACHIAPPAN M
38130138 – MOHAMMED IMRAN
A Project Report on
HOUSE PRICE PREDICTION USING MACHINE LEARNING
Submitted in partial fulfillment of the requirements
for the degree of
Bachelor of Technology
in
Electronics and Communication Engineering
Under the guidance of
DR.S.LALITHAKUMARI
(DEEMED TO BE UNIVERSITY)
2 Proposed System
3 Block Diagram
5 Alternate Regressor
7 Sample Code
8 Advantages of LSTM
11 Acknowledgement
12 Conclusion
13 Software Tools
14 References
AIM & OBJECTIVE
People looking to buy a new home tend to be conservative with their budgets and
market strategies.
This project aims to analyse parameters such as average income and average area
and to predict the house price accordingly.
It also aims to remove the need for a real estate agent when looking up house prices.
The aim is to predict an appropriate house price for real estate customers with respect
to their budgets and priorities. Future prices are predicted by analysing previous
market trends and price ranges together with upcoming developments.
House prices increase every year, so there is a need for a system that predicts house
prices in the future.
House price prediction can help a developer determine the selling price of a
house and can help a customer choose the right time to purchase a house.
We use the linear regression algorithm in machine learning to predict house price
trends.
PROPOSED SYSTEM
The X and Y values form the training dataset for the linear regression model,
Y = a0 + a1 X. When linear regression is run, the algorithm searches for the
best-fit line by choosing the intercept a0 and the slope a1 so that the line is as
close as possible to the actual data points. Once the values of a0 and a1 are known,
the model can be used to predict the response.
As can be seen in the diagram, the red dots are the observed values of X and Y.
The black line, called the line of best fit, minimizes the sum of squared errors.
The blue lines represent the errors, that is, the distances between the line of best
fit and the observed values.
The value of a1 is the slope of the black line.
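The fit described above can be sketched in a few lines. This is a minimal illustration rather than the project's own code: the function name fit_line and the sample observations are made up, and the closed-form least-squares formulas are used to obtain a0 and a1.

```python
import numpy as np

def fit_line(x, y):
    """Return (a0, a1) that minimise the sum of squared errors for y = a0 + a1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x
    a1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the best-fit line passes through the point of means
    a0 = y_mean - a1 * x_mean
    return a0, a1

# Made-up observations (the "red dots")
x_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_obs = [5.1, 7.9, 11.2, 13.8, 17.0]

a0, a1 = fit_line(x_obs, y_obs)
# Errors (the "blue lines"): vertical distances between observations and the line
residuals = np.asarray(y_obs) - (a0 + a1 * np.asarray(x_obs))
sse = float(np.sum(residuals ** 2))
print(a0, a1, sse)
```

Any other choice of a0 and a1 gives a larger sum of squared errors on the same points, which is what makes this line the line of best fit.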
BLOCK DIAGRAM
There are many data processing techniques. We collected data on USA/Mumbai
real estate properties from various real estate websites. The data has attributes
such as location, carpet area, built-up area, age of the property, zip code, price and
number of bedrooms. The data collected must be quantitative, structured and categorized.
Data collection is needed before any kind of machine learning research is carried out, and
the dataset must be valid; otherwise there is no point in analyzing it.
Data preprocessing is the process of cleaning the data set. There may be missing values or
outliers in the dataset, and these are handled by data cleaning. If a variable has many
missing values, we either drop those values or substitute them with the average value.
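A minimal sketch of this cleaning step with pandas is shown below. The column names and values are made up for illustration, and the policy used here (drop rows without a price, fill the remaining gaps with the column mean) is only one reasonable reading of the description.

```python
import numpy as np
import pandas as pd

# Made-up sample; the column names are illustrative only
df = pd.DataFrame({
    "carpet_area": [650.0, np.nan, 900.0, 1200.0],
    "bedrooms": [2.0, 3.0, np.nan, 4.0],
    "price": [50.0, 65.0, 80.0, np.nan],
})

# Count the missing values in each column
missing_per_column = df.isnull().sum()

# Rows without a target price cannot be used for training, so drop them
cleaned = df.dropna(subset=["price"])
# Substitute the remaining missing feature values with the column average
cleaned = cleaned.fillna(cleaned.mean())

print(missing_per_column)
print(cleaned)
```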
The data is split into two parts, a training set and a test set, and the model is first
trained on the training set, which includes the target variable. The decision tree regressor
algorithm is applied to the training data set. The decision tree builds a regression model in
the form of a tree structure.
The trained model is then applied to the test dataset to predict house prices. Finally, the
trained model is integrated with the front end using Flask in Python.
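The train-then-test flow could look roughly like the sketch below, which uses scikit-learn's DecisionTreeRegressor on synthetic data. The two features, the 80/20 split, the tree depth and the random seeds are all assumptions made for the example; the Flask integration is not shown.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the collected data: area (sq ft) and age (years)
area = rng.uniform(500, 3000, size=200)
age = rng.uniform(0, 40, size=200)
X = np.column_stack([area, age])
# Synthetic price: grows with area, falls with age, plus noise
y = 100 * area - 500 * age + rng.normal(0, 5000, size=200)

# Split into a training set and a test set (the 80/20 ratio is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Train the tree on the training set only
tree = DecisionTreeRegressor(max_depth=5, random_state=0)
tree.fit(X_train, y_train)

# Apply the trained model to the held-out test set
predicted = tree.predict(X_test)
r2 = tree.score(X_test, y_test)  # R-squared on the test set
print(predicted[:5], r2)
```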
The results of regression problems are continuous or real values. Some
commonly used regression algorithms are Linear Regression and Decision
Trees. Several metrics are used in regression, such as root-mean-squared
error (RMSE) and mean absolute error (MAE). These metrics are also used to
evaluate XGBoost models, where each plays an important role.
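The two metrics named above can be computed directly with numpy. This is a small sketch with made-up prices; the function names rmse and mae are chosen for the example.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error: square root of the average squared difference."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute differences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

# Made-up actual and predicted prices
actual = [200.0, 250.0, 300.0, 400.0]
predicted = [210.0, 240.0, 330.0, 380.0]

rmse_value = rmse(actual, predicted)
mae_value = mae(actual, predicted)
print(rmse_value, mae_value)
```

RMSE penalises large individual errors more heavily than MAE, so on the same predictions it is never smaller than MAE.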
In order to predict house prices, first we have to understand the factors that affect house
pricing.
• Economic growth. Demand for housing is dependent upon income. With higher
economic growth and rising incomes, people will be able to spend more on
houses; this will increase demand and push up prices. In fact, demand for housing
is often noted to be income elastic (luxury good); rising incomes leading to a
bigger % of income being spent on houses. Similarly, in a recession, falling
incomes will mean people can’t afford to buy and those who lose their job may
fall behind on their mortgage payments and end up with their home repossessed.
• Interest rates. Interest rates affect the cost of monthly mortgage payments. A
period of high interest rates will increase the cost of mortgage payments and will
cause lower demand for buying a house. High interest rates make renting
relatively more attractive compared to buying. Interest rates have a bigger effect if
homeowners have large variable mortgages. For example, in 1990-92, the sharp
rise in interest rates caused a very steep fall in UK house prices because many
homeowners couldn’t afford the rise in interest rates.
• Mortgage availability. In the boom years of 1996-2006, many banks were very
keen to lend mortgages. They allowed people to borrow large income multiples
(e.g. five times income). Also, banks required very low deposits (e.g. 100%
mortgages). This ease of getting a mortgage meant that demand for housing
increased as more people were now able to buy. However, since the credit crunch
of 2007, banks and building societies struggled to raise funds for lending on the
money markets. Therefore, they have tightened their lending criteria requiring a
bigger deposit to buy a house. This has reduced the availability of mortgages and
demand fell.
• Supply. A shortage of supply pushes up prices. Excess supply will cause prices to
fall. For example, in the Irish property boom of 1996-2006, an estimated 700,000
new houses were built. When the property market collapsed, the market was left
with a fundamental oversupply. Vacancy rates reached 15%, and with supply
greater than demand, prices fell.
By contrast, in the UK, housing supply fell behind demand. With a shortage, UK
house prices didn’t fall as much as in Ireland and soon recovered – despite the
ongoing credit crunch. The supply of housing depends on existing stock and new
house builds. Supply of housing tends to be quite inelastic because to get planning
permission and build houses is a time-consuming process. Periods of rising house
prices may not cause an equivalent rise in supply, especially in countries like the
UK, with limited land for home-building.
• Geographical factors. Many housing markets are highly geographical. For example,
national house prices may be falling, but some areas (e.g. London, Oxford) may
still see rising prices. Desirable areas can buck market trends as demand is high,
and supply limited. For example, houses near good schools or a good rail link may
have a significant premium to other areas. This graph shows that first time buyers
in London face much more expensive house prices – over 9.0 times earnings
compared to the north, where house prices are only 3.3 times earnings.
SAMPLE CODE
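# Imports for data handling, plotting and modelling
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression
from keras.layers import Dense, Dropout, LSTM
from keras.models import Sequential
%matplotlib inline   # Jupyter notebook magic; remove when running as a plain script

# Load and inspect the dataset
HouseDF = pd.read_csv('USA_Housing.csv')
HouseDF = HouseDF.reset_index(drop=True)
HouseDF.head()
HouseDF.info()
HouseDF.describe()
HouseDF.columns

# Exploratory plots (numeric_only=True skips any text columns)
sns.pairplot(HouseDF)
sns.histplot(HouseDF['Price'], kde=True)
sns.heatmap(HouseDF.corr(numeric_only=True), annot=True)

# Features and target
X = HouseDF[['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
             'Avg. Area Number of Bedrooms', 'Area Population']]
y = HouseDF['Price']

# Hold out a test set (split ratio and seed are assumptions; the report does not state them)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)

# Linear regression baseline
lm = LinearRegression()
lm.fit(X_train, y_train)
print(lm.intercept_)
coeff_df = pd.DataFrame(lm.coef_, X.columns, columns=['Coefficient'])
coeff_df
predictions = lm.predict(X_test)
plt.scatter(y_test, predictions)
sns.histplot(y_test - predictions, bins=50, kde=True)

# Scale features and target to the range 0 to 1; the scalers are fitted on training data only
x_scaler = MinMaxScaler(feature_range=(0, 1))
y_scaler = MinMaxScaler(feature_range=(0, 1))
x_train = x_scaler.fit_transform(X_train)
x_test = x_scaler.transform(X_test)
y_train_scaled = y_scaler.fit_transform(y_train.to_frame())

# The LSTM layer expects 3D input: (samples, steps, features per step)
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], 1)
x_test = x_test.reshape(x_test.shape[0], x_test.shape[1], 1)

# LSTM regressor; a single LSTM layer feeding Dense(1) returns only its last output
model = Sequential()
model.add(LSTM(units=50, activation='relu', input_shape=(x_train.shape[1], 1)))
model.add(Dropout(0.2))
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train, y_train_scaled, epochs=50, batch_size=32)  # epochs and batch size are assumptions

# Predict, then convert the scaled predictions back to prices
y_predicted = y_scaler.inverse_transform(model.predict(x_test)).ravel()

plt.figure(figsize=(12, 6))
plt.plot(y_test.to_numpy(), 'b', label='Original Price')
plt.plot(y_predicted, 'r', label='Predicted Price')
plt.xlabel('Time')
plt.ylabel('Price')
plt.legend()
plt.show()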
ADVANTAGE OF LSTM OVER OTHER MODELS
The LSTM model can be tuned through various parameters, for example by changing the number of
LSTM layers, adding a dropout value or increasing the number of epochs.
The forget gate: it removes information that is no longer required by the model.
The output gate: it selects the information to be shown as output.
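To make the role of the gates concrete, the sketch below implements a single time step of a standard LSTM cell in numpy. It is an illustration only: the function name lstm_step, the weight layout and the gate ordering are assumptions of this example, and the input gate and candidate state are part of the standard cell even though they are not described in the text above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W has shape (4H, D), U has shape (4H, H), b has shape (4H,).
    Gate order in the stacked weights: forget, input, candidate, output.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[0:H])            # forget gate: how much old cell state to keep
    i = sigmoid(z[H:2 * H])        # input gate: how much new information to store
    g = np.tanh(z[2 * H:3 * H])    # candidate values for the cell state
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much of the state to expose
    c = f * c_prev + i * g         # updated cell state
    h = o * np.tanh(c)             # output (hidden state)
    return h, c

hidden, inputs = 3, 2
W = np.zeros((4 * hidden, inputs))
U = np.zeros((4 * hidden, hidden))
x = np.ones(inputs)
h_prev = np.zeros(hidden)
c_prev = np.ones(hidden)

b_open = np.zeros(4 * hidden)
b_open[:hidden] = 10.0       # forget gate close to 1: old cell state is kept
b_closed = np.zeros(4 * hidden)
b_closed[:hidden] = -10.0    # forget gate close to 0: old cell state is discarded

_, c_kept = lstm_step(x, h_prev, c_prev, W, U, b_open)
_, c_dropped = lstm_step(x, h_prev, c_prev, W, U, b_closed)
print(c_kept, c_dropped)
```

With the weights set to zero, the candidate state is zero, so the new cell state is simply the forget gate multiplied by the old cell state. That isolates the effect of the forget gate in the two calls.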
EXPLANATION OF THE OUTPUT RESULTS AND THE DATASET
First we import sample data from the sklearn library; other sample datasets are available
from Kaggle. The data used here contains various parameters and the house prices in the
city of Boston for the years between 1970 and 2020.
For illustration, we print the first 5 instances of the data.
In total the dataset has 506 rows, of which we print the first 5 rows
using the head() function. There are 14 columns in total: 13 columns contain data about the
place, and the 14th column is the target column, which contains the house prices.
Then we check whether the data has any null values, i.e. missing values. If the data is
incomplete, errors can occur during processing, which may reduce the
accuracy of the prediction model. As can be seen, there are no missing values in our data.
Since the data contains no missing values, the program skips the dropping phase of data
processing, in which incomplete data is dropped or missing values are filled in so that
the data is suitable for modelling.
Next we summarise the data so that it is easy for both people and the machine to
understand. To do this we use the describe() function.
Count refers to the number of instances of data in each column, i.e. 506, since there are 506
rows of data for each column.
Mean refers to the mean value of the data in the given column.
Std is the standard deviation, i.e. a measure of how widely the values in a particular
column are spread around the mean.
25% means that 25 percent of the data in that column is equal to or below that value.
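A small worked example of these summary statistics, using pandas describe() on a made-up column of five values, is shown below. Note that pandas reports std as the sample standard deviation.

```python
import pandas as pd

# Made-up column of five values
values = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], name="example")

summary = values.describe()
print(summary)

count = summary["count"]   # number of non-missing entries
mean = summary["mean"]     # arithmetic mean
std = summary["std"]       # sample standard deviation (spread around the mean)
q25 = summary["25%"]       # 25th percentile
```

For this column the count is 5, the mean is 30, the 25th percentile is 20, and the standard deviation is the square root of 250, about 15.81.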
Next we try to understand the correlation between the different variables. The
best way to do that is with a heat map, a representation of data in the form of a map or
diagram in which data values are represented as colours.
Correlation is a statistical measure that expresses the extent to which two variables are
linearly related (meaning they change together at a constant rate).
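As an illustration of what each cell of such a heat map contains, the sketch below computes the Pearson correlation coefficient by hand and compares it with numpy's corrcoef. The three small columns (rooms, price, crime) are made-up values, not taken from the dataset.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c = a - a.mean()
    b_c = b - b.mean()
    return float(np.sum(a_c * b_c) / np.sqrt(np.sum(a_c ** 2) * np.sum(b_c ** 2)))

# Made-up columns: price rises with rooms and falls with crime
rooms = [4, 5, 6, 7, 8]
price = [20.0, 24.0, 31.0, 35.0, 41.0]
crime = [9.0, 7.0, 4.0, 3.0, 1.0]

r_rooms = pearson(rooms, price)   # close to +1: strong positive linear relation
r_crime = pearson(crime, price)   # close to -1: strong negative linear relation

# The full correlation matrix is what a heat map colours cell by cell
corr_matrix = np.corrcoef([rooms, price, crime])
print(r_rooms, r_crime)
print(corr_matrix)
```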
HEATMAP: shows the correlation between the columns of the given dataset, which helps in
understanding which place is best suited to an individual's personal preferences.
Next we split the data into the variables x and y in order to train the model.
Here the variable x contains the values of the first 13 columns, i.e. the parameters that are
required for calculating and predicting the house prices. The variable y contains the 14th
column, which holds the house prices.
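A sketch of this split with pandas is given below. The data frame is a random stand-in with the same layout as described (13 parameter columns followed by one price column); the column names are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Random stand-in: 13 feature columns followed by 1 price column (names are hypothetical)
columns = [f"feature_{i}" for i in range(1, 14)] + ["price"]
data = pd.DataFrame(rng.random((6, 14)), columns=columns)

x = data.iloc[:, :13]   # first 13 columns: the parameters
y = data.iloc[:, 13]    # 14th column: the house price (target)

print(x.shape, y.shape)
```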
First we predict the values of y using the values in x. Then we compare the actual prices and
predicted prices using a scatter plot, and compute the R-squared score and the mean squared
error between them. If the error is small enough, the training phase is over and we proceed to
testing the model. If the error is large, we use an optimizer such as Adam and repeat the drop
and fitting process for a set number of epochs to reduce the error.
The R-squared score and the mean squared error also indicate numerically how accurately the
model predicts the data.
Then, during testing, we predict future house prices using present and past
parameters of houses in a location, and plot the result as a graph of house price over
time.
When training the model, the error must be as small as possible for the model to be accurate.
The actual and predicted prices are plotted against each other using a scatter plot. Here we
can see that the error is small, since the data points of the actual and predicted values are
close to each other.
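The two numerical checks mentioned above can be sketched as follows, with made-up actual and predicted values. A mean squared error close to 0 and an R-squared score close to 1 both indicate that the predictions lie close to the actual prices.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up actual and predicted prices
actual_prices = [3.0, 5.0, 7.0, 9.0]
predicted_prices = [2.5, 5.5, 7.0, 9.5]

mse_value = mean_squared_error(actual_prices, predicted_prices)
r2_value = r_squared(actual_prices, predicted_prices)
print(mse_value, r2_value)
```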
PREDICTED VALUE OF HOUSE PRICE BASED ON TEST SAMPLE DATA
ALGORITHM BRIEF OUTLINE
1. Import the Python libraries that are required for house price prediction using linear
regression. For example, numpy is used for conversion of data to the 2D or 3D array format
required by the model, matplotlib for plotting graphs, and pandas
for reading the data from the source and manipulating it.
2. Read the values from the source into a data frame and then manipulate the data
into the required form using head(), indexing and drop().
3. Next we have to train a model; it is always best to split the data into training data and
test data for modelling.
4. It is good practice to check the data with shape and to remove null values, which would
cause errors during the modelling process.
5. It is good to normalize the values, since house prices are very large numbers.
For this we may use MinMaxScaler to reduce the gap between prices so that
comparing values is easier and less time consuming. The range usually specified is
0 to 1, applied using fit_transform.
6. Then we make a few imports from Keras: Sequential for initializing the
network, LSTM to add an LSTM layer, Dropout to prevent overfitting of the LSTM layers, and
Dense to add a densely connected network layer for the output unit.
8. To compile the model, it is usually best to use the Adam optimizer and to set the loss as
required for the specific data.
9. We can fit the model for a number of epochs. The number of epochs is the number of times
the learning algorithm works through the entire training set.
10. Then we convert the values back to their original scale by inverting the min-max scaling
with the scale factor.
11. Then we give test data (present data) to the trained model to get the predicted
values (future data).
12. Then we can use matplotlib to plot a graph comparing the test and predicted values to
see the rate of increase or decrease of values at each time of the year in a particular place.
Based on this, people will know when it is the best time to sell or buy a place in a given
location.
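Steps 5 and 10 of the outline (scaling to the range 0 to 1 and converting back) can be illustrated with plain numpy. This is a hand-written sketch of min-max scaling, not the scikit-learn implementation; the function names and the prices are made up.

```python
import numpy as np

def minmax_fit(values):
    """Return the minimum and maximum used for scaling."""
    values = np.asarray(values, dtype=float)
    return float(values.min()), float(values.max())

def minmax_scale(values, lo, hi):
    """Map values linearly so that lo becomes 0 and hi becomes 1."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)

def minmax_inverse(scaled, lo, hi):
    """Undo the scaling: multiply by the range, then add the minimum back."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo

# Made-up house prices
prices = np.array([150000.0, 300000.0, 450000.0, 750000.0])

lo, hi = minmax_fit(prices)
scaled = minmax_scale(prices, lo, hi)       # values between 0 and 1
restored = minmax_inverse(scaled, lo, hi)   # back to the original prices
print(scaled, restored)
```

Multiplying by a scale factor alone restores the original values only when the minimum is 0; in general the minimum has to be added back as well.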
ACKNOWLEDGEMENT
I would like to express my sincere and deep sense of gratitude to my Project Guide,
Dr. S. Lalithakumari, whose valuable guidance, suggestions and constant encouragement paved
the way for the successful completion of my project work. I wish to express my thanks to all
Teaching and Non-teaching staff members of the Department of ECE who were helpful in
many ways for the completion of the project.
CONCLUSION
Thus the machine learning model to predict the house price based on the given dataset was
executed successfully using the XGBoost regressor (a gradient-boosted model that gives a lower
error than regular linear regression). The model further helps people understand whether a
place is suited to them, based on the heatmap correlation. It also helps people looking to
sell a house choose the best time for greater profit. The house price in any location can be
predicted with minimum error given an appropriate dataset.
SOFTWARE TOOLS
• Keras
• Jupyter
• Visual Studio
• R Square
• Adjusted R Square
• MSE
• RMSE
• MAE
• Google Colab
REFERENCES
• Real Estate Price Prediction with Regression and Classification, CS 229 Autumn 2016
Project Final Report.
• Gongzhu Hu, Jinping Wang, and Wenying Feng. Multivariate Regression Modelling for
Home Value Estimates with Evaluation using Maximum Information Coefficient.
• Byeonghwa Park, Jae Kwon Bae (2015). Using machine learning algorithms for
housing price prediction, Volume 42, Pages 2928-2934.
• Douglas C. Montgomery, Elizabeth A. Peck, G. Geoffrey Vining, 2015. Introduction to
Linear Regression Analysis.
• Iain Pardoe, 2008, Modelling Home Prices Using Realtor Data.
• Aaron Ng, 2015, Machine Learning for a London Housing Price Prediction Mobile
Application.
• Wang, X., Wen, J., Zhang, Y., Wang, Y. (2014). Real estate price forecasting based on
SVM optimized by PSO. Optik-International Journal for Light and Electron Optics,
125(3), 1439-1443.