Project Presentation On House Price Prediction System: Presented by Name: Simran B Solanki Roll No: 19020

Project Presentation
On
House Price Prediction System
Course: Computer Science
MSc Semester 4
Presented By
Name: Simran B Solanki, Roll No: 19020
Introduction
• Trends in housing prices indicate the current economic situation and are also a concern to buyers and sellers.
• Nowadays everyone wishes for a house that suits their lifestyle and provides amenities according to their needs. House prices change very frequently, which suggests that they are often exaggerated.
• Many factors have an impact on house prices, such as the number of bedrooms and bathrooms. House price also depends on location.
• Whether someone wants to sell or buy a house, identifying the correct price is still a challenge.
• We need an automated system that helps people find the correct house price based on their requirements. Such a system can be built using machine learning algorithms and data analysis.
• The proposed system takes features such as location and carpet area as input and predicts the house price using regression machine learning algorithms.
Project Implementation
Objectives
• Create an effective house price prediction model.
• Find the best-fit regression model with a low error rate.
Plan
• Data Collection
• Data Cleaning
• Exploratory Data Analysis
• Feature Selection
• Data Transformation
• Splitting into Train Data Set and Test Data Set
• Regression Model
• Evaluating the Regression Model
Experimental Setup
Programming Language: Python
Tools: Jupyter Notebook, Spyder
Libraries: Pandas, NumPy, Matplotlib, Seaborn, Altair, Scikit-learn (sklearn), Streamlit

Overview
The House Price Prediction System consists of the following stages:
• Data Collection
• Data Cleaning
• Exploratory Data Analysis
• Feature Selection
• Data Transformation
• Regression Model
• Classification Model

Data Collection
• Data collected through a Google Form in the form of a questionnaire.
• Responses received through the Google Form.
Google Form Link: https://docs.google.com/forms/d/107XIIJ1n1kgKjKmjZnR-PYEW3hQ-jblNUfylul82qDQ/edit?ts=606d4684#responses
Data Cleaning
• Data collected through the survey
• Changing the dataset columns
Data Cleaning
• Counting null values and converting categorical data into integers
• Removing the null values
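The cleaning steps above can be sketched in pandas as follows. This is a minimal sketch, not the project's code: the column names and values are hypothetical, and converting categories with `astype("category").cat.codes` is one assumed way to turn categorical text into integers.

```python
import numpy as np
import pandas as pd

# Hypothetical survey responses; column names are illustrative, not the project's schema.
df = pd.DataFrame({
    "Location": ["Borivali", "Bandra", None, "Virar"],
    "Carpet Area": [650, 900, 700, np.nan],
    "House Loan": ["Yes", "No", "Yes", "Yes"],
})

# Count the null values in each column.
null_counts = df.isnull().sum()
print(null_counts)

# Remove every row that contains at least one null value.
df = df.dropna().reset_index(drop=True)

# Convert categorical text columns into integer codes (categories are sorted alphabetically).
for col in ["Location", "House Loan"]:
    df[col] = df[col].astype("category").cat.codes
print(df)
```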
Exploratory Data Analysis
Analysis Result
1) Bar graph to represent Comparison of House Price w.r.t. Features

From the graph we can conclude that features such as No. of Bedrooms, 24Hr Water Supply, Gas Pipeline, Lift and Medical have a strong influence on the increase in price, whereas other features do not affect price as much.
2) Count plot to represent Analysis on Budget
A) Budget vs House Loan
From the graph we can observe that people with a budget below 1 Cr and people with a budget above 6 Cr have the highest chances of taking a house loan.
B) Budget vs Carpet Area
From the graph we conclude that:
• Budgets greater than 1 Cr target 1, 2, 3 and 4 BHK houses with a higher carpet area, except the budget range between 5 Cr and 7 Cr, which targets 3 and 4 BHK houses with a higher carpet area.
• Budgets between 50 lakh and 90 lakh target 1, 2 and 3 BHK houses with a higher carpet area.
• Budgets less than 40 lakh target 0, 1 and 2 BHK houses with a medium carpet area.
3) Count plot to represent Analysis on Income
B) Income vs House Loan
From the graph we can observe that people with an income between 1 lakh and 3 lakh, between 9 lakh and 11 lakh, or of 11 lakh and above have more than a 70% chance of taking a house loan.
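A percentage such as "more than 70%" can be read off a row-normalised cross-tabulation. The sketch below assumes hypothetical income brackets and responses; `pd.crosstab(..., normalize="index")` makes each income group's row sum to 1.

```python
import pandas as pd

# Hypothetical survey responses: income bracket and house-loan answer.
responses = pd.DataFrame({
    "Income": ["1-3 lakh"] * 4 + ["3-5 lakh"] * 2,
    "House Loan": ["Yes", "Yes", "Yes", "No", "No", "Yes"],
})

# normalize="index" turns counts into the share of each income group answering Yes/No.
loan_share = pd.crosstab(responses["Income"], responses["House Loan"], normalize="index")
print((loan_share * 100).round(1))

# Income groups in which more than 70% of respondents would take a loan.
likely_borrowers = loan_share.index[loan_share["Yes"] > 0.7].tolist()
print(likely_borrowers)
```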
4) Box Plot to represent Outliers w.r.t. No. of Bedrooms vs Carpet Area
From the graph we can observe that there are some outliers in carpet area with respect to the number of bedrooms:
• 0 bedrooms (1 RK): carpet area lies between 100 and 800, so the house with a carpet area of 3500 is an outlier.
• 1 BHK: carpet area lies between 200 and 760, so there are 3 high outliers with carpet areas of 1000, 1010 and 2500.
• 2 BHK: carpet area lies between 480 and 1500, so there is 1 high outlier with a carpet area of 1750.
• 3 BHK: carpet area lies between 900 and 2050, so there is 1 high outlier with a carpet area of 2400.
• 4 BHK: carpet area lies between 1990 and 2350, so there is 1 high outlier with a carpet area of 3500 and 1 low outlier with a carpet area of 100.
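Outliers of this kind are usually flagged with the box-plot rule: a point is an outlier when it lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile, which is the default whisker length in Matplotlib and Seaborn box plots. A minimal sketch on hypothetical data, assuming that rule and applying it separately within each bedroom group:

```python
import pandas as pd

# Hypothetical data: number of bedrooms and carpet area (sq. ft.).
houses = pd.DataFrame({
    "Bedrooms": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "Carpet Area": [400, 450, 500, 550, 600, 2500, 800, 900, 1000, 1100, 1750],
})

def iqr_outliers(group: pd.Series) -> pd.Series:
    """Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = group.quantile(0.25), group.quantile(0.75)
    iqr = q3 - q1
    return (group < q1 - 1.5 * iqr) | (group > q3 + 1.5 * iqr)

# Compute the quartiles within each bedroom group, not over the whole column.
houses["Outlier"] = houses.groupby("Bedrooms")["Carpet Area"].transform(iqr_outliers).astype(bool)
print(houses[houses["Outlier"]])
```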
5) Histogram to represent Variation in Carpet Area
From the graph we can observe that the majority of houses in the dataset have a carpet area between 300 and 1000 sq. ft.
6) Count plot to represent Count of House Loan w.r.t. Carpet Area
From the graph we can observe that the majority of people were willing to take a house loan.
7) Graphs to represent Regression Analysis
A) Carpet Area vs No. of Bedrooms
From the above graph we can conclude that there is a positive linear relationship between the number of bedrooms and carpet area: as the number of bedrooms increases, carpet area also increases, but the points do not lie close to the regression line.
B) Carpet Area vs Price
From the above graph we can conclude that there is a positive linear relationship between house price and carpet area: as carpet area increases, house price also increases, but the points do not lie close to the regression line.
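The strength of such a linear relationship can be quantified with a least-squares line and the Pearson correlation coefficient r. A sketch on hypothetical values using `np.polyfit` and `np.corrcoef`; points that do not lie on the regression line show up as |r| below 1 and non-zero residuals.

```python
import numpy as np

# Hypothetical carpet areas (sq. ft.) and prices (lakh), for illustration only.
carpet_area = np.array([400, 600, 800, 1000, 1200], dtype=float)
price = np.array([45, 70, 80, 120, 125], dtype=float)

# Least-squares regression line: price approx. slope * carpet_area + intercept.
slope, intercept = np.polyfit(carpet_area, price, deg=1)

# Pearson r: the sign gives the direction, |r| the strength of the linear relation.
r = np.corrcoef(carpet_area, price)[0, 1]
residuals = price - (slope * carpet_area + intercept)
print(f"slope={slope:.4f}, intercept={intercept:.2f}, r={r:.3f}")
```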
Analysis Result
Bar Chart to represent location-wise House Prices
Feature Selection
Heat Map to represent Highly Correlated Features
Highly correlated features: Club House and 24 hr Security
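One common way to act on a correlation heat map is to drop one feature from each pair whose absolute correlation exceeds a threshold. The presentation does not say whether Club House or 24 hr Security was removed, so the sketch below is only an assumed approach; the 0/1 data and the 0.7 threshold are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical 0/1 amenity columns; in this toy data Club House and 24 hr Security move together.
amenities = pd.DataFrame({
    "Club House":     [1, 1, 0, 0, 1, 0, 1, 0],
    "24 hr Security": [1, 1, 0, 0, 1, 0, 1, 1],
    "Lift":           [1, 0, 1, 0, 1, 1, 0, 0],
})

corr = amenities.corr().abs()

# Keep only the upper triangle (above the diagonal) so each pair is checked once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Assumed threshold for "highly correlated"; not a value taken from the project.
threshold = 0.7
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
reduced = amenities.drop(columns=to_drop)
print(to_drop)
```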

Data Transformation
Feature Scaling
Features with varying degrees of magnitude and range cause different step sizes for each feature. Therefore, to ensure that gradient descent converges more smoothly and quickly, we need to scale our features so that they share a similar scale.
Data Encoding
Most machine learning algorithms require numerical input and output variables. Therefore, integer (label) encoding and one-hot encoding are used to convert categorical data into numerical data.
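Both transformations can be combined with scikit-learn's `ColumnTransformer`. A minimal sketch, assuming hypothetical column names: `StandardScaler` standardises the numeric columns to zero mean and unit variance, and `OneHotEncoder` turns each location into its own 0/1 column.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical feature table mixing numeric and categorical columns.
X = pd.DataFrame({
    "Carpet Area": [450, 700, 1200, 2000],
    "Bedrooms": [1, 2, 3, 4],
    "Location": ["Virar", "Borivali", "Bandra", "Bandra"],
})

preprocess = ColumnTransformer(
    transformers=[
        # Standardise numeric features to zero mean and unit variance.
        ("scale", StandardScaler(), ["Carpet Area", "Bedrooms"]),
        # One 0/1 column per location; unseen locations at predict time become all zeros.
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["Location"]),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

X_t = preprocess.fit_transform(X)
print(X_t.round(2))
```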
Data Transformation
Splitting Dataset
• The training set is the data on which we train the model, that is, fit its parameters.
• The test set is used only to assess the performance of the model.
Label Encoding
• LabelEncoder encodes labels with a value between 0 and n_classes-1, where n_classes is the number of distinct labels.
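A short sketch of label encoding and the train/test split with scikit-learn. The location names come from the conclusion; the array data, the 25% test size and the `random_state` are assumptions, not the project's settings. Note that scikit-learn documents `LabelEncoder` as intended for target labels; `OrdinalEncoder` is its counterpart for input features.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Label encoding: distinct classes are sorted, then mapped to 0 .. n_classes - 1.
locations = ["Bandra", "Virar", "Borivali", "Bandra", "Dahisar"]
encoder = LabelEncoder()
codes = encoder.fit_transform(locations)
print(list(encoder.classes_), codes)

# Splitting: hold out 25% of the rows as a test set used only for evaluation.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
print(len(X_train), len(X_test))
```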
Regression Models
• Multiple Linear Regression
• Support Vector Regression
• Decision Tree Regression
• Random Forest Regression

Multiple Linear Regression
Multiple regression is an extension of simple linear regression. It is used when we want to predict the value of a variable based on the values of two or more other variables. The variable we want to predict is called the dependent variable (or sometimes the outcome, target or criterion variable).
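Multiple linear regression models the target as y = β0 + β1x1 + ... + βpxp + ε. The sketch below uses synthetic, noise-free data generated from known coefficients (an assumption for illustration) to show scikit-learn's `LinearRegression` recovering β0 as `intercept_` and β1, β2 as `coef_`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data from a known model: price = 10 + 0.05*area + 5*bedrooms (no noise).
rng = np.random.default_rng(0)
area = rng.uniform(300, 2000, size=50)
bedrooms = rng.integers(1, 5, size=50)
X = np.column_stack([area, bedrooms])
y = 10 + 0.05 * area + 5 * bedrooms

model = LinearRegression().fit(X, y)
# intercept_ estimates beta_0; coef_ estimates [beta_1, beta_2].
print(model.intercept_, model.coef_)
```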
Actual vs Predicted values

R² Score: 0.6604280706669454
The R² score is about 0.660, which means the model explains about 66.0% of the variance in house prices. Since the R² score is not close to 1, the model does not fit best.
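The R² score is defined as R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares of the actual values around their mean; it measures the share of variance explained, not the share of points that fit. A small check with hypothetical values that the manual formula agrees with `sklearn.metrics.r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual and predicted prices (lakh).
y_true = np.array([50.0, 80.0, 120.0, 200.0])
y_pred = np.array([60.0, 75.0, 110.0, 190.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
r2_manual = 1 - ss_res / ss_tot
print(r2_manual, r2_score(y_true, y_pred))
```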
Support Vector Regression
Support Vector Machines (SVMs) are supervised machine learning models with associated learning algorithms that analyze data for classification and regression analysis. Support Vector Regression (SVR) is built on the concept of the Support Vector Machine and applies it to regression.
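Because SVR with its default RBF kernel is sensitive to feature scale, it is usually wrapped in a pipeline with a scaler. A sketch on synthetic data; the values of `C` and `epsilon` are illustrative assumptions, not the project's hyperparameters.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic carpet area (sq. ft.) and price data, for illustration only.
rng = np.random.default_rng(1)
X = rng.uniform(300, 2000, size=(80, 1))
y = 0.08 * X[:, 0] + rng.normal(0, 5, size=80)

# Scale the input first, then fit SVR (RBF kernel by default).
svr = make_pipeline(StandardScaler(), SVR(C=100, epsilon=1.0))
svr.fit(X, y)
train_r2 = svr.score(X, y)  # R^2 on the training data
print(train_r2)
```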

R² Score: 0.518815148933865

The R² score is about 0.519, so the model explains only about 51.9% of the variance in house prices. Since the R² score is not close to 1, the model does not fit best.
Decision Tree Regression
Decision Tree is one of the most commonly used, practical approaches for supervised learning. It can be used to solve both regression and classification tasks, with the latter being more common in practical applications. The Root Node is the initial node, which represents the entire sample and may be split further into other nodes. The Interior Nodes represent the features of the data set, and the branches represent the decision rules. Finally, the Leaf Nodes represent the outcome.
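The root, interior and leaf nodes described above can be inspected with `sklearn.tree.export_text`. A sketch on a hypothetical six-house dataset; `max_depth=2` is an assumption to keep the printed tree small.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical data: carpet area (sq. ft.) -> price (lakh).
X = np.array([[400], [500], [600], [1200], [1300], [1400]])
y = np.array([40, 45, 50, 150, 160, 170])

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# The first split printed is the root node, indented splits are interior nodes,
# and the "value" lines are the leaves holding the predicted price.
print(export_text(tree, feature_names=["carpet_area"]))
```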

R² Score: 0.45316105389854877

The R² score is about 0.453, which is less than 0.5, so the model explains only about 45.3% of the variance in house prices. Since the R² score is not close to 1, the model does not fit best.
Random Forest Regression
Random Forest Regression is a supervised learning algorithm that uses an ensemble learning method for regression. A Random Forest operates by constructing several decision trees during training and outputting the mean of the individual trees' predictions as its prediction.
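The averaging step can be verified directly: in scikit-learn, a `RandomForestRegressor` prediction equals the mean of the predictions of the fitted trees stored in `estimators_`. Synthetic data, for illustration only:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, for illustration only.
rng = np.random.default_rng(2)
X = rng.uniform(300, 2000, size=(60, 2))
y = 0.05 * X[:, 0] + 0.02 * X[:, 1] + rng.normal(0, 3, size=60)

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Stack each tree's predictions (one row per tree) and average over the trees.
per_tree = np.stack([t.predict(X) for t in forest.estimators_])
manual_mean = per_tree.mean(axis=0)
print(np.allclose(forest.predict(X), manual_mean))
```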

R² Score: 0.755300315635314

The R² score is about 0.755, so the model explains about 75.5% of the variance in house prices. This is the closest to 1 of the four regression models, so this model fits best.
Performance of the Regression Models
Regression Model               R² Score
Multiple Linear Regression     0.6604280706669454
Support Vector Regression      0.518815148933865
Decision Tree Regressor        0.45316105389854877
Random Forest Regressor        0.755300315635314
By comparing the R² scores of the regression models, we conclude that the Random Forest Regressor predicts more accurately than the other regression models, as it has the highest R² score, i.e. 0.755300315635314.
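A comparison like this is fair only when every model is fitted on the same training split and scored on the same test split. A sketch of that loop on synthetic data (its scores say nothing about the survey results); the default hyperparameters and the scaler placed in front of SVR are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic data stands in for the survey dataset.
rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(200, 3))
y = 3 * X[:, 0] + 2 * X[:, 1] ** 2 + rng.normal(0, 0.1, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Multiple Linear Regression": LinearRegression(),
    "Support Vector Regression": make_pipeline(StandardScaler(), SVR()),
    "Decision Tree Regressor": DecisionTreeRegressor(random_state=0),
    "Random Forest Regressor": RandomForestRegressor(random_state=0),
}

# Fit each model on the same training split and score it on the same test split.
scores = {name: r2_score(y_test, m.fit(X_train, y_train).predict(X_test)) for name, m in models.items()}
best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name:30s} {score:.3f}")
print("Best:", best)
```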
Classification Models
• Random Forest Classifier
• KNN Classifier
Random Forest Classifier
Random forest, as its name implies, consists of a large number of individual decision trees that operate as an ensemble. Each individual tree in the random forest outputs a class prediction, and the class with the most votes becomes the model's prediction.
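The hard majority vote described above can be sketched in a few lines. Note that scikit-learn's `RandomForestClassifier` actually averages the trees' predicted class probabilities (soft voting) rather than counting hard votes; the votes below are hypothetical.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class predicted by the most trees (ties go to the class seen first)."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Hypothetical votes from five trees on whether a user will take a house loan.
votes = ["Yes", "No", "Yes", "Yes", "No"]
print(majority_vote(votes))  # Yes
```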
Classification Report, Confusion Matrix Heat Map
From the above analysis result we conclude that the overall accuracy score of the model is only 0.51, which is not close to 1, so the model does not fit best.
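The accuracy, classification report and confusion matrix on these slides can be produced with `sklearn.metrics`. A sketch with hypothetical labels (1 = takes a loan, 0 = does not); a heat map could then be drawn from `cm` with `seaborn.heatmap(cm, annot=True)`.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical true and predicted loan decisions (1 = takes a loan, 0 = does not).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

cm = confusion_matrix(y_true, y_pred)  # rows = actual class, columns = predicted class
acc = accuracy_score(y_true, y_pred)   # fraction of correct predictions
print(cm)
print(classification_report(y_true, y_pred))
print(acc)
```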
K-Nearest Neighbors Classifier
KNN algorithms use the stored training data and classify new data points based on similarity measures (e.g. a distance function). Classification is done by a majority vote of the point's neighbours.
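A from-scratch sketch of the KNN rule described above, assuming Euclidean distance and hypothetical, already-scaled features; scikit-learn's `KNeighborsClassifier` implements the same idea more efficiently.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points (Euclidean distance)."""
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Hypothetical scaled features (e.g. income, budget) and loan labels.
X_train = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.85, 0.85]])
y_train = np.array(["No", "No", "Yes", "Yes", "Yes"])
print(knn_predict(X_train, y_train, np.array([0.8, 0.8]), k=3))  # Yes
```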
Choosing the K value
From the above graph we conclude that since the error rate does not fluctuate after k=23, we choose K=23.
Checking the accuracy of K = 23
From the graph we conclude that since the accuracy at k=23 neither increases nor decreases, the chosen k value is appropriate.
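The error-rate curve used to pick K can be sketched as a loop over candidate k values. This runs on synthetic data from `make_classification`, so it does not reproduce K = 23; choosing k on a separate validation set or with cross-validation, rather than the final test set, would avoid optimistic scores.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary data stands in for the survey responses.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

error_rate = []
k_values = range(1, 40)
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    # Error rate = fraction of test points misclassified for this k.
    error_rate.append(np.mean(knn.predict(X_test) != y_test))

# Plotting error_rate against k_values shows where the curve levels off;
# here we simply report the k with the lowest error.
best_k = k_values[int(np.argmin(error_rate))]
print(best_k)
```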
K-Nearest Neighbors Classifier
Classification Report, Confusion Matrix Heat Map
From the above analysis result we conclude that the overall accuracy score of the model is 0.71, which is higher than the Random Forest Classifier's 0.51, so this model fits better.
Accuracy of the Classification Models
Classification Model              Accuracy
Random Forest Classifier          0.5121951219512195
K-Nearest Neighbor Classifier     0.7073170731707317
By comparing the accuracy of the classification models, we conclude that the K-Nearest Neighbor Classifier predicts more accurately than the other classification model, as it has the highest accuracy score, i.e. 0.7073170731707317.
Hence, the K-Nearest Neighbor Classifier is used for predicting whether or not the user will take a house loan to buy a house.
Conclusion
• The main aim of this project is to predict house prices, which has been done successfully using different machine learning algorithms: Multiple Linear Regression, Support Vector Regressor, Decision Tree Regressor and Random Forest Regressor. The analysis showed that the Random Forest Regressor predicts more accurately than the other regression models.
• Most house owners are in the 46-65 age group, and there are more male house owners than female.
• Most new house buyers are in the 36-55 age group and the fewest are in the 25-35 and 56-45 age groups; there are more male buyers than female.
• House price is highly dependent on the following features: No. of Bedrooms, Carpet Area, Location, 24Hr Water Supply, Gas Pipeline, Lift and Medical.
• Most buyers take a housing loan to buy a new house.
• Since we considered Western line house prices, houses between Borivali and Bandra have moderate prices, houses between Mahim and Churchgate have very high prices, and houses between Dahisar and Virar have lower prices.
Thank You
