Project Presentation On House Price Prediction System: Presented by Name: Simran B Solanki Roll No: 19020


Project Presentation

On
House Price Prediction System

Course: Computer Science


MSC Semester 4

Presented By
Name: Simran B Solanki Roll No: 19020

Introduction

• Trends in housing prices reflect the current economic situation and are a concern to both buyers and sellers.
• Nowadays everyone wishes for a house that suits their lifestyle and provides amenities according to their needs. House prices change very frequently, which suggests that they are often exaggerated.
• Many factors have an impact on house prices, such as the number of bedrooms and bathrooms. The price of a house also depends on its location.
• Regardless of whether someone wants to sell or buy a house, identifying the correct price is still a challenge.
• We need an autonomous system that helps people find the correct house price based on their requirements. Such a system can be built using various machine learning algorithms and data analysis.
• The proposed system takes different features, such as location and carpet area, as input and predicts house prices using a regression machine learning algorithm.

Objectives
• Create an effective house price prediction model
• Find the best-fit regression model with a low error rate

Project Implementation Plan
• Data Collection
• Data Cleaning
• Exploratory Data Analysis
• Feature Selection
• Data Transformation
• Train Data Set and Test Data Set
• Regression Model
• Evaluating Regression Model

Experimental Setup

Programming Language: Python

Tools: Jupyter Notebook, Spyder

Libraries: Pandas, NumPy, Matplotlib, Seaborn, Altair, Sklearn, Streamlit


Overview

House Price Prediction System
• Data Collection
• Data Cleaning
• Exploratory Data Analysis
• Feature Selection
• Data Transformation
• Regression Model
• Classification Model


Data Collection

Data collected through a Google Form in the form of a questionnaire

Responses received through the Google Form

Google Form Link: https://docs.google.com/forms/d/107XIIJ1n1kgKjKmjZnR-PYEW3hQ-jblNUfylul82qDQ/edit?ts=606d4684#responses

Data Cleaning

Data collected through the survey

Changing the dataset columns

Count of null values and converting categorical data into integers

Removing the null values
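
The cleaning steps named above can be sketched with pandas roughly as follows. The column names, values and the Yes/No mapping are hypothetical, since the slides do not show the actual survey schema.

```python
import pandas as pd

# Hypothetical survey responses; the real column names are not shown in the slides.
df = pd.DataFrame({
    "Location": ["Bandra", "Virar", None, "Dahisar"],
    "Lift": ["Yes", "No", "Yes", None],
    "Carpet Area": [650.0, 420.0, 900.0, 500.0],
})

# Count of null values per column.
null_counts = df.isnull().sum()

# Remove every row that contains at least one null value.
clean = df.dropna().reset_index(drop=True)

# Convert a categorical Yes/No column into integers (assumed mapping).
clean["Lift"] = clean["Lift"].map({"No": 0, "Yes": 1})

print(null_counts)
print(clean)
```

Dropping rows is only one way to handle missing answers; whether the project dropped or imputed them beyond what the slide title says is not stated.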
Exploratory Data Analysis
Analysis Result

1) Bar graph to represent comparison of house price w.r.t. features


From the graph we can conclude that features such as number of bedrooms, 24-hr water supply, gas pipeline, lift and medical strongly influence an increase in price, whereas the other features do not affect the price as much.
2) Count plot to represent analysis on budget
A) Budget vs House Loan
From the graph we can observe that people with a budget below 1 Cr or above 6 Cr have the highest chances of taking a house loan.
B) Budget vs Carpet Area
From the graph we conclude that:
• Budgets greater than 1 Cr target 1, 2, 3 and 4 BHK with higher carpet area, except the 5 Cr-7 Cr range, which targets 3 and 4 BHK with higher carpet area.
• Budgets between 50 lakh and 90 lakh target 1, 2 and 3 BHK with higher carpet area.
• Budgets below 40 lakh target 0, 1 and 2 BHK with medium carpet area.
3) Count plot to represent analysis on income
B) Income vs House Loan
From the graph we can observe that people with an income of 1 lakh - 3 lakh, 9 lakh - 11 lakh, or 11 lakh and above have more than a 70% chance of taking a house loan.
4) Box plot to represent outliers w.r.t. number of bedrooms vs carpet area

From the graph we can observe some outliers in carpet area with respect to the number of bedrooms:
• 0 bedrooms (1 RK): carpet area ranges from 100 to 800, so a house with carpet area 3500 is an outlier.
• 1 BHK: carpet area ranges from 200 to 760, so there are 3 high outliers with carpet areas 1000, 1010 and 2500.
• 2 BHK: carpet area ranges from 480 to 1500, so there is 1 high outlier with carpet area 1750.
• 3 BHK: carpet area ranges from 900 to 2050, so there is 1 high outlier with carpet area 2400.
• 4 BHK: carpet area ranges from 1990 to 2350, so there is 1 high outlier with carpet area 3500 and 1 low outlier with carpet area 100.
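
A box plot usually flags points that fall more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies that convention to one bedroom group; the carpet-area values are hypothetical, and the slides do not state which whisker rule the plotting library used.

```python
from statistics import quantiles

def iqr_outliers(values):
    """Return the values lying outside the 1.5 * IQR whiskers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical carpet areas (sq. ft.) for a single bedroom count.
areas = [200, 350, 400, 450, 500, 550, 600, 760, 2500]
print(iqr_outliers(areas))  # [2500]
```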
5) Histogram to represent variation in carpet area

From the graph we can observe that most carpet areas in the dataset lie between 300 and 1000 sq. ft.
6) Count plot to represent count of house loans w.r.t. carpet area
From the graph we can observe that the majority of people were willing to take a house loan.
7) Graphs to represent regression analysis
A) Carpet Area vs No. of Bedrooms

From the above graph we can conclude that there is a positive linear relationship between the number of bedrooms and carpet area: as the number of bedrooms increases, carpet area also increases, although the points do not lie exactly on the regression line.
B) Carpet Area vs Price

From the above graph we can conclude that there is a positive linear relationship between house price and carpet area: as carpet area increases, house price also increases, although the points do not lie exactly on the regression line.
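
The regression line in plots like these is typically an ordinary least-squares fit, and the scatter around it is the set of residuals. The following is a minimal sketch on hypothetical carpet-area and price values, not the project's data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: carpet area (sq. ft.) and price (lakh).
area = [300, 500, 700, 900]
price = [40, 55, 95, 110]

slope, intercept = fit_line(area, price)
# Non-zero residuals mean the points do not sit exactly on the line.
residuals = [y - (slope * x + intercept) for x, y in zip(area, price)]
print(slope, intercept, residuals)
```

A positive slope with non-zero residuals corresponds to the observation above: the trend is increasing, but individual houses deviate from it.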

Bar chart to represent location-wise house prices

Feature Selection

Heat map to represent highly correlated features

Highly correlated features: Club House, 24 hr Security
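
A correlation heat map visualizes pairwise correlation coefficients between features. As a rough illustration, the Pearson coefficient for two binary amenity columns can be computed as below; the 0/1 values are hypothetical and the slides do not say which coefficient was used.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical survey answers (1 = amenity present, 0 = absent).
club_house = [1, 1, 0, 0, 1, 0]
security_24hr = [1, 1, 0, 0, 1, 1]

r = pearson(club_house, security_24hr)
print(round(r, 3))
```

When two features are this strongly correlated, they carry largely the same information, which is why feature selection often keeps only one of them.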


Data Transformation

Feature Scaling
Features with varying degrees of magnitude and range will cause different step sizes for each feature. Therefore, to ensure that gradient descent converges more smoothly and quickly, we need to scale our features so that they share a similar scale.

Data Encoding
Most machine learning algorithms require numerical input and output variables. Integer encoding and one-hot encoding are used to convert categorical data to numerical data.
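
Both transformations can be sketched in a few lines of plain Python. Standardization (zero mean, unit variance) is assumed here as the scaling method because the slides do not name one, and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def one_hot(values):
    """Turn each category into a 0/1 vector, one slot per distinct category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

carpet_area = [300, 600, 900]                      # hypothetical, sq. ft.
scaled = standardize(carpet_area)
encoded = one_hot(["Bandra", "Virar", "Bandra"])   # hypothetical locations

print(scaled)
print(encoded)  # [[1, 0], [0, 1], [1, 0]]
```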
Splitting Dataset
The training set is the one on which we train and fit our model, that is, fit its parameters. Test data is used only to assess the performance of the model.

Label Encoding
LabelEncoder encodes labels with a value between 0 and n_classes-1, where n_classes is the number of distinct labels.
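
A minimal sketch of both steps with scikit-learn follows. The location names, prices, split ratio and random seed are assumptions for illustration; the slides do not give the values used in the project.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical feature and target values.
locations = ["Bandra", "Virar", "Bandra", "Dahisar", "Mahim", "Virar"]
prices = [120, 45, 130, 50, 200, 40]

# LabelEncoder assigns integers 0 .. n_classes-1 in sorted label order.
encoder = LabelEncoder()
codes = encoder.fit_transform(locations)

# Hold out part of the data purely for evaluation (assumed 33% here).
X = [[c] for c in codes]
X_train, X_test, y_train, y_test = train_test_split(
    X, prices, test_size=0.33, random_state=0
)

print(list(encoder.classes_), list(codes))
print(len(X_train), len(X_test))
```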
Regression Models

• Multiple Linear Regression
• Support Vector Regression
• Decision Tree Regression
• Random Forest Regression


Multiple Linear Regression
Multiple regression is an extension of simple linear regression. It is used when we want to predict the value of a variable based on the values of two or more other variables. The variable we want to predict is called the dependent variable (or sometimes the outcome, target or criterion variable).

Actual vs Predicted values

R^2 Score: 0.6604280706669454

The R^2 score is about 0.66, which means the model explains roughly 66% of the variance in house price. Since the score is not close to 1, the model is not the best fit.
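
For reference, the R^2 score compares the model's squared errors with those of a baseline that always predicts the mean: R^2 = 1 - SS_res / SS_tot. The sketch below computes it by hand on hypothetical values; `sklearn.metrics.r2_score` computes the same quantity.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical actual and predicted prices (lakh).
actual = [40, 55, 95, 110]
predicted = [37.5, 62.5, 87.5, 112.5]

score = r_squared(actual, predicted)
print(round(score, 4))
```

A score of 1 means perfect predictions, 0 means the model does no better than predicting the mean price, and negative values are possible for models that do worse than that baseline.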
Support Vector Regression
Support Vector Machines (SVMs) are supervised machine learning models with associated learning algorithms that analyze data for classification and regression analysis. Support Vector Regression (SVR) applies the SVM concept to regression.

Actual vs Predicted values

R^2 Score: 0.518815148933865

The R^2 score is about 0.52, which means the model explains only about 52% of the variance in house price. Since the score is not close to 1, the model is not the best fit.
Decision Tree Regression
Decision Tree is one of the most commonly used, practical approaches for supervised learning. It can be
used to solve both Regression and Classification tasks with the latter being put more into practical
application. The Root Node is the initial node which represents the entire sample and may get split further
into further nodes. The Interior Nodes represent the features of a data set and the branches represent the
decision rules. Finally, the Leaf Nodes represent the outcome.

Actual vs Predicted values


R^2 Score: 0.45316105389854877

The R^2 score is about 0.45, which means the model explains only about 45% of the variance in house price. Since the score is below 0.5 and not close to 1, the model is not a good fit.
Random Forest Regression
Random Forest Regression is a supervised learning algorithm that uses the ensemble learning method for regression. A random forest operates by constructing several decision trees during training and outputting the mean of the individual trees' predictions.

Actual vs Predicted values

R^2 Score: 0.755300315635314

The R^2 score is about 0.76, which means the model explains roughly 76% of the variance in house price. This is the score closest to 1 among the four regression models, so this model fits best.
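
The averaging behaviour can be checked directly in scikit-learn by comparing the forest's prediction with the mean over its individual trees. The data below is synthetic and the hyperparameters are arbitrary choices for this sketch, not the project's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for carpet area (sq. ft.) and price.
rng = np.random.default_rng(0)
X = rng.uniform(300, 2000, size=(60, 1))
y = 0.1 * X[:, 0] + rng.normal(0, 10, size=60)

forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

X_new = np.array([[750.0], [1500.0]])
# One row of predictions per fitted tree.
per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])
forest_pred = forest.predict(X_new)

# The forest's output matches the mean of its trees' outputs.
print(np.allclose(per_tree.mean(axis=0), forest_pred))
```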
Accuracy of the Regression Models

Regression Model              R2 Score

Multiple Linear Regression    0.6604280706669454
Support Vector Regression     0.518815148933865
Decision Tree Regressor       0.45316105389854877
Random Forest Regressor       0.755300315635314

By comparing the R2 scores of the regression models, we conclude that the Random Forest Regressor predicts more accurately than the other regression models: it has the highest R2 score, 0.755300315635314.
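
A comparison of this kind could be reproduced along the following lines. This is only a sketch on synthetic data with default hyperparameters, so its scores will not match the table; the feature construction, split ratio and seeds are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic, already-scaled features and a noisy price target.
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(200, 3))
y = 50 + 80 * X[:, 0] + 20 * X[:, 1] + rng.normal(0, 5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Multiple Linear Regression": LinearRegression(),
    "Support Vector Regression": SVR(),
    "Decision Tree Regressor": DecisionTreeRegressor(random_state=42),
    "Random Forest Regressor": RandomForestRegressor(random_state=42),
}

# Fit each model on the training set and score it on the held-out test set.
scores = {
    name: r2_score(y_test, model.fit(X_train, y_train).predict(X_test))
    for name, model in models.items()
}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name:28s} {score:.3f}")
print("Best:", best)
```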
Classification Models

• Random Forest Classifier
• KNN Classifier


Random Forest Classifier
Random forest, as its name implies, consists of a large number of individual decision trees that operate as an ensemble. Each individual tree in the random forest outputs a class prediction, and the class with the most votes becomes the model’s prediction.
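
The voting step can be illustrated in plain Python. The labels are hypothetical stand-ins for the house-loan target.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Hypothetical predictions from five trees for one respondent.
votes = ["loan", "no loan", "loan", "loan", "no loan"]
print(majority_vote(votes))  # loan
```

Some implementations, including scikit-learn's `RandomForestClassifier`, average the trees' class probabilities rather than counting hard votes, which usually gives the same winner.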

Classification Report Confusion Matrix Heat Map

From the above analysis we conclude that, since the overall accuracy score of the model is 0.51, which is not close to 1, the model does not fit well.
K-Nearest Neighbors Classifier
KNN algorithms classify new data points based on a similarity measure (e.g. a distance function) computed against the stored training data. Classification is done by a majority vote of a point's neighbours.

Choosing the K value
From the above graph we conclude that the error rate does not fluctuate after k=23, so we choose K=23.

Checking the accuracy of K = 23
From the graph we conclude that the accuracy at k=23 neither increases nor decreases, so the chosen k value is appropriate.
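
An error-rate curve of this kind is typically produced by fitting a KNN classifier for a range of k values and recording the share of misclassified test points. The sketch below uses synthetic data and simply picks the k with the lowest test error, which is a cruder criterion than the visual "error stops fluctuating" judgement described above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data standing in for the survey.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Error rate on the test set for each odd k from 1 to 39.
error_rate = {}
for k in range(1, 40, 2):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    error_rate[k] = float(np.mean(knn.predict(X_test) != y_test))

best_k = min(error_rate, key=error_rate.get)
print(best_k, error_rate[best_k])
```

Plotting `error_rate` against k gives the curve from which a stable region can be read off.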
Classification Report        Confusion Matrix Heat Map

From the above analysis we conclude that the overall accuracy score of the model is 0.71, which is reasonably close to 1, so the model fits well.
Accuracy of the Classification Models

Classification Model             Accuracy

Random Forest Classifier         0.5121951219512195
K-Nearest Neighbor Classifier    0.7073170731707317

By comparing the accuracy of the classification models, we conclude that the K-Nearest Neighbor Classifier predicts more accurately than the other classification model: it has the highest accuracy score, 0.7073170731707317.

Hence, the K-Nearest Neighbor Classifier is used for predicting whether or not the user will take a house loan to buy a house.
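
Accuracy is the share of correct predictions, which can be read off a confusion matrix as the diagonal sum divided by the total. The reported scores are numerically consistent with 21 and 29 correct predictions out of 41 test responses (21/41 and 29/41); the individual cell counts below are hypothetical, since the confusion matrices on the slides are images.

```python
def accuracy(confusion):
    """Correct predictions (diagonal) divided by all predictions."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# Hypothetical 2x2 matrix laid out as [[TN, FP], [FN, TP]].
confusion = [[9, 7], [5, 20]]
print(accuracy(confusion))
```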

Conclusion

• The main aim of this project is to predict house prices, which has been done successfully using different machine learning algorithms: Multiple Linear Regression, Support Vector Regressor, Decision Tree Regressor and Random Forest Regressor. The analysis showed that the Random Forest Regressor predicts more accurately than the other regression models.
• Most house owners are in the 46-65 age group, and there are more male house owners than female.
• Most new house buyers are in the 36-55 age group, the fewest are in the 25-35 and 56-45 age groups, and there are more male buyers than female.
• The house price is highly dependent on the following features: No. of Bedrooms, Carpet Area, Location, 24-hr Water Supply, Gas Pipeline, Lift and Medical.
• Most of the buyers take a housing loan to buy a new house.
• Since we considered house prices along the Western line: houses between Borivali and Bandra have moderate prices, houses between Mahim and Churchgate have very high prices, and houses between Dahisar and Virar have lower prices.
Thank You
