
Old Car Price Prediction

Tools and Technique Laboratory

Submitted To: Prof. Mainak Bandyopadhyay

Submitted By:

Vinayak Tripathi Jaideep Sharma

2005350 20051730

Department of Computer Science and Engineering

School of Computer Engineering

KIIT University
Abstract
The price of a new car is fixed by the manufacturer, with some additional costs
imposed by the government in the form of taxes. Customers buying a new car can
therefore be reasonably sure that the money they invest is well spent. However,
because new cars have become more expensive and many customers cannot afford
them, used car sales are increasing globally. There is therefore a need for a used car
price prediction system that estimates the worth of a car from a variety of features.
In the existing process, the seller sets a price more or less arbitrarily, and the buyer
has little idea of the car’s present-day value. In fact, the seller often has no idea of the
car’s current value either, or of the price at which it should be sold. To address this
problem, we have developed a regression-based price prediction model. Regression
algorithms are used because they output a continuous value rather than a category,
which makes it possible to predict the actual price of a car rather than a price range.
A user interface has also been developed that takes input from the user and displays
the predicted price of the car. The results of this project can help car dealerships and
private sellers price their cars accurately and optimize their profits. Future work can
include expanding the dataset with more features, such as the car’s service history,
accident history, and location, and exploring other machine learning algorithms to
improve the accuracy of our model.

Introduction

Determining whether the listed price of a used car is fair is a challenging task, due to
the many factors that drive a used vehicle’s price on the market. The focus of this
project is developing machine learning models that can accurately predict the price of
a used car based on its features, so that buyers can make informed purchases. We
implement and evaluate various learning methods on a dataset consisting of the sale
prices of different makes and models. We compare the performance of several
machine learning algorithms (Linear Regression, Ridge Regression, Lasso Regression,
Elastic Net, and Decision Tree Regressor) and choose the best of them. The price of a
car is then determined from its various parameters. Regression algorithms are used
because they output a continuous value rather than a category, which makes it
possible to predict the actual price of a car rather than a price range. A user interface
has also been developed that takes input from the user and displays the predicted
price of the car.

In this project, we train a linear regression model to predict the selling price of old
cars. Linear regression is a simple and widely used machine learning algorithm that
models the linear relationship between the dependent variable (selling price) and one
or more independent variables (car model, year of manufacture, condition, mileage,
etc.). By training our model on historical data on old car sales, we can predict the
selling price of cars that are newly listed for sale. The results of our model can be
used by car dealerships and private sellers to price their cars accurately and maximize
their profits.
Problem statement
For the purposes of car valuation, popular guides tend not to use machine learning.
Instead, they source data from local sales and average the prices of many similar cars.
This method works well if you have a common car with a common set of features.
The condition of the car is judged very roughly, typically on a scale of one to three.
Cars that are “unusual” are therefore hard to evaluate. Effectively, no inferences are
drawn from similar cars of a different make and model, whereas with machine
learning, the entire dataset and all of its features are used to train the model.
Machine learning thus addresses the problem of underused data and makes it possible
to use all the features of a car in its valuation.

New cars of a particular make, model, location, and feature selection are identical in
condition, function, and price. When new cars are sold for the first time they are then
classified as used cars. As an asset ages, its price changes because it declines in
efficiency in the current and in all future periods. Depreciation reflects the change in
net present value over time. Revaluation, on the other hand, is the change in value or
price of an asset that is caused by everything other than aging. This includes price
changes due to inflation, obsolescence, and any other change not associated with
aging [2]. Used cars are subject to depreciation and revaluation. Depreciation can be
used as an umbrella term for both of these, and the rest of this report will follow that
convention when referring to the loss of value over time. Revaluation plays a part in
the depreciation of cars based on the features that they have. Power-hungry cars will
be less sought after when the price of gasoline is high, for example. A car with the
same make, model, year, and geographic region as another car, but with a larger
engine, should command a different value at different times.

In addition to the age of the car and the revaluation of its features, used cars have a
unique service history that develops over time. Parts will become worn with time and
miles driven (mileage). What is replaced, when it is replaced, and by whom, are all to
be considered as it relates to the current working condition of the car and its
desirability on the market. These particularities are difficult to account for in
traditional price-setting models, even though they are a major differentiator between
vehicles. Generally, they are summarized in the “condition” of the car. The value of
repairs or custom modifications to the car is recognized only if they noticeably
improve the overall condition of the car.

Using machine learning to better utilize data on all the less common features of a car
can more accurately predict the value of a vehicle. This is a clear benefit to consumers,
especially those who themselves cannot ascertain the value of the vehicle that they are
buying or selling and must rely on a tool. A tool that is more tailored to the non-
standard features of the car can provide a more accurate price and make the market
fairer for all participants.

There are several machine learning regression models that can be applied to price
prediction. This work will investigate which one offers the best performance
according to several criteria. The nature of machine learning is to train on past data to
predict unseen data. Applied to price prediction of cars, the data is sourced from past
sales while the predictions are for the present value of cars. Therefore, one criterion
for selecting a machine learning model is that it remains accurate in its predictions
for future years that are not included in the dataset.
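
The selection criterion above can be sketched as a time-based holdout: fit each candidate on earlier sales and score it only on later ones. The following is a minimal illustration, not the report's actual experiment. The data is synthetic, and the column names (`sale_year`, `age`, `mileage`), the cutoff year 2020, the regularization strengths, and the choice of mean absolute error are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error

# Synthetic, made-up sales records (illustrative only, not the report's data).
rng = np.random.default_rng(0)
n = 400
sale_year = rng.integers(2015, 2023, size=n)   # years 2015..2022 inclusive
age = rng.uniform(0, 15, size=n)               # car age in years
mileage = rng.uniform(0, 20, size=n)           # in units of 10,000 km
price = (10.0 - 0.4 * age - 0.15 * mileage
         + 0.1 * (sale_year - 2015) + rng.normal(0, 0.3, size=n))

# Express the sale year relative to 2015 to keep feature scales comparable.
X = np.column_stack([age, mileage, sale_year - 2015])

# Time-based holdout: fit on sales up to 2020, evaluate on later sales only.
train_mask = sale_year <= 2020
test_mask = ~train_mask

models = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),      # alpha values are arbitrary assumptions
    "Lasso": Lasso(alpha=0.01),
}
mae_by_model = {}
for name, model in models.items():
    model.fit(X[train_mask], price[train_mask])
    predictions = model.predict(X[test_mask])
    mae_by_model[name] = mean_absolute_error(price[test_mask], predictions)

for name, mae in sorted(mae_by_model.items(), key=lambda item: item[1]):
    print(f"{name}: MAE on future years = {mae:.3f}")
```

A random split would mix future sales into the training set, so it could hide a model that degrades on later years; the year-based split tests exactly the criterion stated above.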

Requirements
Hardware requirements
Operating system: Windows 7, 8, or 10
Processor: dual core 2.4 GHz (Intel i5 or i7 series, or equivalent AMD)
RAM: 4 GB
Software Requirements
Python
PyCharm (or any other IDE)
PIP 2.7
Jupyter Notebook
Chrome

Theory:
This chapter will explain relevant theory and related work. This includes
concepts related to regression learning, all metrics used for the
performance measurement of the models, and related research in the
field of machine learning applied to price prediction.

Regression Machine Learning

Regression analysis is a fundamental concept in the field of machine
learning. It is a type of supervised machine learning wherein the model
is trained with both input features and output labels. It establishes a
relationship among the variables by estimating how they combine to
produce an output in the form of a continuous variable rather than a
discrete label. The input variables are called independent variables and
correspond to features in the dataset, while the output variable is called
the dependent variable. The simplest of these algorithms is linear
regression, which assumes that the output is a linear function of each
input variable.

Overfitting
Overfitting is a condition where a statistical model begins to
describe the random error in the data rather than the relationships
between variables. This condition can affect all supervised machine
learning models. In the case of regression models, overfitting can occur
when there are too many terms for the number of observations. This
leads to the regression coefficients representing the noise rather than
the actual relationships in the data. Much better prediction results on
the training data than on unseen test data are an indication of overfitting.
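
As a small illustration of this effect, the sketch below fits ordinary least squares with many terms on few observations and compares training and test R². The data is synthetic and every name in it (`make_data`, the 24 noise features, the sample sizes) is an assumption chosen for the example, not part of the project's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

def make_data(n_rows, n_noise_features=24):
    """One informative feature plus pure-noise features (synthetic data)."""
    informative = rng.normal(size=(n_rows, 1))
    noise_features = rng.normal(size=(n_rows, n_noise_features))
    X = np.hstack([informative, noise_features])
    y = 3.0 * informative[:, 0] + rng.normal(0, 1.0, size=n_rows)
    return X, y

X_train, y_train = make_data(30)    # few observations
X_test, y_test = make_data(500)     # larger held-out set

# 25 terms for 30 observations: the extra coefficients end up fitting noise.
many_terms = LinearRegression().fit(X_train, y_train)
# Same training rows, but only the single informative term.
one_term = LinearRegression().fit(X_train[:, :1], y_train)

train_r2_many = many_terms.score(X_train, y_train)
test_r2_many = many_terms.score(X_test, y_test)
test_r2_one = one_term.score(X_test[:, :1], y_test)

print(f"25 terms: train R2 = {train_r2_many:.3f}, test R2 = {test_r2_many:.3f}")
print(f" 1 term : test R2 = {test_r2_one:.3f}")
```

The gap between the training and test scores of the 25-term model is the symptom described above; the one-term model gives up a little training fit and generalizes better.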
Linear Regression

Linear Regression is a technique to estimate the linear relationship
between each of a number of independent variables and a dependent
variable. Linear Regression fits a linear model with coefficients w =
(w1, …, wp) to minimize the residual sum of squares between the
observed targets in the dataset, and the targets predicted by the linear
approximation.
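
A minimal sketch of this fit with scikit-learn's `LinearRegression` follows. The data is synthetic, and the feature names (`age_years`, `mileage_10k_km`) and the coefficients used to generate it are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, made-up data: price falls with age and mileage (illustrative only).
rng = np.random.default_rng(0)
age_years = rng.uniform(0, 15, size=200)
mileage_10k_km = rng.uniform(0, 20, size=200)
X = np.column_stack([age_years, mileage_10k_km])
price = (10.0 - 0.4 * age_years - 0.15 * mileage_10k_km
         + rng.normal(0, 0.2, size=200))

model = LinearRegression().fit(X, price)

# Residual sum of squares: the quantity that the fit minimizes.
rss = float(np.sum((price - model.predict(X)) ** 2))

# Any other coefficient vector gives a residual sum of squares at least as large.
shifted_prediction = X @ (model.coef_ + 0.01) + model.intercept_
rss_shifted = float(np.sum((price - shifted_prediction) ** 2))

print("coefficients:", model.coef_, "intercept:", model.intercept_)
print("RSS at fit:", rss, "RSS after nudging coefficients:", rss_shifted)
```

The fitted coefficients land close to the values used to generate the data, and nudging them away from the fitted values can only increase the residual sum of squares.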

Ridge Regression

Ridge Regression is closely related to linear regression and also assumes
a linear relationship between features and the dependent variable (price).
It utilizes a regularization technique that penalizes the use of large
coefficients when optimizing the linear relationship. A supplied
parameter alpha determines the factor with which large coefficients are
penalized. Ridge regression performs L2 regularization, meaning that it
adds a penalty proportional to the sum of the squared coefficients to the
linear regression objective (LR-Obj, the residual sum of squares).
Minimization Objective: (LR-Obj) + α*(sum of squares of coefficients) (1)
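
The effect of alpha can be seen in a short sketch using scikit-learn's `Ridge`, whose objective has the same form as (1). The data is synthetic, and the alpha values and generating coefficients are arbitrary assumptions chosen only to make the shrinkage visible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data with known coefficients (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_coefficients = np.array([4.0, -3.0, 2.0, 0.0, 0.0])
y = X @ true_coefficients + rng.normal(0, 1.0, size=100)

# Unpenalized fit for reference (alpha = 0 in objective (1)).
ols_norm = np.linalg.norm(LinearRegression().fit(X, y).coef_)

# Larger alpha means a stronger penalty and therefore smaller coefficients.
alphas = [0.1, 10.0, 1000.0]
ridge_norms = [np.linalg.norm(Ridge(alpha=a).fit(X, y).coef_) for a in alphas]

print(f"no penalty : ||w|| = {ols_norm:.4f}")
for a, norm in zip(alphas, ridge_norms):
    print(f"alpha={a:<7}: ||w|| = {norm:.4f}")
```

The coefficient norm shrinks steadily as alpha grows, but the coefficients are pulled toward zero rather than set exactly to zero.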
Lasso Regression

Lasso (Least Absolute Shrinkage and Selection Operator) regression
performs L1 regularization, meaning that it adds a penalty proportional
to the sum of the absolute values of the coefficients to the optimization
objective. Like Ridge Regression, this penalizes large coefficients when
optimizing the linear relationship of each variable.
Minimization Objective: (LR-Obj) + α*(sum of absolute value of coefficients) (2)
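
The "selection" part of the name can be illustrated with scikit-learn's `Lasso`. Note that scikit-learn divides the residual sum of squares by twice the number of samples before adding the L1 penalty, so its `alpha` is on a different scale from the α in (2). The data below is synthetic, and `alpha=0.5` is an arbitrary assumption.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only the first two of five features affect the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0, 0.5, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)

# Features whose coefficient survived the L1 penalty.
selected = np.flatnonzero(lasso.coef_)

print("coefficients:", lasso.coef_)
print("selected feature indices:", selected)
```

Unlike the L2 penalty, the L1 penalty drives the coefficients of the uninformative features to exactly zero, while the informative coefficients are shrunk but kept.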

METHODOLOGY

There are two primary phases in the system:
1. Training phase: the system is trained on the data in the dataset and fits a model
(line/curve) according to the chosen algorithm.
2. Testing phase: the system is provided with inputs and tested, and its accuracy is
checked.
The data used to train or test the model therefore has to be appropriate. The system
is designed to predict the price of a used car, and hence appropriate algorithms must
be used for both tasks.

Data Collection:
We collected data on old car sales from various sources such as online classifieds,
used car dealerships, and private sales. We used web scraping techniques to extract
data from online sources and manually collected data from other sources.

Data Preprocessing:

The collected data was cleaned and preprocessed to remove records with missing
values and irrelevant data. We also performed feature engineering to extract
important features such as the age of the car, mileage, and condition. We used the
pandas and NumPy libraries to perform data cleaning and feature engineering.
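
A minimal sketch of this kind of cleaning with pandas is shown below. The column names (`year`, `km_driven`, `fuel`, `selling_price`), the four sample rows, and the reference year 2023 are hypothetical, and the one-hot encoding step is an assumption about how a categorical column might be handled, not a description of the project's actual pipeline.

```python
import numpy as np
import pandas as pd

# Hypothetical raw records; two rows have a missing value.
raw = pd.DataFrame({
    "year": [2015, 2012, np.nan, 2018],
    "km_driven": [45000, 80000, 30000, np.nan],
    "fuel": ["Petrol", "Diesel", "Petrol", "Diesel"],
    "selling_price": [450000, 300000, 520000, 650000],
})

REFERENCE_YEAR = 2023  # assumed year relative to which age is measured

# Data cleaning: drop rows that contain any missing value.
clean = raw.dropna().copy()

# Feature engineering: derive the car's age from its year of manufacture.
clean["age"] = REFERENCE_YEAR - clean["year"].astype(int)
clean = clean.drop(columns=["year"])

# Encode the categorical fuel column as indicator columns (assumed step).
clean = pd.get_dummies(clean, columns=["fuel"], drop_first=True)

print(clean)
```

Dropping incomplete rows is the simplest option; imputing missing values would keep more data at the cost of extra assumptions.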

Model Training:

We trained a linear regression model to predict the selling price of old cars. We split
the data into training and testing sets so that overfitting could be detected on data
not used for training, and used cross-validation to obtain a more reliable estimate of
the model’s accuracy. We used the scikit-learn library to train and evaluate our model.
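
The split and cross-validation steps can be sketched with scikit-learn as follows. The data is synthetic, and the 80/20 split, the five folds, the R² scoring, and the random seeds are assumptions made for illustration rather than the settings used in the project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the cleaned dataset (two hypothetical features).
rng = np.random.default_rng(42)
X = rng.uniform(0, 15, size=(300, 2))
y = 10.0 - 0.4 * X[:, 0] - 0.15 * X[:, 1] + rng.normal(0, 0.5, size=300)

# Hold out 20% of the rows; they are never used for fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()

# 5-fold cross-validation on the training portion only.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")

# Final fit on all training rows, then a single evaluation on the held-out set.
model.fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)

print("cross-validation R2 per fold:", np.round(cv_scores, 3))
print(f"held-out test R2: {test_r2:.3f}")
```

Keeping the test set out of cross-validation means the final score is measured on rows that influenced neither the fit nor any model choice.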

Objective

To develop an efficient and effective model which predicts the
price of a used car according to the user’s inputs.
To achieve good accuracy.
To develop a User Interface (UI) which is user-friendly,
takes input from the user, and predicts the price.
FUTURE SCOPE
In the future, this machine learning model may be integrated with various websites
that can provide real-time data for price prediction. We may also add a large amount
of historical car price data, which can help improve the accuracy of the model. We
can build an Android app as the user interface. For better performance, we plan to
carefully design deep learning network structures, use adaptive learning rates, and
train on clusters of data rather than the whole dataset.

CONCLUSION
Because of the increased prices of new cars and the financial inability of many
customers to buy them, used car sales are on a global increase. Therefore, there is a
need for a used car price prediction system which effectively determines the worth of
a car using a variety of features. The proposed system will help to determine an
accurate price for a used car.

REFERENCES

[1] Sakshi Gupta, “Regression vs. Classification in Machine Learning”,
October 6, 2021. [Online]. Available:
https://www.springboard.com/blog/data-science/regression-vs-classification/
[Accessed April 25, 2023].

[2] Scikit-learn, “Supervised learning”. [Online]. Available:
https://scikit-learn.org/stable/supervised_learning.html#supervised-learning
[Accessed April 24, 2023].

[3] Aarshay Jain, “A Complete Tutorial on Ridge and Lasso Regression in Python”,
January 28, 2016. [Online]. Available:
https://www.analyticsvidhya.com/blog/2016/01/ridge-lasso-regression-python-complete-tutorial/
[Accessed March 24, 2023].

[4] Flask Documentation. [Online]. Available:
https://flask.palletsprojects.com/en/2.3.x/ [Accessed March 24, 2023].
