Report Car Price Prediction
Submitted By :
2005350 20051730
KIIT University
Abstract
The price of a new car is fixed by the manufacturer, with additional costs imposed by
the government in the form of taxes. Customers buying a new car can therefore be
reasonably sure that the money they invest is justified. However, because new car
prices have risen and many customers cannot afford them, used car sales are
increasing globally. There is therefore a need for a used car price prediction system
that estimates the worth of a car from a variety of its features. In the existing
process, the seller sets a price more or less arbitrarily, and the buyer has little idea
of the car’s present-day value. In fact, the seller often does not know the car’s
current value, or the price at which it should be sold, either. To overcome this
problem we have developed a regression-based price prediction model. Regression
algorithms are used because they produce a continuous value as output rather than a
category, which makes it possible to predict the actual price of a car rather than a
price range. A user interface has also been developed that accepts input from the
user and displays the predicted price of the car. The results of this project can help
car dealerships and private sellers price their cars accurately and optimize their
profits. Future work can include expanding the dataset with more features, such as the
car’s service history, accident history, and location, and exploring other machine
learning algorithms to improve the accuracy of our model.
Introduction
Determining whether the listed price of a used car is fair is a challenging task,
because many factors drive a used vehicle’s price on the market. The focus of this
project is developing machine learning models that can accurately predict the price of
a used car based on its features, so that buyers can make informed purchases. We
implement and evaluate various learning methods on a dataset consisting of the sale
prices of different makes and models. We compare the performance of several
regression algorithms (Linear Regression, Ridge Regression, Lasso Regression,
Elastic Net, and Decision Tree Regressor) and choose the best of them. The price of
the car is then determined from its various parameters. Regression algorithms are
used because they produce a continuous value as output rather than a category, which
makes it possible to predict the actual price of a car rather than a price range. A
user interface has also been developed that accepts input from the user and displays
the predicted price of the car.
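As a minimal sketch of such a comparison, the candidate models can be scored with
cross-validation in scikit-learn. The synthetic data, feature names, price rule, and
hyperparameters below are illustrative assumptions, not the data or settings used in
this project.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the used-car data (assumption): age in years,
# mileage in thousands of km, engine size in litres.
rng = np.random.default_rng(0)
n = 300
age = rng.uniform(0, 15, n)
mileage = age * 12 + rng.normal(0, 10, n)
engine = rng.uniform(1.0, 3.0, n)
X = np.column_stack([age, mileage, engine])

# Hypothetical price rule plus noise, used only to have a target to fit.
y = 20000 - 900 * age - 30 * mileage + 3000 * engine + rng.normal(0, 800, n)

# Candidate models named in the text; hyperparameters are placeholders.
models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(alpha=1.0),
    "Lasso Regression": Lasso(alpha=1.0),
    "Elastic Net": ElasticNet(alpha=1.0, l1_ratio=0.5),
    "Decision Tree Regressor": DecisionTreeRegressor(max_depth=5, random_state=0),
}

# Mean 5-fold cross-validated R^2 for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}
best_model = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: mean R^2 = {score:.3f}")
print("Best:", best_model)
```

Which model wins depends entirely on the data, so the ranking printed for this
synthetic example says nothing about the ranking on real listings.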
In this project, we use a linear regression model to predict the selling price of old
cars. Linear regression is a simple and widely used machine learning algorithm that
models the linear relationship between the dependent variable (selling price) and one
or more independent variables (car model, year of manufacture, condition, mileage,
etc.). By training our model on historical data on old car sales, we can predict the
selling price of cars that the model has not seen before. The results of our model can
be used by car dealerships and private sellers to price their cars accurately and
maximize their profits.
Problem statement
For the purposes of car valuation, popular guides tend not to use machine learning.
Instead, they source data from local sales and average the prices of many similar cars.
This method works well if you have a common car with a common set of features.
The condition of the car is judged very roughly, typically on a scale of one to three.
Cars that are “unusual” are therefore hard to evaluate. Effectively, no inferences are
drawn from similar cars of a different make and model, whereas with machine
learning the entire dataset and all of its features are used to train the model.
Machine learning therefore addresses the problem of under-used data and makes it
possible to use all the features of a car in its valuation.
New cars of a particular make, model, location, and feature selection are identical in
condition, function, and price. When new cars are sold for the first time they are then
classified as used cars. As an asset ages, its price changes because it declines in
efficiency in the current and in all future periods. Depreciation reflects the change in
net present value over time. Revaluation, on the other hand, is the change in value or
price of an asset that is caused by everything other than aging. This includes price
changes due to inflation, obsolescence, and any other change not associated with
aging [2]. Used cars are subject to depreciation and revaluation. Depreciation can be
used as an umbrella term for both of these, and the rest of this report will follow that
convention when referring to the loss of value over time. Revaluation plays a part in
the depreciation of cars based on the features that they have. Power-hungry cars will
be less sought after when the price of gasoline is high, for example. A car with the
same make, model, year, and geographic region, but with a larger engine than
another car, should command a different value at different times.
In addition to the age of the car and the revaluation of its features, used cars have a
unique service history that develops over time. Parts will become worn with time and
miles driven (mileage). What is replaced, when it is replaced, and by whom, are all to
be considered as it relates to the current working condition of the car and its
desirability on the market. These particularities are difficult to account for in
traditional price-setting models, even though they are a major differentiator between
vehicles. Generally, they are summarized in the “condition” of the car. The value of
repairs or custom modifications to the car is recognized only if they noticeably
improve the overall condition of the car.
Using machine learning to better utilize data on all the less common features of a car
can more accurately predict the value of a vehicle. This is a clear benefit to consumers,
especially those who themselves cannot ascertain the value of the vehicle that they are
buying or selling and must rely on a tool. A tool that is more tailored to the non-
standard features of the car can provide a more accurate price and make the market
fairer for all participants.
There are several machine learning regression models that can be applied to price
prediction. This work will investigate which one offers the best performance
according to several criteria. The nature of machine learning is to train on past data to
predict unseen data. Applied to price prediction of cars, the data is sourced from past
sales while the predictions are for the present value of cars. Therefore, one criterion
for the selection of a machine learning model is that it remains accurate in its
predictions for future years that are not included in the data set.
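One way to check this criterion is to split the data by sale year rather than at
random, so that the model is always tested on years later than those it was trained
on. The sketch below uses made-up sale records and a hypothetical helper,
split_by_year; it illustrates the idea and is not the split used in this project.

```python
# Hypothetical sale records: (sale_year, age_in_years, price).
sales = [
    (2018, 3, 14500), (2018, 7, 9800), (2019, 2, 16200),
    (2019, 5, 11900), (2020, 4, 13100), (2020, 8, 8700),
    (2021, 1, 18400), (2021, 6, 10900), (2022, 3, 15200),
    (2022, 9, 8100),
]


def split_by_year(records, cutoff_year):
    """Train on sales up to and including cutoff_year, test on later sales."""
    train = [r for r in records if r[0] <= cutoff_year]
    test = [r for r in records if r[0] > cutoff_year]
    return train, test


train, test = split_by_year(sales, cutoff_year=2020)
print(len(train), "training records,", len(test), "held-out future records")
```

A model that scores well on the held-out later years is more likely to stay accurate
as new sales data arrives.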
Requirements
Hardware requirements
Operating system - Windows 7, 8, 10
Processor - dual core 2.4 GHz (i5 or i7 series Intel processor or equivalent AMD)
RAM - 4 GB
Software Requirements
Python
PyCharm (or any other IDE)
PIP 2.7
Jupyter Notebook
Chrome
Theory:
This chapter will explain relevant theory and related work. This includes
concepts related to regression learning, all metrics used for the
performance measurement of the models, and related research in the
field of machine learning applied to price prediction.
Overfitting
Overfitting is a condition in which a statistical model begins to
describe the random error in the data rather than the relationships
between variables. This condition can affect all supervised machine
learning models. In the case of regression models, overfitting can occur
when there are too many terms for the number of observations. The
regression coefficients then represent the noise rather than the actual
relationships in the data. Much better prediction results on the training
data than on unseen test data are an indication of overfitting.
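The effect can be illustrated with a small NumPy sketch. It fits a polynomial
with few terms and one with many terms to the same noisy data and compares
the error on the training points with the error on held-out points. The
data, the noise level, and the polynomial degrees are assumptions chosen
only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): a linear trend plus random noise.
x = np.linspace(-1.0, 1.0, 30)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.2, x.size)

# Alternate points between the training set and the test set.
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]


def mse(coeffs, xs, ys):
    """Mean squared error of the polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))


results = {}
for degree in (1, 9):
    # A degree-9 polynomial has 10 terms for only 15 training points.
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree {degree}: train MSE = {results[degree][0]:.4f}, "
          f"test MSE = {results[degree][1]:.4f}")
```

The model with more terms always fits the training points at least as well;
a test error that is much larger than its training error is the sign of
overfitting described above.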
Linear Regression
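Linear regression models the selling price as a weighted sum of the
features plus an intercept, with the weights chosen to minimise the squared
error on the training data. The following is a minimal sketch using NumPy
least squares; the five data points and the predict helper are hypothetical
and serve only to show the mechanics.

```python
import numpy as np

# Hypothetical training data: [age in years, mileage in thousands of km].
X = np.array([[1, 15], [3, 40], [5, 60], [7, 95], [9, 120]], dtype=float)
y = np.array([18000, 15500, 13000, 10200, 8000], dtype=float)

# Add a column of ones so that the model learns an intercept.
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: minimise ||X_design @ weights - y||^2.
weights, *_ = np.linalg.lstsq(X_design, y, rcond=None)


def predict(age, mileage):
    """Predicted price for a car of the given age and mileage."""
    return float(weights[0] + weights[1] * age + weights[2] * mileage)


print("intercept and coefficients:", weights)
print("predicted price, 2 years / 20k km:", round(predict(2, 20)))
print("predicted price, 8 years / 110k km:", round(predict(8, 110)))
```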
Ridge Regression
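Ridge regression is linear regression with an added penalty on the squared
size of the coefficients, which shrinks them and reduces the risk of
overfitting. The sketch below implements the standard closed-form solution
in NumPy on hypothetical data; the function ridge_fit and the penalty
strength alpha=100 are illustrative assumptions, not the settings used in
this project.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression; the intercept is not penalised."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centre so the intercept drops out
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) coef = Xc^T yc.
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


# Hypothetical training data: [age in years, mileage in thousands of km].
X = np.array([[1, 15], [3, 40], [5, 60], [7, 95], [9, 120]], dtype=float)
y = np.array([18000, 15500, 13000, 10200, 8000], dtype=float)

coef_ols, _ = ridge_fit(X, y, alpha=0.0)      # alpha = 0 is ordinary least squares
coef_ridge, _ = ridge_fit(X, y, alpha=100.0)  # larger alpha shrinks the coefficients

print("OLS coefficients:  ", coef_ols)
print("Ridge coefficients:", coef_ridge)
```

In practice the features are usually standardised before fitting, because
the penalty treats all coefficients on the same scale.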
METHODOLOGY
There are two primary phases in the system:
1. Training phase: the system is trained on the data in the data set and fits a model
(a line or curve) according to the chosen algorithm.
2. Testing phase: the system is given inputs and its predictions are checked for
accuracy.
The data used to train or test the model therefore has to be appropriate. The system
is designed to predict the price of a used car, and hence an appropriate algorithm
must be chosen for both phases.
Data Collection:
We collected data on old car sales from various sources such as online classifieds,
used car dealerships, and private sales. We used web scraping techniques to extract
data from online sources and manually collected data from other sources.
Data Preprocessing:
The collected data was cleaned and preprocessed to remove records with missing values
and irrelevant data. We also performed feature engineering to extract important
features such as the age of the car, mileage, and condition. We used the pandas and
NumPy libraries to perform data cleaning and feature engineering.
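A minimal pandas sketch of these cleaning and feature engineering steps is given
below. The column names, the sample rows, the mileage text format, and the reference
year are all assumptions made for illustration; the real dataset may be structured
differently.

```python
import pandas as pd

# Hypothetical raw listings; column names and values are illustrative.
raw = pd.DataFrame({
    "name": ["Maruti Swift", "Hyundai i20", None, "Honda City"],
    "year": [2015, 2018, 2012, None],
    "kms_driven": ["45,000 kms", "22,500 kms", "80,000 kms", "60,000 kms"],
    "price": [350000, 520000, 210000, 480000],
})

REFERENCE_YEAR = 2023  # assumed year in which the prices are estimated

# Remove rows that contain missing values.
clean = raw.dropna().copy()

# Convert the mileage text into an integer number of kilometres.
clean["kms_driven"] = (
    clean["kms_driven"]
    .str.replace(" kms", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(int)
)

# Feature engineering: derive the age of the car from its year of manufacture.
clean["year"] = clean["year"].astype(int)
clean["age"] = REFERENCE_YEAR - clean["year"]

print(clean[["name", "age", "kms_driven", "price"]])
```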
Model Training:
We used a linear regression model to predict the selling price of old cars. We split
the data into training and testing sets so that overfitting could be detected on data
the model had not seen, and used cross-validation to obtain a more reliable estimate
of the accuracy of our model. We used the scikit-learn library to train and evaluate
our model.
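The sketch below shows one way to arrange these steps with scikit-learn: hold out a
test set, run cross-validation on the training portion, then fit the model and score
it on the held-out data. The synthetic data, the 80/20 split, the five folds, and the
random seeds are assumptions for illustration rather than the settings used in this
project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic stand-in for the cleaned data (assumption):
# age in years and mileage in thousands of km.
rng = np.random.default_rng(42)
n = 200
age = rng.uniform(0, 15, n)
mileage = rng.uniform(5, 200, n)
X = np.column_stack([age, mileage])
y = 20000 - 800 * age - 25 * mileage + rng.normal(0, 500, n)

# Hold out 20% of the data as a test set that training never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 5-fold cross-validation on the training portion only.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(LinearRegression(), X_train, y_train, cv=cv, scoring="r2")

# Fit on the full training set and evaluate once on the held-out test set.
model = LinearRegression().fit(X_train, y_train)
test_r2 = r2_score(y_test, model.predict(X_test))

print("cross-validated R^2 per fold:", np.round(cv_scores, 3))
print("held-out test R^2:", round(test_r2, 3))
```

A test score far below the cross-validated scores would suggest that the model does
not generalise beyond its training data.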
Objective
CONCLUSION
Because of the increased prices of new cars and the financial incapability of many
customers to buy them, used car sales are on a global increase. Therefore, there is a
need for a used car price prediction system which effectively determines the
worthiness of the car using a variety of features. The proposed system will help to
determine an accurate price for a used car.
REFERENCES
[3] Aarshay Jain, “A Complete Tutorial on Ridge and Lasso Regression in Python”,
January 28, 2016. [Online]. Available:
https://www.analyticsvidhya.com/blog/2016/01/ridge-lasso-regression-python-complete-tutorial/
[Accessed March 24, 2023]