This project aims to build a machine learning model to predict car prices based on various features such as brand, model, fuel type, mileage, seats, and owner type. The model uses linear regression for the prediction task.
Ensure the libraries listed in requirements.txt are installed:
- pandas: For data manipulation and analysis.
- numpy: For numerical operations.
- matplotlib: For plotting and visualization.
- seaborn: For enhanced data visualization.
- scikit-learn: For machine learning model building and evaluation.
- data/: Directory to store datasets.
- notebooks/: Jupyter notebooks for exploratory data analysis and model building.
- scripts/: Python scripts for data processing, model training, and evaluation.
- README.md: Project documentation.
The dataset used in this project is available on Kaggle: https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho
Before training the model, the dataset needs to be preprocessed. This includes dropping unnecessary columns and encoding categorical variables.
Unnecessary columns such as 'owner' can be dropped:
data = data.drop(columns=['owner'])
Convert categorical variables into numerical format using one-hot encoding:
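A minimal sketch of this step, using `pandas.get_dummies`. The small DataFrame and the column names (`fuel`, `transmission`, `km_driven`, `selling_price`) are assumptions for illustration; substitute the categorical columns actually present in your copy of the dataset.

```python
import pandas as pd

# Tiny stand-in frame; the column names are assumed, not taken from the project code.
data = pd.DataFrame({
    "fuel": ["Petrol", "Diesel", "CNG", "Petrol"],
    "transmission": ["Manual", "Automatic", "Manual", "Manual"],
    "km_driven": [45000, 60000, 30000, 82000],
    "selling_price": [350000, 520000, 280000, 210000],
})

# Columns to encode (hypothetical selection).
categorical_cols = ["fuel", "transmission"]

# drop_first=True drops one level per column so the dummy columns are not
# perfectly collinear, which matters for an unregularized linear regression.
encoded = pd.get_dummies(data, columns=categorical_cols, drop_first=True)

print(encoded.columns.tolist())
```

Dropping the first level is a design choice rather than a requirement; without it, each set of dummy columns sums to one and duplicates the intercept term.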
Split the data into training and testing sets, then train a linear regression model.
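The split-and-train step could look like the sketch below. It runs on synthetic data so that it is self-contained; the feature names (`year`, `km_driven`), the 80/20 split, and the `random_state` value are assumptions, not settings confirmed by the project.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed features and target (illustration only).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "year": rng.integers(2005, 2021, size=n),
    "km_driven": rng.integers(5000, 150000, size=n),
})
y = (
    300000
    + 20000 * (X["year"] - 2005)
    - 1.5 * X["km_driven"]
    + rng.normal(0, 10000, size=n)
)

# Hold out 20% of the rows for testing; fix the seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit an ordinary least squares model on the training portion only.
model = LinearRegression()
model.fit(X_train, y_train)

print(model.coef_, model.intercept_)
```

With the real dataset, `X` would be the encoded feature columns and `y` the price column.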
Evaluate the model on the held-out test set using the R-squared score.
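As a sketch of the evaluation, the R-squared score can be computed with `sklearn.metrics.r2_score`. The arrays below are hypothetical placeholder values standing in for test-set prices and model predictions; the manual calculation is included only to show what the metric measures.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual and predicted prices (placeholders, not project results).
y_test = np.array([350000.0, 520000.0, 280000.0, 210000.0, 640000.0])
y_pred = np.array([330000.0, 500000.0, 300000.0, 250000.0, 600000.0])

# R-squared as reported by scikit-learn.
r2 = r2_score(y_test, y_pred)

# The same quantity by hand: 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2, r2_manual)
```

In practice `y_pred` would come from `model.predict(X_test)`. A score near 1 means the model explains most of the variance in price, while a score near 0 means it does no better than predicting the mean.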
Visualize the data and the model's performance using matplotlib and seaborn.
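One common way to visualize performance is an actual-versus-predicted scatter plot. The sketch below assumes that plot type, since the source does not say which plots the project produces; the data values and the output filename are hypothetical.

```python
import matplotlib

# Non-interactive backend so the script also runs without a display;
# remove this line and call plt.show() when working in a notebook.
matplotlib.use("Agg")

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Hypothetical actual and predicted prices (placeholders, not project results).
y_test = np.array([350000.0, 520000.0, 280000.0, 210000.0, 640000.0])
y_pred = np.array([330000.0, 500000.0, 300000.0, 250000.0, 600000.0])

fig, ax = plt.subplots(figsize=(6, 6))

# Each point is one car: x is the true price, y is the model's prediction.
sns.scatterplot(x=y_test, y=y_pred, ax=ax)

# Diagonal reference line: points on it are predicted perfectly.
low = min(y_test.min(), y_pred.min())
high = max(y_test.max(), y_pred.max())
ax.plot([low, high], [low, high], linestyle="--", color="gray")

ax.set_xlabel("Actual price")
ax.set_ylabel("Predicted price")
ax.set_title("Actual vs. predicted car prices")

fig.savefig("actual_vs_predicted.png", dpi=100)
```

Points far from the dashed line are the cars the model prices poorly, which can hint at features worth revisiting.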
This project demonstrates how to build and evaluate a linear regression model to predict car prices. The evaluation metric used is the R-squared score. Visualization techniques help in understanding the relationships between the features and the target variable.
This project is licensed under the MIT License.