
CHAPTER 1 - INTRODUCTION

In our six-week summer training we learnt the basics of Python in the first two weeks, followed by one week of NumPy, one week of pandas, and a final week of machine learning.

1.1 What is Python?

Python is a popular programming language. It was created by Guido van Rossum, and released
in 1991.

It is used for:

• web development (server-side),


• software development,
• mathematics,
• system scripting.

1.2 What can Python do?

• Python can be used on a server to create web applications.


• Python can be used alongside software to create workflows.
• Python can connect to database systems. It can also read and modify files.
• Python can be used to handle big data and perform complex mathematics.
• Python can be used for rapid prototyping, or for production-ready software
development.

1.3 Why Python?

• Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc.).
• Python has a simple syntax similar to the English language.

• Python has syntax that allows developers to write programs with fewer lines than some
other programming languages.
• Python runs on an interpreter system, meaning that code can be executed as soon as it is
written. This means that prototyping can be very quick.
• Python can be treated in a procedural way, an object-oriented way or a functional way.

1.4 LANGUAGE FEATURES

• Interpreted
• There are no separate compilation and execution steps, as in C and C++.
• The program runs directly from the source code.
• Internally, Python converts the source code into an intermediate form
called bytecode, which is then translated into the native language of the
specific computer in order to run it.
• There is no need to worry about linking and loading with libraries, etc.
• Platform Independent
• Python programs can be developed and executed on multiple operating
system platforms.
• Python can be used on Linux, Windows, Macintosh, Solaris and many
more.
• Free and Open Source; Redistributable
• High-level Language
• In Python, there is no need to take care of low-level details such as
managing the memory used by the program.
• Simple
• Closer to English language; Easy to Learn
• More emphasis on the solution to the problem rather than the syntax
• Embeddable
• Python can be used within C/C++ program to give scripting capabilities
for the program’s users.
• Robust:
• Exception handling features
• Built-in memory management techniques
• Rich Library Support

• The Python Standard Library is very vast.
• Known as the “batteries included” philosophy of Python; It can help
do various things involving regular expressions, documentation
generation, unit testing, threading, databases, web browsers, CGI,
email, XML, HTML, WAV files, cryptography, GUI and many more.
• Besides the standard library, there are various other high-quality
libraries such as the Python Imaging Library which is an amazingly
simple image manipulation library.

1.5 Software making use of Python


Python has been successfully embedded in a number of software products as a scripting
language.

1. GNU Debugger uses Python as a pretty printer to show complex structures such
as C++ containers.
2. Python has also been used in artificial intelligence.
3. Python is often used for natural language processing tasks.

Current Applications of Python


1. A number of Linux distributions use installers written in Python; for example,
Ubuntu uses the Ubiquity installer.
2. Python has seen extensive use in the information security industry, including in
exploit development.
3. Raspberry Pi, a single-board computer, uses Python as its principal user-
programming language.
4. Python is now being used in game development as well.
Pros:
1. Ease of use
2. Multi-paradigm Approach
Cons:
1. Slow speed of execution compared to C and C++
2. Absence from mobile computing and browsers
3. For C and C++ programmers, switching to Python can be irritating because the
language requires proper indentation of code. Certain commonly used variable
names, such as sum, are built-in functions in Python, so C and C++ programmers
have to watch out for these.
Advantages:
1. Presence of third-party modules
2. Extensive support libraries (NumPy for numerical calculations, Pandas for data
analytics etc.)
3. Open source and community development
4. Versatile, Easy to read, learn and write
5. User-friendly data structures
6. High-level language
7. Dynamically typed language (no need to declare the data type; it is inferred
from the assigned value)
8. Object-oriented language
9. Portable and Interactive
10. Ideal for prototypes – provide more functionality with less coding
11. Highly Efficient (Python’s clean object-oriented design provides enhanced process
control, and the language is equipped with excellent text processing and integration
capabilities, as well as its own unit testing framework, which makes it more
efficient.)
12. (IoT)Internet of Things Opportunities
13. Interpreted Language
14. Portable across Operating systems

Applications:
1. GUI based desktop applications
2. Graphic design, image processing applications, Games, and Scientific/
computational Applications
3. Web frameworks and applications
4. Enterprise and Business applications
5. Operating Systems
6. Education
7. Database Access
8. Language Development
9. Prototyping
10. Software Development

1.6 Organizations using Python:
1. Google (Components of Google spider and Search Engine)
2. Yahoo (Maps)
3. YouTube
4. Mozilla
5. Dropbox
6. Microsoft
7. Cisco
8. Spotify
9. Quora

1.7 Data types in Python:


Data types are the classification or categorization of data items. A data type represents the
kind of value a data item holds and determines what operations can be performed on it.

Boolean:
are either True or False.
Numbers:
can be integers (1 and 2), floats (1.1 and 1.2), or fractions (1/2 and 2/3).
Strings:
are sequences of Unicode characters, e.g. an HTML document.
Lists:
are ordered sequences of values.
Tuples:
are ordered, immutable sequences of values.

Sets:
are unordered bags of values.
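These data types can be illustrated with a short, self-contained sketch (the variable names here are made up for illustration):

```python
from fractions import Fraction

flag = True                     # Boolean: True or False
count = 2                       # integer
price = 1.2                     # float
half = Fraction(1, 2)           # fraction, from the standard library
title = "an HTML document"      # string: a sequence of Unicode characters
colors = ["red", "green"]       # list: ordered sequence of values
point = (3, 4)                  # tuple: ordered, immutable sequence
tags = {"car", "bike", "car"}   # set: unordered bag, duplicates removed

print(type(flag).__name__)  # bool
print(half + half)          # 1
print(len(tags))            # 2
```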

1.8 Variables:

Variables are nothing but reserved memory locations to store values. This means that
when you create a variable, you reserve some space in memory.
Based on the data type of a variable, the interpreter allocates memory and decides what
can be stored in the reserved memory. Therefore, by assigning different data types to
variables, you can store integers, decimals or characters in these variables.
Ex: counter = 100    # An integer assignment
miles = 1000.0    # A floating point
name = "John"    # A string

1.9 Python Operators:

• Arithmetic Operator

Table 1.1 Arithmetic Operators

• Comparison Operator

Table 1.2 Comparison Operators

• Logical Operator

Table 1.3 Logical Operators

1.10 LOOPS:

Programming languages provide various control structures that allow for more complicated
execution paths.
A loop statement allows us to execute a statement or group of statements
multiple times.
The Python programming language provides the following types of loops to handle
looping requirements.

Table. 1.4 Loop types

S no. Type of loop & Description

1. while loop
Repeats a statement or group of statements while a given condition is
TRUE. It tests the condition before executing the loop body.

2. for loop
Executes a sequence of statements multiple times and abbreviates the code
that manages the loop variable.

3. nested loops
You can use one or more loops inside another while or for loop.
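The loop types above can be sketched in Python as follows (note that, unlike C, Python has no do..while loop):

```python
# while loop: repeats while the condition is TRUE
n, total = 1, 0
while n <= 5:
    total += n
    n += 1
print(total)  # 15

# for loop: iterates over a sequence, managing the loop variable for us
squares = []
for x in [1, 2, 3]:
    squares.append(x * x)
print(squares)  # [1, 4, 9]

# nested loops: one for loop inside another
pairs = []
for i in range(2):
    for j in range(2):
        pairs.append((i, j))
print(pairs)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```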

1.11 Conditional Statements:

Decision making is the anticipation of conditions occurring during execution of the program
and the specification of actions to be taken according to those conditions. Decision structures
evaluate multiple expressions which produce TRUE or FALSE as the outcome. You need to
determine which action to take and which statements to execute if the outcome is TRUE, and
which otherwise.

Table 1.5 Types of conditional statements

Sr.No. Statement & Description

1. if statements
An if statement consists of a boolean expression followed by one or more
statements.

2. if...else statements
An if statement can be followed by an optional else statement, which
executes when the boolean expression is FALSE.

3. nested if statements
You can use one if or else-if statement inside another if or else-if
statement(s).
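A short sketch of the three statement forms (grade and sign_label are hypothetical helper functions written for illustration):

```python
def grade(score):
    # if / elif / else chain: each boolean expression is tested in order
    if score >= 90:
        return "A"
    elif score >= 75:
        return "B"
    else:
        return "C"

def sign_label(x):
    # nested if: one if statement inside another
    if x != 0:
        if x > 0:
            return "positive"
        return "negative"
    return "zero"

print(grade(80))        # B
print(sign_label(-3))   # negative
```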

1.12 What is NumPy?

NumPy is a Python library used for working with arrays.

It also has functions for working in the domain of linear algebra, Fourier transforms, and matrices.

NumPy was created in 2005 by Travis Oliphant. It is an open-source project and you can use it
freely.

NumPy stands for Numerical Python.

1.13 Why Use NumPy?

In Python we have lists that serve the purpose of arrays, but they are slow to process.

NumPy aims to provide an array object that is up to 50x faster than traditional Python lists.

The array object in NumPy is called ndarray; it provides many supporting functions that make
working with ndarray very easy.

Arrays are very frequently used in data science, where speed and resources are very important.
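A minimal example of creating and using an ndarray (the values here are arbitrary):

```python
import numpy as np

# Create an ndarray from a Python list
a = np.array([1, 2, 3, 4])
print(type(a).__name__)  # ndarray

# Vectorized arithmetic operates on the whole array at once,
# which is what makes NumPy much faster than looping over a list
b = a * 2
print(b)          # [2 4 6 8]
print(a.mean())   # 2.5
print(a.shape)    # (4,)
```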

1.14 What is Pandas?

Pandas is a Python library used for working with data sets.

It has functions for analysing, cleaning, exploring, and manipulating data.

The name "Pandas" refers to both "Panel Data" and "Python Data Analysis"; the library was
created by Wes McKinney in 2008.

1.15 Why Use Pandas?

Pandas allows us to analyse big data and make conclusions based on statistical theories.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.
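A tiny illustration of loading data into a pandas DataFrame and cleaning a missing value (the data here is made up):

```python
import pandas as pd

# A small, invented data set with one missing price
df = pd.DataFrame({
    "name": ["swift", "city", "alto"],
    "price": [4.5, 6.1, None],   # None represents a missing value
})

print(df.shape)                    # (3, 2)
print(df["price"].isnull().sum())  # 1 missing value found

clean = df.dropna()                # cleaning: drop rows with missing data
print(len(clean))                  # 2
```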

1.16 Function:
Function blocks begin with the keyword def followed by the function name and
parentheses (()).
Any input parameters or arguments should be placed within these parentheses.
The first statement of a function can be an optional statement: the documentation string
(docstring) of the function.
The code block within every function starts with a colon (:) and is indented. The statement return
[expression] exits a function, optionally passing back an expression to
the caller. A return statement with no arguments is the same as return None.

Syntax:
def function_name(parameters):
    """docstring"""
    statement(s)

Example:
def greet(name):
    """
    This function greets the
    person passed in as a
    parameter.
    """
    print("Hello, " + name + ". Good morning!")

# Now you can call the greet function
greet('Rahul')

1.17 Introduction to Machine Learning

Figure 1.1 – Machine learning

Machine Learning algorithms enable the computers to learn from data, and even improve
themselves, without being explicitly programmed.

Machine learning (ML) is a category of algorithm that allows software applications to
become more accurate in predicting outcomes without being explicitly programmed. The basic
premise of machine learning is to build algorithms that can receive input data and use statistical
analysis to predict an output, while updating outputs as new data becomes available.

Types of Machine Learning


Machine learning can be classified into 3 types of algorithms.

1. Supervised Learning

2. Unsupervised Learning

3. Reinforcement Learning

Overview of Supervised Learning Algorithm


In supervised learning, an AI system is presented with data which is labelled, which means that
each data point is tagged with the correct label.

The goal is to approximate the mapping function so well that when you have new input data (x),
you can predict the output variable (Y) for that data.

Figure 1.2 – Machine learning explanation

As shown in the above example, we have initially taken some data and marked each item as
'Spam' or 'Not Spam'. This labelled data is used to train the supervised model.

Once it is trained, we can test our model with some new test mails and check whether
the model is able to predict the right output.

Types of Supervised learning


• Classification: A classification problem is when the output variable is a category,
such as “red” or “blue” or “disease” and “no disease”.

• Regression: A regression problem is when the output variable is a real value, such
as "dollars" or "weight".
Example of Supervised Learning Algorithms:
• Linear Regression
• Logistic Regression
• Nearest Neighbour
• Gaussian Naive Bayes
• Decision Trees
• Support Vector Machine (SVM)
• Random Forest

Unsupervised Learning Algorithm


In unsupervised learning, an AI system is presented with unlabelled, uncategorized data and the
system’s algorithms act on the data without prior training. The output is dependent upon the
coded algorithms. Subjecting a system to unsupervised learning is one way of testing AI.

Figure 1.3 – Unsupervised Learning

In the above example, we have given some characters to our model which are ‘Ducks’ and ‘Not
Ducks’. In our training data, we don’t provide any label to the corresponding data. The
unsupervised model is able to separate both the characters by looking at the type of data and
models the underlying structure or distribution in the data in order to learn more about it.

Types of Unsupervised learning


• Clustering: A clustering problem is where you want to discover the inherent
groupings in the data, such as grouping customers by purchasing behaviour.

• Association: An association rule learning problem is where you want to discover
rules that describe large portions of your data, such as people that buy X also tend to
buy Y.

1.18 What is a user interface (UI)?


The user interface (UI) is the point of human-computer interaction and communication in a
device. This can include display screens, keyboards, a mouse and the appearance of a desktop. It
is also the way through which a user interacts with an application or a website.

For this project we use Flask as our user interface.

What is Flask?

Flask is a web application framework written in Python. It was developed by Armin
Ronacher, who led a team of international Python enthusiasts called Pocoo. Flask is based on
the Werkzeug WSGI toolkit and the Jinja2 template engine. Both are Pocoo projects.

Why is Flask a good web framework choice?


Unlike the Django framework, Flask is very Pythonic. It’s easy to get started with Flask,
because it doesn’t have a huge learning curve.

Structuring a Flask app

In order to build a Flask app, you’ll need the following minimal directory structure:

project
├── templates
└── app.py

We write our Flask app into app.py. In the templates/ directory, we store the HTML templates
that our Flask app will use to display to the end user.

CHAPTER 2 - TRAINING WORK UNDERTAKEN

2.1 Introduction
In our project we decided to predict the price of used cars using data science and machine
learning that we learned during our six-week summer training.

2.2 Data cleaning

2.2.1 STEP 1:

We import the pandas, matplotlib and seaborn libraries available in Python.

From the sklearn module we import train_test_split to split the data for training and testing,
the Linear Regression algorithm for fitting the model, and metrics for calculating the error in
our model.

Figure 2. 1 – Importing libraries

2.2.2 Step 2:

pd.read_csv is used to read the "car data.csv" dataset, which is stored in car_dataset.
This dataset contains information about used cars listed on different websites.
This data can be used for many purposes, such as price prediction to exemplify the use of
linear regression in machine learning.
The columns in the given dataset are as follows:
1. Car_Name (This column should be filled with the name of the car.)
2. Year (This column should be filled with the year in which the car was bought.)
3. Selling_Price (This column should be filled with the price the owner wants to sell the
car at.)
4. Present_Price (This is the current ex-showroom price of the car.)
5. Kms_Driven (This is the distance completed by the car in km.)
6. Fuel_Type (Fuel type of the car.)
7. Seller_Type (Defines whether the seller is a dealer or an individual.)
8. Transmission (Defines whether the car is manual or automatic.)
9. Owner (Defines the number of owners the car has previously had.)

car_dataset.head() shows the first five rows of the car dataset.

Figure 2. 2 – Dataset.head()

2.2.3 Step 3:
car_dataset.shape is used for checking the number of rows and columns in the car dataset.
From the output (301, 9) we observe that our dataset contains 301 rows and 9 columns.

Similarly, the car_dataset.info() function is used for getting information about the dataset,
such as the count of non-null values in each column and the datatype of each column, i.e.
whether it is of float64, int64 or object type.

In our car dataset no column has any null values. We have two columns with float64
values (Selling_Price and Present_Price), three columns with int64 datatype
(Year, Kms_Driven, Owner), and four columns with object datatype
(Car_Name, Fuel_Type, Seller_Type, Transmission).

Figure 2. 3 – Checking row, columns and dataset info

2.2.4 Step 4:

The describe() method returns a description of the data in the DataFrame.

If the DataFrame contains numerical data, the description contains this information for each
column:

count - the number of non-empty values.
mean - the average (mean) value.
std - the standard deviation.
min - the minimum value.
25% - the 25th percentile.
50% - the 50th percentile.
75% - the 75th percentile.
max - the maximum value.

Figure 2. 4 – Using describe()

In our dataset we have five columns with numerical data (Year, Selling_Price, Present_Price,
Kms_Driven and Owner).
The count of non-null values is 301 for each of the above columns, which is equal to the number
of rows in the dataset (301); it means these columns don't have any null values.

The Year column has a mean of 2013.627907, Selling_Price a mean of 4.661296,
Present_Price a mean of 7.628472, Kms_Driven a mean of 36947.205980 and Owner
a mean of 0.043189.

The Year column has a standard deviation of 2.891554, Selling_Price 5.082812,
Present_Price 8.644115, Kms_Driven 38886.883882 and Owner 0.247915.

The minimum value in the Year column is 2003.000000, in Selling_Price 0.100000,
in Present_Price 0.320000, in Kms_Driven 500.000000 and in Owner 0.000000.

The maximum value in the Year column is 2018.000000, in Selling_Price 35.000000,
in Present_Price 92.600000, in Kms_Driven 500000.000000 and in Owner 3.000000.

2.2.5 Step 5:
car_dataset.isnull().sum() is used to count the null values in each column.

Figure 2. 5 – Checking for null values

From the above output we learn that our dataset doesn't have any column containing
null values.

2.3 Visualizing Categorical data


First, we decided to visualize our data with the help of various graphs.
We have four object-type columns (Car_Name, Fuel_Type, Seller_Type, Transmission).

2.3.1 Visualizing Fuel_Type –

The seaborn.countplot() method is used to show the counts of observations in each categorical
bin using bars.

Figure 2.6 – Plotting fuel type categorical data

Figure 2.7 – Output of the above code

As we can infer from the graph, most of the cars in the second-hand market use petrol as their
fuel, with a few using diesel and only a handful using CNG.

2.3.2 Visualizing seller type-

Figure 2.8 Plotting Seller type categorical data

This graph shows that the number of second-hand cars sold by individuals is smaller compared
to dealers.

2.3.3 Visualizing Transmission_type –

Figure 2.9 - Plotting Vehicles distribution on basis of Transmission type

We can infer from this graph that most of the cars in the market are of manual transmission.

2.4 Visualizing Numerical Data-

2.4.1 Visualizing Owner_type –

Figure 2.10 – Plotting number of previous owner

From this graph, we infer that most cars previously had zero owners, very few had one owner,
and very few had three previous owners.

2.4.2 Visualizing selling_price -

A Box Plot, also known as a Whisker plot, displays a summary of a set of data values
with properties like minimum, first quartile, median, third quartile and maximum.
In the box plot, a box is drawn from the first quartile to the third quartile, and a
vertical line goes through the box at the median. Here the x-axis denotes the
data to be plotted while the y-axis shows the frequency distribution.

Figure 2.11 – Box plot of selling price (in lakhs)

From the above graph we infer that most of the cars have a maximum selling price of 13 lakh,
a minimum selling price of 1 lakh and a median selling price of 4 lakh.
Selling price has many outliers.

2.4.3 Visualizing present price-

Figure 2.12 - Box plot of present price (in lakhs)

From the above graph we infer that most of the cars have a maximum present price of 21 lakh,
a minimum of 1 lakh and a median present price of 9 lakh.

It also shows that present price has very few outliers.

2.4.4 Visualizing Kms_Driven by a car-

Figure 2.13 – Box plot of km driven

From the above graph we infer that most of the cars have a maximum of 90000 kms driven and
a minimum of 500 kms. The median kms driven is 32000. The Kms_Driven boxplot also
contains outliers.

Figure 2.14 – Graph of km driven

From the above graph we infer that the largest number of cars have around 45000 kms driven.
2.4.5 Visualizing Year in which the new car was bought

Figure 2.15 – Years in which the new car was bought

This graph shows the year in which the cars were bought from the showroom. As we can see,
most cars were bought in the year 2015 (i.e. recently), and fewer cars were bought in
years like 2003, 2004, 2005, 2006, 2007, 2008 and 2009.

2.5 Check for the distribution of categorical Data:

The pandas Series.value_counts() function returns a Series containing counts of unique values.
The resulting object is in descending order, so that the first element is the most
frequently occurring element. It excludes NA values by default.
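A small illustration of value_counts() on a made-up fuel-type Series:

```python
import pandas as pd

s = pd.Series(["Petrol", "Diesel", "Petrol", "Petrol", None])

# NA values are excluded by default; result is sorted in descending order
counts = s.value_counts()

print(counts.index[0])   # Petrol (the most frequent value comes first)
print(counts["Petrol"])  # 3
print(counts.sum())      # 4 (the None value is excluded)
```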

Figure 2 .16- Checking for the distribution of categorical Data

From the above output we observe that in the Fuel_Type column most cars use Petrol, with a
count of 239; diesel cars number 60 and CNG cars 2.
In the Seller_Type column, Dealers are 195 in number and Individuals 106.

In the Transmission column, we observe that cars with manual gears are 261 in number and
automatic cars 40.

2.6 Encoding

To use the categorical data in our model for prediction of the car price, we had to transform it
into numerical data.
So, we convert Petrol to 0, Diesel to 1 and CNG to 2 in the Fuel_Type column.
In the Seller_Type column, we convert Dealer to 0 and Individual to 1.
In the Transmission column, we convert Manual to 0 and Automatic to 1.
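One way to apply these mappings is shown below; this is a sketch on made-up rows, and the report's actual code may use a different pandas method (e.g. DataFrame.replace instead of Series.map):

```python
import pandas as pd

df = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "CNG", "Petrol"],
    "Seller_Type": ["Dealer", "Individual", "Dealer", "Dealer"],
    "Transmission": ["Manual", "Automatic", "Manual", "Manual"],
})

# Encode each categorical column with the mappings described above
df["Fuel_Type"] = df["Fuel_Type"].map({"Petrol": 0, "Diesel": 1, "CNG": 2})
df["Seller_Type"] = df["Seller_Type"].map({"Dealer": 0, "Individual": 1})
df["Transmission"] = df["Transmission"].map({"Manual": 0, "Automatic": 1})

print(df["Fuel_Type"].tolist())  # [0, 1, 2, 0]
```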

Figure 2.17 – Encoding of categorical columns

After Encoding we visualize the first five rows of our dataset again.

Figure 2 .18 – Visualization after encoding

After encoding and visualization our data was ready for testing and training.

2.7 Testing and Training

Let us now store the data and target value into two separate variables.
X contains all columns except Car_Name and Selling_Price.
We drop the Car_Name column because it is not a useful feature for predicting the car price.

Y contains only one column, i.e. Selling_Price.

Figure 2.19 – Splitting the data and the target


Figure 2.20 – Visualizing X and Y

2.7.1 Train-Test Split Evaluation

The train-test split is a technique for evaluating the performance of a machine learning
algorithm.

It can be used for classification or regression problems and can be used for any supervised
learning algorithm.

The procedure involves taking a dataset and dividing it into two subsets. The first subset is used
to fit the model and is referred to as the training dataset. The second subset is not used to train
the model; instead, the input element of the dataset is provided to the model, then predictions
are made and compared to the expected values. This second dataset is referred to as the test
dataset.

• Train Dataset: Used to fit the machine learning model.


• Test Dataset: Used to evaluate the fit machine learning model.
The objective is to estimate the performance of the machine learning model on new data: data
not used to train the model.

This is how we expect to use the model in practice. Namely, to fit it on available data with
known inputs and outputs, then make predictions on new examples in the future where we do
not have the expected output or target values.

The train-test procedure is appropriate when there is a sufficiently large dataset available.

Figure 2.21 – Splitting training and test data

Above, we use 10% of the data for testing and the remaining 90% for training.
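A minimal sketch of such a split on synthetic data (the 10%/90% ratio matches the one used above; random_state is an arbitrary choice for reproducibility):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the car features (X) and selling price (Y)
X = np.arange(20).reshape(10, 2)
Y = np.arange(10)

# test_size=0.1 keeps 10% of the rows for testing and 90% for training
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.1, random_state=2)

print(X_train.shape)  # (9, 2)
print(X_test.shape)   # (1, 2)
```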

2.8 Model Training

After train_test_split we apply machine learning algorithms to fit our model. We apply
three algorithms:
1. Linear Regression
2. Random Forest Regression
3. Decision Tree Regression

2.8.1 Linear Regression


Linear Regression is a machine learning algorithm based on supervised learning. It
performs a regression task. Regression models a target prediction value based on
independent variables.

Linear regression performs the task to predict a dependent variable value (y) based on a given
independent variable (x). So, this regression technique finds out a linear relationship between
x (input) and y(output). Hence, the name is Linear Regression.

Figure 2.22 – Creating an object of linear regression model

Then we fit the model with X_train and Y_train data.

For model evaluation, we use the predict method to do the prediction on the X_test data.
For calculating the error we use the R-squared method.
R-squared is a statistical measure that represents the goodness of fit of a regression model.
The ideal value for R-squared is 1; the closer the value of R-squared is to 1, the better the model
is fitted.
R-squared compares the residual sum of squares (SSres) with the total sum of squares
(SStot):

R-squared = 1 - (SSres / SStot)

where the total sum of squares is calculated by summing the squares of the distances
between the data points and the average line, and the residual sum of squares is calculated
by summing the squares of the distances between the data points and the best-fitted line.

In the above figure the calculated R-squared value is 0.83, which is quite good.
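The fit-predict-score workflow described above can be sketched on synthetic data (the data here is made up; the report's actual model is trained on the car dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn import metrics

# Synthetic data: y is a linear function of x plus a little noise
rng = np.random.RandomState(0)
X = rng.rand(50, 1) * 10
y = 3 * X[:, 0] + 2 + rng.randn(50) * 0.5

model = LinearRegression()
model.fit(X, y)                  # fit the model on the training data
pred = model.predict(X)          # predict

r2 = metrics.r2_score(y, pred)   # R-squared: the closer to 1, the better
print(round(r2, 2))
```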


2.8.2 Random Forest Regression

Random Forest Regression is a supervised learning algorithm that uses ensemble


learning method for regression. Ensemble learning method is a technique that combines
predictions from multiple machine learning algorithms to make a more accurate prediction than
a single model.

Figure 2.23 – Random Forest diagram

Figure 2.24 - Importing RandomForestRegressor from the ensemble module and creating an
object of it, i.e. regressor, with 20 decision trees

We fit the regressor object with the X_train and Y_train data and do prediction on the
X_test data.
We import the metrics and numpy libraries and then calculate the Mean Squared Error and
Root Mean Squared Error for checking the accuracy of our model.
The Mean Squared Error (MSE) or Mean Squared Deviation (MSD) of an estimator
measures the average of the squared errors, i.e. the average squared difference between the
estimated values and the true values:

MSE = (1/N) * Σ (y(i) - ŷ(i))²

where N is the number of data points, y(i) is the i-th measurement, and ŷ(i) is its corresponding
prediction.

In the above figure the root mean squared error is quite low (0.54), so the model fits well;
the lower the RMSE, the better a given model is able to fit a dataset.
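A sketch of the same workflow on synthetic data, using 20 trees as described above (the data and variable names are made up for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn import metrics

# Synthetic data: y is an easy function of the two features
rng = np.random.RandomState(1)
X = rng.rand(100, 2)
y = X[:, 0] * 5 + X[:, 1] * 2

# n_estimators=20 mirrors the "20 decision trees" mentioned above
regressor = RandomForestRegressor(n_estimators=20, random_state=1)
regressor.fit(X, y)
pred = regressor.predict(X)

mse = metrics.mean_squared_error(y, pred)
rmse = np.sqrt(mse)   # a lower RMSE means a better fit
print(rmse < 1.0)     # True for this easy synthetic data
```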

2.8.3 Decision Tree Regression:

Decision trees are a powerful and popular tool for classification and prediction. A
decision tree is a flowchart-like tree structure, where each internal node denotes a test on an
attribute, each branch represents an outcome of the test, and each leaf node (terminal node)
holds a class label.

Figure 2.25 – Decision tree diagram

Figure 2.26 - Importing DecisionTreeRegressor from the tree module and creating an object of it

We fit the regressor2 object with the X_train and Y_train data and do prediction on the
X_test data.
We calculate the Mean Squared Error and Root Mean Squared Error between the actual Y_test
values and the predicted y values for checking the accuracy of our model.
We observe that the root mean squared error of the decision tree regressor (0.86) is higher
than that of the random forest regressor model.

So, we decided to use Random Forest in our model for better accuracy.
We use the built-in RandomForestRegressor() class from sklearn to build our regression model.
The following code helps us save our model using the Pickle module. Our ML model is saved
as “model.pkl”. We will later use this file to predict the output when new input data is provided
from our web-app.

Pickle: The Python pickle module is used for serializing and de-serializing Python object
structures. The process of converting any kind of Python object (list, dict, etc.) into a byte
stream (0s and 1s) is called pickling, serialization, flattening or marshalling. We can convert
the byte stream (generated through pickling) back into Python objects by a process called
unpickling.

Figure 2.27 - Importing pickle and dumping the model.pkl file
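The pickling round trip can be sketched without touching the file system by using an in-memory buffer (a plain dict stands in for the trained model; the project itself dumps the fitted regressor to "model.pkl"):

```python
import pickle
import io

# A plain dict standing in for the trained model object
model = {"name": "random_forest", "n_estimators": 20}

buf = io.BytesIO()
pickle.dump(model, buf)       # pickling: Python object -> byte stream
buf.seek(0)
restored = pickle.load(buf)   # unpickling: byte stream -> Python object

print(restored == model)  # True
```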

2. Develop your web application with Flask and integrate your model:

2.1 Import necessary libraries, initialize the Flask app, and load our ML model:

We will initialize our app and then load the “model.pkl” file to the app.

Figure 2.28 Initialize the flask app

2.3. Define the app route for the default page of the web-app:
Routes refer to URL patterns of an app (such as myapp.com/home or
myapp.com/about). @app.route("/") is a Python decorator that Flask provides to
assign URLs in our app to functions easily.

Figure 2.29 creating default page of our web page

The decorator tells our app that whenever a user visits our app domain (localhost:5000
for local servers) at the given .route(), it should execute the home() function. Flask uses the
Jinja template library to render templates. In our application, we will use templates to render
HTML which will display in the browser.
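A minimal sketch of such a route, exercised with Flask's built-in test client instead of a running server (the route body here is a placeholder string; the project's home() renders the index.html template instead):

```python
from flask import Flask

app = Flask(__name__)

@app.route("/")  # default page of the web app
def home():
    return "Car Price Predictor"

# Exercise the route without starting a server, using Flask's test client
client = app.test_client()
response = client.get("/")
print(response.status_code)              # 200
print(response.get_data(as_text=True))   # Car Price Predictor
```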
2.4. Redirecting the API to predict the car price:
We create a new app route ('/predict') that reads the input from our 'index.html' form and,
on clicking the predict button, outputs the result using render_template.

Figure 2.30 creating predict function to use the predict button in our web-app

Let’s have a look at our index.html file :

Figure 2.31 Index.html file

2.5. Starting the Flask Server :

Figure 2.32 Starting the flask server

app.run() is called and the web application is hosted locally on localhost:5000.

"debug=True" ensures that we don't need to rerun our app every time we make a change;
we can simply refresh the web page to see the changes while the server is still running.

CHAPTER 3: RESULT AND DISCUSSION

Figure 3. 1 – Project Structure

The project is saved in a folder called "myflask". We first run the 'mml.py' file to get
our ML model and then we run 'app.py'. On running this file, our application is hosted on the
local server at port 5000.
You can simply type "localhost:5000" in your web browser to open
your web application after running 'app.py'.
• car data.csv — This is the dataset we used
• mml.py — This is our machine learning code
• model.pkl — This is the file we obtain after we run the mml.py file. It is present in
the same directory

• app.py — This is the Flask application we created above
• templates — This folder contains our ‘index.html’ file. This is mandatory in Flask while
rendering templates. All HTML files are placed under this folder.
• static — This folder contains the "css" folder. The static folder in a Flask application is meant to hold the CSS and JavaScript files.
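The handoff between 'mml.py' and 'app.py' via model.pkl works through pickle serialization. A minimal sketch, with a stub class standing in for the trained regressor, could look like:

```python
import pickle

class StubModel:
    # Stand-in for the regressor trained in mml.py.
    def predict(self, rows):
        return [sum(row) for row in rows]

# In mml.py: train the model, then serialize it to disk.
model = StubModel()
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# In app.py: deserialize the model once at startup and reuse it
# for every request to the '/predict' route.
with open("model.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded.predict([[1, 2, 3]]))  # -> [6]
```

Loading the model once at startup, rather than on every request, keeps each prediction fast because deserialization happens only a single time.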
It is always a good idea to run your application on the local server first and check its functionality before hosting it online on a cloud platform. Let's see what happens when we run 'app.py':

Figure 3.2 Windows PowerShell window

On clicking the provided URL, we get our website:

Figure 3.3 Chrome web page showing the result

Now, let's enter the required values, click the "Predict" button, and see what happens.

Figure 3.4 Entering the required values

Figure 3.5 The predicted car price shown on the screen

Observe the URL (127.0.0.1:5000/predict); this is the use of app routes. On clicking the "Predict" button, we are taken to the predict route, where the predict function renders the 'index.html' page with the output of our application.

CHAPTER 4: CONCLUSION AND FUTURE SCOPE

4.1 Conclusion

Here, I have come to the end of the project on car price prediction using machine learning. The purpose of this project was to enhance our technical skills.

I would like to share my experience while doing this project. I learnt many new things about the different libraries of Python, such as pandas, NumPy and sklearn. We also came to know how to visualize the dataset; how to handle null values, duplicate values and categorical data; how to implement various machine learning algorithms; and how to fit these models and then predict the price of a car. I also learnt how to use Flask, a Python framework that allows us to build web applications. Thus it was a wonderful learning experience for me while working on this project.

This project has developed my thinking skills and deepened my interest in the field of data science and machine learning with Python. It gave me real insight into the world of data science.

A very special thanks to our HOD sir for giving us two months to work on our technical
skills.

Thank You

4.2 Future Scope
Scope of Python: Python is one of the most promising career paths in the technology industry, and career opportunities in Python are increasing tremendously around the world. Since Python code is simple and highly readable, major companies are in demand of Python skills, and Python is an excellent tool for developing new ideas. The number of candidates interested in Python increases every day.
Today, companies both in India and abroad are on the lookout for skilled Python developers. Knowing the Python language gives a competitive advantage compared to other languages. Indian IT companies created around 2 lakh such jobs in 2018 and are still expecting more Python developers. Python is becoming ever more popular since it is used in upcoming technologies such as artificial intelligence and machine learning.

Job Roles For Python Developers

The scope of Python is growing in fields such as data science and analytics. Python job roles come with highly promising pay in large companies.

• Research Analyst
• DevOps Engineer
• Python Developer
• Data Analyst
• Software Developer

The scope of Machine Learning is not limited to the investment sector. Rather, it is expanding
across all fields such as banking and finance, information technology, media & entertainment,
gaming, and the automotive industry.

Machine Learning Job Scope and Salary Trends

The scope of Machine Learning in India, as well as in other parts of the world, is high in
comparison to other career fields when it comes to job opportunities. According to Gartner,
there will be 2.3 million jobs in the field of Artificial Intelligence and Machine Learning by
2022. Also, the salary of a Machine Learning Engineer is much higher than the salaries
offered to other job profiles.

REFERENCES

Online Sources

[1] W3Schools Python Tutorial [Online]. Available: https://www.w3schools.com/python/

[2] Javatpoint Machine Learning Tutorial [Online]. Available: https://www.javatpoint.com/machine-learning

Books

[1] Mark Lutz, Learning Python, 5th Edition.

[2] Al Sweigart, Automate the Boring Stuff with Python, 2nd Edition.

[3] Charles Severance, Python for Everybody: Exploring Data in Python 3.

[4] Eric Matthes, Python Crash Course: A Hands-On, Project-Based Introduction to Programming, 2nd Edition. San Francisco, CA: No Starch Press, 2019.

[5] Guido van Rossum, An Introduction to Python. Network Theory Ltd, 2006.

