Describe Machine Learning Lifecycle


1. Describe machine learning lifecycle

The machine learning life cycle involves seven major steps, which are given below:

• Gathering Data
• Data Preparation
• Data Wrangling
• Data Analysis
• Train Model
• Test Model
• Deployment

In the complete life cycle, we solve a problem by creating a machine learning system
called a "model", and this model is created by "training" it. Training a model requires
data, so the life cycle starts with collecting data.

1. Gathering Data
Data gathering is the first step of the machine learning life cycle. The goal of this step
is to identify and obtain all the data relevant to the problem.

In this step, we identify the different data sources, since data can be collected
from various sources such as files, databases, the internet, or mobile devices.
It is one of the most important steps of the life cycle, because the quantity and quality
of the collected data limit how accurate the model's output can be.
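
As a rough illustration of gathering data from more than one source, the sketch below reads records from a file and from a database using only Python's standard library. The column names, table name, and values are hypothetical, and both sources are simulated in memory so that the example runs on its own.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV file (simulated here with an in-memory string).
csv_text = "size,price\n50,150\n80,240\n"
file_rows = [
    {"size": float(row["size"]), "price": float(row["price"])}
    for row in csv.DictReader(io.StringIO(csv_text))
]

# Hypothetical source 2: a database table (an in-memory SQLite database here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE houses (size REAL, price REAL)")
conn.executemany("INSERT INTO houses VALUES (?, ?)", [(100, 300), (120, 360)])
db_rows = [
    {"size": size, "price": price}
    for size, price in conn.execute("SELECT size, price FROM houses")
]
conn.close()

print(len(file_rows), "records from the file")
print(len(db_rows), "records from the database")
```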

2. Data Preparation
After collecting the data, we need to prepare it for the following steps.

Data preparation is the step where we put our data in a suitable place and prepare it
for use in machine learning training.

In this step, we first put all the data together and then randomize its ordering.
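
The two actions described above, putting the data together and randomizing its order, can be sketched as follows. The records are made up for illustration, and the fixed seed is an assumption added only so that the shuffle is repeatable; randomizing helps ensure that the order in which the data was collected does not influence training.

```python
import random

# Hypothetical (size, price) records from two sources; the values are made up.
source_a = [(50, 150), (80, 240), (100, 300)]
source_b = [(120, 360), (60, 180)]

# Put all the data together.
combined = source_a + source_b

# Randomize the ordering in place. The seed is fixed only for repeatability.
rng = random.Random(42)
rng.shuffle(combined)

print(combined)
```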

3. Data Wrangling
Data wrangling is the process of cleaning raw data and converting it into a usable
format.

It involves cleaning the data, selecting the variables to use, and transforming
the data into a proper format to make it more suitable for analysis in the next step.

It is one of the most important steps of the complete process. Cleaning is required to
address data quality issues such as missing values, duplicate records, and invalid
entries.
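
A minimal sketch of these three actions (cleaning, selecting variables, and transforming) is given below. The field names and records are hypothetical, and the rule of simply dropping incomplete rows is one assumption among several possible ways of handling missing values.

```python
# Hypothetical raw records, as they might arrive from a file: all values are text.
raw = [
    {"id": "1", "size": "50", "price": "150", "note": "ok"},
    {"id": "2", "size": "", "price": "240", "note": "size missing"},
    {"id": "1", "size": "50", "price": "150", "note": "ok"},
    {"id": "3", "size": "80", "price": "240", "note": ""},
]

# Variables selected for analysis; the other fields are discarded.
selected = ("size", "price")


def wrangle(records, fields):
    """Clean the records, keep only the selected fields, and convert them to numbers."""
    seen = set()
    clean = []
    for record in records:
        values = [record.get(field, "").strip() for field in fields]
        if any(value == "" for value in values):
            continue  # cleaning: drop rows with a missing selected value
        row = tuple(float(value) for value in values)  # transforming: text to float
        if row in seen:
            continue  # cleaning: drop rows whose selected values repeat an earlier row
        seen.add(row)
        clean.append(dict(zip(fields, row)))
    return clean


clean = wrangle(raw, selected)
print(clean)
```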

4. Data Analysis
Now the cleaned and prepared data is passed on to the analysis step. This step
involves:

• Selecting analytical techniques
• Building models
• Reviewing the result

The aim of this step is to build a machine learning model that analyses the data using
various analytical techniques, and to review the outcome. It starts with determining
the type of problem, which guides the choice of machine learning technique, such
as classification, regression, cluster analysis, or association. We then build the
model using the prepared data and evaluate it.

5. Train Model
The next step is to train the model. In this step, we train the model to improve its
performance and obtain a better outcome for the problem.
We use datasets to train the model with various machine learning algorithms.
Training is required so that the model can learn the various patterns, rules, and
features in the data.
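
To make "training" concrete, the sketch below fits a straight line to a tiny training dataset by least squares. This is only one simple example of a training algorithm, chosen for illustration; the function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept to the training data by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data that follows the pattern y = 2x + 1.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(train_x, train_y)
print(slope, intercept)
```

Here the "pattern" the model learns is just the slope and intercept of the line; more complex models learn many more parameters in the same spirit.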

6. Test Model
Once the machine learning model has been trained on a given dataset, we test it.
In this step, we check the accuracy of the model by providing it with a test dataset.

Testing determines the percentage accuracy of the model, which we compare against the
requirements of the project or problem.
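
Percentage accuracy on a test dataset can be computed as the share of test examples the model predicts correctly. In the sketch below, `threshold_model` is a hypothetical stand-in for a trained classifier, and the test data is made up.

```python
def accuracy_percent(model, test_inputs, test_labels):
    """Return the percentage of test examples that the model predicts correctly."""
    correct = sum(
        1 for x, label in zip(test_inputs, test_labels) if model(x) == label
    )
    return 100.0 * correct / len(test_labels)


def threshold_model(x):
    """Hypothetical stand-in for a trained classifier: predicts 1 when x >= 0.5."""
    return 1 if x >= 0.5 else 0


# Hypothetical test dataset, kept separate from the data used for training.
test_inputs = [0.1, 0.4, 0.6, 0.9, 0.45]
test_labels = [0, 0, 1, 1, 1]

score = accuracy_percent(threshold_model, test_inputs, test_labels)
print(score)
```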

7. Deployment
The last step of the machine learning life cycle is deployment, where we deploy the
model in a real-world system.

If the model prepared above produces accurate results that meet our requirements
at an acceptable speed, we deploy it in the real system.

Before deploying the project, however, we check whether the model is improving its
performance using the available data. The deployment phase is similar to making
the final report for a project.
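
The deployment decision described above depends on two conditions, accuracy and speed. A minimal sketch of such a check is shown below; the function name, the measurements, and the thresholds are all hypothetical and would come from the project's own requirements.

```python
def ready_to_deploy(accuracy, seconds_per_prediction,
                    min_accuracy, max_seconds_per_prediction):
    """Return True only if both the accuracy and the speed requirements are met."""
    return (accuracy >= min_accuracy
            and seconds_per_prediction <= max_seconds_per_prediction)


# Hypothetical measurements and requirements, chosen only for illustration.
fast_enough = ready_to_deploy(0.93, 0.02,
                              min_accuracy=0.90, max_seconds_per_prediction=0.05)
too_slow = ready_to_deploy(0.93, 0.20,
                           min_accuracy=0.90, max_seconds_per_prediction=0.05)
print(fast_enough, too_slow)
```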

2. Differentiate between data analysis and data science.


| Feature | Data Science | Data Analytics |
|---|---|---|
| Coding Language | Primarily uses Python, along with C++ and Java. | Requires knowledge of Python and R. |
| Programming Skills | Requires in-depth programming knowledge. | Basic programming skills are sufficient. |
| Use of ML | Utilizes machine learning algorithms for insights. | Generally does not rely on machine learning for insights. |
| Data Type | Deals mostly with unstructured data. | Deals with structured data. |
| Statistical Skills | Requires strong statistical skills. | Basic statistical skills are usually sufficient. |

| Linear Regression | Logistic Regression |
|---|---|
| Used to predict a continuous dependent variable using a given set of independent variables. | Used to predict a categorical dependent variable using a given set of independent variables. |
| Used for solving regression problems. | Used for solving classification problems. |
| Predicts the value of continuous variables. | Predicts the values of categorical variables. |
| We find the best-fit line. | We find the S-curve. |
| The least squares estimation method is used. | The maximum likelihood estimation method is used. |
| The output must be a continuous value, such as price, age, etc. | The output must be a categorical value, such as 0 or 1, Yes or No, etc. |
| Requires a linear relationship between the dependent and independent variables. | Does not require a linear relationship between the dependent and independent variables. |
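
The difference in output between the two methods can be sketched in a few lines. Linear regression returns a value on a straight line, while logistic regression passes the same kind of linear expression through the S-shaped sigmoid function to get a probability, which is then turned into a category. The parameter values and the 0.5 threshold below are assumptions chosen for illustration, not fitted values.

```python
import math


def linear_predict(x, slope, intercept):
    """Linear regression output: a continuous value on the best-fit line."""
    return slope * x + intercept


def logistic_predict(x, weight, bias, threshold=0.5):
    """Logistic regression output: a probability from the S-curve and a 0/1 label."""
    probability = 1.0 / (1.0 + math.exp(-(weight * x + bias)))
    label = 1 if probability >= threshold else 0
    return probability, label


# Hypothetical parameters; in practice they are estimated from training data.
print(linear_predict(3.0, slope=2.0, intercept=1.0))
print(logistic_predict(3.0, weight=1.0, bias=-2.0))
print(logistic_predict(-3.0, weight=1.0, bias=-2.0))
```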
