Module2 ch2

Introduction to Machine Learning
Module 2, Chapter 2
Need of Machine Learning
• Business organizations use huge amounts of data in their daily activities.
• This data needs to be analysed in order to take decisions.
• Data is often not properly utilized because:
• Data is scattered across different systems and is difficult to integrate.
• There is a lack of awareness of software tools that extract information from data.
Machine learning has become popular for three reasons:

1. A high volume of data is available to manage.
2. The cost of storage has reduced, and it is easy to process, transmit, distribute
and extract data.
3. Complex algorithms are available. Ex: deep learning.
Terminologies of Machine Learning
What is Machine Learning?
• “Machine learning is a field of study that gives computers the ability to learn
without being explicitly programmed.”
• It is a sub-branch of AI.
Conventional Programming v/s AI and ML
Conventional programming:
• After understanding the problem, an algorithm is formulated and programmed.
• This is difficult for many real-world problems such as puzzles, games and
intelligent systems.
AI and ML:
• The solution is formulated as rules, called logic, and programmed as an expert
system.
• Expert systems are developed by encoding an expert's knowledge into programs.
• Ex: MYCIN, an expert system developed based on doctors' expert knowledge.
• However, depending on human expertise is impractical, so ML takes datasets as
input and takes decisions on its own.
What is a Model?
A model can be any one of the following:
1. Mathematical equation
2. Relational diagrams like graphs/trees
3. Logical if/else rules
4. Groupings called clusters

Another View of Machine Learning: Tom Mitchell's Definition
Gaining knowledge from experience:
1. Collection of data.
2. Abstraction: develop abstract concepts (ideas) out of the gathered data.
3. Generalization: convert the abstraction into actionable intelligence by ranking
concepts, drawing inferences and setting heuristics. Ex: choosing the best hotels
in a new city.
4. Evaluation: test the heuristics and, if they fail, check the thoroughness of the
models.
Machine Learning related to other fields
1. Artificial intelligence
Machine Learning, Data Science, Data Mining, and Data Analytics
Data science is an “umbrella term” covering everything from data collection to
data analysis.
• Big Data: Data science is concerned with the collection of data. Big data is a
field of data science that deals with the following characteristics of data:
1. Volume: Huge amounts of data are generated by big companies like
Facebook, Twitter and YouTube.
2. Variety: Data is available in a variety of forms, such as images and videos,
and in different formats.
3. Velocity: The speed at which the data is generated and processed.
• Data Mining: Aims to extract the hidden patterns that are present in the data,
whereas machine learning aims to use them for prediction.
• Data Analytics: Another branch of data science. It aims to extract useful
knowledge from crude data. ML algorithms are used here for the analysis.
• Pattern Recognition: Uses machine learning algorithms to extract the features
for pattern analysis and pattern classification.
ML and Statistics
• Statistics is a branch of mathematics that has a solid theoretical
foundation regarding statistical learning. Like machine learning (ML),
it can learn from data.
• The difference lies in the approach. Statistical methods look for regularity
in data, called patterns: statistics first sets a hypothesis and then performs
experiments to verify and validate it, in order to find relationships among
the data. ML algorithms, by contrast, concentrate on extracting patterns that
give accurate predictions.
Types of Machine Learning
Labelled Data
• A label is similar to a key attribute in a table.
• The label is the feature that we aim to predict.
• A dataset need not consist of numbers; it can also consist of images.

Unlabelled data
In unlabelled data, there are no labels in the dataset.

1. Supervised Learning
• Similar to teacher (supervisor) and student based learning.
• Uses a labelled dataset.
• A supervisor provides labelled data so that a model can be constructed
and then evaluated on test data.
• Two stages, in layman's terms:
1. The teacher provides information to the student, who needs to understand it.
At this stage the teacher does not know whether the student has grasped it.
2. The teacher assesses the student to test and evaluate what has been learnt.
Two Methods of Supervised Learning
1. Classification:
• The input attributes of a classification algorithm are called independent
variables.
• The target attribute is called the label or dependent variable.
• The relationship between the input and target variables is represented in
the form of a structure called a classification model. The focus of
classification is to predict the ‘label’, which is in discrete form (a
value from a finite set of values).
• An example is shown in the figure, where a classification algorithm takes a
set of labelled images, such as dogs and cats, to construct a model that
can later be used to classify an unknown test image.
Two stages of learning in classification
• Training stage: A labelled dataset is given to the algorithm, which learns
from it and generates a model.
• Testing stage: The model is tested with an unknown sample and a label is
assigned to it.
• This process is called classification.
• Some of the key classification algorithms are: Decision Tree,
Random Forest, Support Vector Machines, Naïve Bayes,
Artificial Neural Networks, and deep learning networks like CNN.
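The training and testing stages can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: it uses a nearest-centroid rule, which is not one of the algorithms listed above but keeps the sketch short, and the feature values and the names `train` and `predict` are hypothetical.

```python
from math import dist

def train(samples, labels):
    """Training stage: learn one centroid (mean point) per label."""
    grouped = {}
    for point, label in zip(samples, labels):
        grouped.setdefault(label, []).append(point)
    return {
        label: tuple(sum(column) / len(points) for column in zip(*points))
        for label, points in grouped.items()
    }

def predict(model, point):
    """Testing stage: assign the label of the closest centroid."""
    return min(model, key=lambda label: dist(model[label], point))

# Hypothetical labelled data: (weight in kg, height in cm) of cats and dogs.
samples = [(4.0, 25.0), (5.0, 23.0), (30.0, 60.0), (28.0, 55.0)]
labels = ["cat", "cat", "dog", "dog"]

model = train(samples, labels)        # training stage
print(predict(model, (4.5, 24.0)))    # testing stage on an unknown sample
```

The model here is just a dictionary of centroids; a decision tree or a neural network would replace `train` and `predict` while keeping the same two-stage structure.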
Regression model
• The regression model takes input x and generates a model in the form of a
fitted line of the form y = f(x).
• Here, x is the independent variable, which may be one or more attributes,
and y is the dependent variable.
Prediction in Regression Model
• Linear regression takes the training set and tries to fit it with a line:
product sales = 0.66 × Week + 0.54. Here, 0.66 and 0.54 are the
regression coefficients that are learnt from data. The advantage of this
model is that a prediction for product sales (y) can be made for unknown
week data (x). For example, the prediction for the unknown eighth week is
obtained by substituting x = 8 in the regression formula:
y = 0.66 × 8 + 0.54 = 5.82.
• Regression algorithms are used for such predictions.
• The main difference between the two methods is that regression models
predict continuous variables such as product price, while classification
concentrates on assigning labels such as a class.
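The week-8 prediction can be checked with a short Python sketch. The coefficients 0.66 and 0.54 come from the text; the function names and the small dataset passed to `fit_line` are hypothetical, since the original training data is not given here.

```python
def predict_sales(week, slope=0.66, intercept=0.54):
    """Evaluate the fitted line: product sales = slope * week + intercept."""
    return slope * week + intercept

def fit_line(xs, ys):
    """Learn slope and intercept by ordinary least squares (one input)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Prediction for the unknown eighth week, using the coefficients from the text.
print(round(predict_sales(8), 2))  # 0.66 * 8 + 0.54 = 5.82

# Hypothetical training set lying exactly on y = 2x + 1.
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))
```

`fit_line` shows how such coefficients could be learnt from data; `predict_sales` is the prediction step on its own.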
2. Unsupervised Learning
• The process of self-instruction is based on the concept of trial and
error, without a supervisor.
• An unlabelled dataset is supplied.
• The algorithm observes the examples and recognizes patterns based on the
grouping of objects.
• Cluster analysis and dimensionality reduction algorithms are examples
of unsupervised algorithms.
Cluster Analysis
• Aims to group objects into disjoint clusters or groups.
• Cluster analysis clusters objects based on their attributes. All the data
objects in a partition are similar in some aspect and differ significantly
from the data objects in the other partitions.
• Some examples of clustering are: segmentation of a region of interest in
an image, detection of abnormal growth in a medical image, and determining
clusters of signatures in a gene database.
• Some of the key clustering algorithms are:
• k-means algorithm
• Hierarchical algorithms
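As an illustration of the k-means algorithm named above, here is a minimal Python sketch. It assumes the first k points serve as the initial centroids and runs a fixed number of iterations; the data points and the function name `k_means` are hypothetical.

```python
from math import dist

def k_means(points, k, iterations=10):
    """Group points into k disjoint clusters (first k points seed the centroids)."""
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # an empty cluster keeps its old centroid
                centroids[i] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centroids, clusters

# Hypothetical unlabelled data with two visible groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
centroids, clusters = k_means(points, k=2)
print(clusters)
```

No labels are supplied at any point; the groups emerge only from the distances between the objects' attributes.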
Dimensionality Reduction
• Dimensionality reduction algorithms are examples of unsupervised
algorithms.
• They take higher-dimensional data as input and output the data in a lower
dimension by taking advantage of the variance of the data.
• The task is to reduce the dataset to fewer features without losing
generality. Ex: image compression.
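One way to use the variance of the data is principal component analysis (PCA). The text does not name a specific method, so PCA is an assumption here. The sketch below reduces hypothetical 2-D points to 1-D by projecting them onto the direction of maximum variance, using the closed-form angle of the leading eigenvector of a symmetric 2×2 covariance matrix.

```python
from math import atan2, cos, sin

def pca_2d_to_1d(points):
    """Project 2-D points onto their direction of maximum variance."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centred = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centred) / n
    b = sum(x * y for x, y in centred) / n
    c = sum(y * y for _, y in centred) / n
    # Angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * atan2(2 * b, a - c)
    axis = (cos(theta), sin(theta))
    # One number per point: its coordinate along the principal axis.
    reduced = [x * axis[0] + y * axis[1] for x, y in centred]
    return axis, reduced

# Hypothetical data lying on the line y = x: one feature is enough to describe it.
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
axis, reduced = pca_2d_to_1d(points)
print(axis, reduced)
```

Each point is now described by one number instead of two, and because the example data lies on a single line, nothing is lost in the reduction.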
Semi-supervised Learning
• Used when the dataset has a huge amount of unlabelled data and some
labelled data.
• Labelling is a time-consuming process.
• Semi-supervised algorithms use unlabelled data by assigning a
pseudo-label to it. The labelled and pseudo-labelled datasets can then be
combined.
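The pseudo-labelling idea can be sketched as follows. This is a minimal illustration, assuming a nearest-centroid rule for assigning pseudo-labels and a retraining step on the combined data; the data points and function names are hypothetical.

```python
from math import dist

def centroids_of(samples, labels):
    """Learn one centroid (mean point) per label."""
    grouped = {}
    for point, label in zip(samples, labels):
        grouped.setdefault(label, []).append(point)
    return {
        label: tuple(sum(column) / len(points) for column in zip(*points))
        for label, points in grouped.items()
    }

def nearest_label(model, point):
    """Return the label whose centroid is closest to the point."""
    return min(model, key=lambda label: dist(model[label], point))

# Hypothetical data: a few labelled points and more unlabelled ones.
labelled = [(1.0, 1.0), (9.0, 9.0)]
labels = ["A", "B"]
unlabelled = [(2.0, 1.0), (8.0, 9.0), (1.0, 2.0)]

# Step 1: train on the small labelled set.
model = centroids_of(labelled, labels)
# Step 2: assign a pseudo-label to every unlabelled point.
pseudo_labels = [nearest_label(model, p) for p in unlabelled]
# Step 3: combine both sets and retrain on the larger dataset.
combined_model = centroids_of(labelled + unlabelled, labels + pseudo_labels)
print(pseudo_labels)
```

In practice, pseudo-labels can be wrong, so a common refinement is to keep only those assigned with high confidence; that step is omitted here.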
Reinforcement Learning
• Mimics the way human beings learn.
• An agent, such as a robot or a program, perceives the world and takes actions.
• The aim is to reach a goal or earn a reward. In turn, rewards enable the
agent to gain experience.
• The reward can be positive or negative (punishment). When the
rewards are more, the behaviour gets reinforced and learning becomes
possible.
A grid game
• No data is supplied.
• The agent takes actions L, R, T, B (left, right, top, bottom).
• It interacts with the environment.
• The algorithm should construct a model by finding the best path out of
many paths.
• This is the experience to be modelled.
• Therefore, reinforcement algorithms are reward-based, goal-oriented
algorithms.
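A grid game of this kind can be sketched with a table of action values (Q-values). The grid size, goal cell, reward of 1 and discount factor 0.9 are all assumptions, since the original figure is not available. To keep the result reproducible, the agent tries every action in every cell in turn instead of exploring at random, so strictly speaking this is Q-value iteration rather than trial-and-error Q-learning.

```python
ACTIONS = {"L": (0, -1), "R": (0, 1), "T": (-1, 0), "B": (1, 0)}
SIZE = 3        # assumed 3x3 grid
GOAL = (2, 2)   # assumed goal cell (bottom-right corner)

def step(state, action):
    """Environment: move one cell, stay put at walls, reward 1 on reaching the goal."""
    d_row, d_col = ACTIONS[action]
    row = min(max(state[0] + d_row, 0), SIZE - 1)
    col = min(max(state[1] + d_col, 0), SIZE - 1)
    next_state = (row, col)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def learn_q(sweeps=20, gamma=0.9):
    """Build the Q-table purely from rewards returned by the environment."""
    states = [(r, c) for r in range(SIZE) for c in range(SIZE)]
    q = {(s, a): 0.0 for s in states for a in ACTIONS}
    for _ in range(sweeps):
        for s in states:
            if s == GOAL:
                continue
            for a in ACTIONS:
                nxt, reward = step(s, a)
                future = 0.0 if nxt == GOAL else max(q[(nxt, b)] for b in ACTIONS)
                q[(s, a)] = reward + gamma * future
    return q

def best_path(q, start=(0, 0)):
    """Follow the highest-valued action from each cell until the goal."""
    path, state = [start], start
    while state != GOAL and len(path) <= SIZE * SIZE:
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state, _ = step(state, action)
        path.append(state)
    return path

path = best_path(learn_q())
print(path)
```

The only feedback the agent receives is the reward from `step`; the best path emerges because cells closer to the goal accumulate higher values.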
Challenges of Machine Learning
• Computers are better than humans at tasks like computation. Likewise,
human beings are better than machines at recognition.
• But deep learning systems challenge humans: they can recognise human
faces in a second.
• A main challenge is the quality of data needed to construct a quality model.
List of challenges
1. Problems: ML can solve only well-posed problems, i.e. problems that are
clearly specified.
Ex: Is this a model for multiplication?
Puzzles, games and scientific computation include many “ill-posed” problems.

2. Huge data: ML needs a huge amount of quality data, with no missing or
incorrect values.
3. High computation power: ML algorithms need high computation
power because the problems are complex, and may need a GPU or TPU.
4. Complexity of algorithms: Designing, selecting and evaluating optimal
algorithms is challenging.
5. Bias/Variance: Variance is the error of the model, which leads to the
bias-variance tradeoff. A model that fits the training data correctly but fails
on test data has lost generalization; this is called overfitting. Underfitting
is the reverse case: the model fails to fit even the training data.
Both are challenging.
Machine Learning Process
Applications of Machine Learning
1. Sentiment Analysis: For movie reviews or product reviews, ratings such as
five stars or one star are automatically attached by sentiment analysis
programs using NLP.
2. Recommendation systems.
3. Voice assistants.
4. Technologies like Google Maps and Uber use ML algorithms.
