Unit-1
➢Text Book
❖ G Amit Kumar Das, Saikat Dutt, Subramanian Chandramouli, Machine
Learning, Pearson India Education Services, 2019.
➢Reference Books
1. Tom M. Mitchell, Machine Learning, McGraw Hill Education, 2013.
2. Rudolph Russell, Machine Learning: Step-by-Step Guide to Implement
Machine Learning Algorithms with Python, CreateSpace Independent
Publishing Platform, 2018.
A8703-Machine Learning : Course Outcomes
➢ Select modelling and evaluation techniques for handling real-time data.
Bayesian Concept Learning: Introduction, Bayes’ Theorem, Naïve Bayes Classifier,
Applications of Naïve Bayes Classifier. Supervised Learning: Classification, Example of
Supervised Learning, Classification Model Learning Steps, Common Classification Algorithms:
KNN, Decision Tree, Random Forest Model, Support Vector Machines. Introduction of
Regression: Example of Regression, Linear Regression, Multiple Linear Regression.
A8703-Machine Learning : Syllabus
Introduction to Machine Learning
• Introduction
• Definition of ML
Traditional programming refers to any manually written program that takes input
data and runs on a computer to produce the output.
In machine learning, which underpins augmented analytics, the input data and the
expected output are fed to an algorithm, which creates the program (the model).
The resulting model can yield insights and be used to predict future outcomes.
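The contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the temperature data, the 30-degree rule, and the function names `is_hot_rule`, `learn_threshold`, and `is_hot_learned` are hypothetical and do not come from the course material.

```python
# Traditional programming: a human writes the rule by hand.
def is_hot_rule(temp_c):
    return temp_c >= 30  # threshold chosen by the programmer (assumed value)


# Machine learning: the rule (here, a single threshold) is derived from
# example inputs paired with their known outputs.
def learn_threshold(temps, labels):
    """Return the candidate threshold that misclassifies the fewest examples."""
    best_threshold, best_errors = None, len(temps) + 1
    for candidate in sorted(set(temps)):
        errors = sum((t >= candidate) != y for t, y in zip(temps, labels))
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


# Hypothetical training data: temperatures and whether they were labeled "hot".
temps = [18, 22, 25, 27, 29, 31, 33, 36]
labels = [False, False, False, False, True, True, True, True]

threshold = learn_threshold(temps, labels)


def is_hot_learned(temp_c):
    """The 'program' produced from data rather than written by hand."""
    return temp_c >= threshold
```

In the first function the programmer supplies the rule; in the second, the data supplies it, so changing the labeled examples changes the resulting program.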
Components of Learning System
• Advantages of ML
2. Prediction and Decision Making: models can analyze large datasets to make predictions and
decisions with high accuracy.
3. Scalability: models can handle large volumes of data and scale efficiently to accommodate
growing datasets.
4. Adaptability: models can adapt and learn from new data, allowing them to improve continuously.
6. Pattern Recognition: models can identify patterns, trends, and anomalies in data that may not be
obvious to humans.
• Disadvantages of ML
1. Data Dependency: models rely heavily on the quality and quantity of the training data.
2. Overfitting: models may become too specialized to the training data and fail to generalize well to
unseen data.
3. Interpretability: many models lack interpretability, making it challenging to understand how they arrive at their
predictions.
5. Ethical and Privacy Concerns: models may infringe on privacy rights, raising ethical and social concerns.
6. Lack of Domain Knowledge: models may perform poorly in domains where domain-specific
knowledge is essential.
• Limitations:
2. Complexity:
3. Interpretability:
4. Generalization:
5. Scalability:
6. Human Expertise
History of Machine Learning
• The history of machine learning traces back to the mid-20th century, with
roots in the fields of mathematics, computer science, and artificial intelligence.
Here's a brief overview:
1. Big Data: The explosion of data availability due to the internet, social media, and
digital technologies fueled the development of new machine learning algorithms
and techniques.
1. Experiences:
2. Observation:
3. Instruction:
What is human learning?
• Cognitive science is an interdisciplinary field that studies the mind and its
processes, including thinking, learning, and problem-solving.
• (2) we build our own notion indirectly based on what we have learnt from the
self-learning),
How do machines learn?
• The basic machine learning process can be divided into three parts.
based on knowledge input, students can do well in the examinations only till a
certain stage.
• Advantages:
• Can achieve high accuracy when trained on sufficient and representative data.
• Disadvantages:
• May suffer from overfitting if the model is too complex or the training dataset
is small.
Types of Machine Learning
• Unsupervised Learning:
• Disadvantages:
• The algorithm leverages the small amount of labeled data along with the
larger pool of unlabeled data to make predictions or learn patterns.
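One simple way to combine a few labeled points with a larger unlabeled pool is self-training: fit a model on the labeled data, pseudo-label the unlabeled points it is confident about, and refit. The sketch below is a minimal illustration, assuming one-dimensional data and a nearest-centroid classifier; the data, the `min_margin` confidence rule, and all function names are hypothetical choices made for this example.

```python
def centroids(points, labels):
    """Mean of the points in each class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_with_margin(x, cents):
    """Return (nearest class, gap between the two smallest distances)."""
    dists = sorted((abs(x - c), y) for y, c in cents.items())
    return dists[0][1], dists[1][0] - dists[0][0]


def self_train(labeled_x, labeled_y, unlabeled_x, min_margin=2.0, max_rounds=10):
    """Grow the labeled set with confident pseudo-labels, then refit."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        cents = centroids(xs, ys)
        confident, remaining = [], []
        for x in pool:
            label, margin = predict_with_margin(x, cents)
            if margin >= min_margin:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing left that the model is confident about
        for x, label in confident:
            xs.append(x)
            ys.append(label)
        pool = remaining
    return centroids(xs, ys), pool


# Hypothetical data: two labeled points and a larger unlabeled pool.
final_centroids, still_unlabeled = self_train(
    labeled_x=[1.0, 9.0],
    labeled_y=["low", "high"],
    unlabeled_x=[0.5, 1.5, 2.0, 8.0, 8.5, 9.5, 5.2],
)
```

Points near the class boundary (here 5.2) stay unlabeled because the model is not confident about them, which limits the risk of reinforcing its own mistakes.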
• Here are a few semi-supervised learning algorithms:
• Self-Training:
supervised settings.
2. It incorporates both labeled and unlabeled data into the SVM framework,
aiming to find a decision boundary that separates the data while minimizing
classification errors.
3. S3VM optimizes a combination of the margin and the empirical error on the
• Advantages:
1. Can learn complex behaviors and strategies through trial and error.
• Disadvantages:
11. Extreme Context Shifts: Machine learning models might not perform well
when deployed in situations drastically different from their training
environment. They lack adaptability to extreme shifts in context.
12. New and Novel Situations: ML models typically operate based on patterns
learned from past data. When faced with entirely new and novel situations,
they might not have sufficient information to provide accurate predictions.
Problems That Cannot Be Solved Using Machine Learning
13. Interpersonal and Emotional Understanding: Recognizing and
responding to human emotions, nuances, and interpersonal
interactions are challenging tasks that require human emotional
intelligence and social understanding.
Applications of Machine Learning
• Machine learning (ML) has become a powerful tool across various industries,
transforming how we live and work.
Healthcare:
1. Disease diagnosis and prognosis
2. Personalized treatment recommendation
3. Drug discovery and development
4. Medical imaging analysis (e.g., MRI, CT scans)
5. Electronic health record (EHR) analysis for patient management
Finance:
1. Fraud detection and prevention
2. Credit scoring and risk assessment
3. Algorithmic trading and financial forecasting
4. Customer segmentation and targeted marketing
5. Portfolio optimization and wealth management
E-commerce:
1. Product recommendation and personalized shopping experiences
2. Customer segmentation and churn prediction
3. Price optimization and dynamic pricing strategies
4. Fraud detection and prevention
5. Supply chain optimization and demand forecasting
Marketing:
1. Customer segmentation and targeting
2. Sentiment analysis and brand sentiment monitoring
3. Social media analytics and influencer identification
4. Customer lifetime value prediction
5. Campaign optimization and marketing attribution modeling
Manufacturing:
1. Predictive maintenance for machinery and equipment
2. Quality control and defect detection
3. Supply chain optimization and inventory management
4. Demand forecasting and production planning
5. Process optimization and efficiency improvement
Transportation:
1. Autonomous vehicles and self-driving cars
2. Route optimization and traffic prediction
3. Demand forecasting for ride-sharing and delivery services
4. Fleet management and vehicle routing
5. Predictive maintenance for transportation infrastructure
Tools in Machine Learning
• Here are some of the most popular tools, along with their advantages,
disadvantages, and limitations:
1. SCIKIT-LEARN:
Advantages:
• Simple and easy-to-use API, making it great for beginners.
• Disadvantages:
• May not be suitable for very large datasets or complex model architectures.
• Limitations:
• Disadvantages:
• Limitations:
Advantages:
• Dynamic computational graph makes it easier to debug and experiment.
• Disadvantages:
• Limitations:
• Advantages:
• Disadvantages:
• Limitations:
Advantages:
• Distributed computing capabilities suitable for big data processing.
• Disadvantages:
• Limitations:
Bias and fairness: Machine learning models can inherit biases present in the
training data, leading to unfair or discriminatory outcomes.
Privacy concerns: Handling sensitive data requires careful attention to privacy
regulations and ethical considerations, such as data anonymization and informed
consent.
Machine Learning Activities
• The following are the machine learning
activities:
• Example: After training our spam email classifier, we evaluate its performance
using metrics like accuracy, precision, recall, and F1-score. We split our dataset
into training and testing sets to assess how well the model generalizes to
unseen or new data.
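The metrics named above can be computed directly from true and predicted labels. The following is a minimal sketch using only the standard library; the label lists and the helper names `train_test_split` and `classification_metrics` are hypothetical and defined here for illustration, not taken from any particular library.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def classification_metrics(y_true, y_pred, positive="spam"):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical test-set labels and the classifier's predictions for them.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "spam"]
y_pred = ["spam", "spam", "ham", "ham", "ham", "spam", "ham", "spam"]

metrics = classification_metrics(y_true, y_pred)
train_rows, test_rows = train_test_split(range(8))
```

Precision asks how many emails flagged as spam really were spam; recall asks how many of the real spam emails were caught. The F1-score is the harmonic mean of the two.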
6. Hyperparameter Tuning:
• Example: We use techniques like grid search or random search to tune the
hyperparameters of our machine learning model. For instance, we adjust the
learning rate, regularization strength, and batch size of a neural network to
optimize its performance on a validation set.
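Grid search can be sketched as a loop over every combination of candidate values, keeping the one with the lowest validation error. To stay self-contained, this sketch swaps the neural network for a one-variable linear model trained by full-batch gradient descent, so batch size is not tuned; the data, the grid values, and the function names are assumptions made for illustration.

```python
from itertools import product


def train_linear(xs, ys, learning_rate, l2_strength, epochs=200):
    """Fit y ~ w*x + b by gradient descent with an L2 penalty on w."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n + 2 * l2_strength * w
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mse(model, xs, ys):
    """Mean squared error of a (w, b) model on the given data."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(train, valid, grid):
    """Try every hyperparameter combination; score each on the validation set."""
    results = []
    for lr, l2 in product(grid["learning_rate"], grid["l2_strength"]):
        model = train_linear(*train, learning_rate=lr, l2_strength=l2)
        results.append((mse(model, *valid), lr, l2))
    return min(results), results


# Hypothetical data generated from y = 2x + 1.
train = ([0.0, 0.2, 0.4, 0.6, 0.8, 1.0], [1.0, 1.4, 1.8, 2.2, 2.6, 3.0])
valid = ([0.1, 0.5, 0.9], [1.2, 2.0, 2.8])
grid = {"learning_rate": [0.01, 0.1, 0.3], "l2_strength": [0.0, 0.1, 1.0]}

best, all_results = grid_search(train, valid, grid)
best_score, best_lr, best_l2 = best
```

The validation set is kept separate from the training data so that the chosen hyperparameters are judged on examples the model did not fit. Random search follows the same pattern but samples combinations instead of enumerating all of them.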
7. Cross-Validation:
• Advantages:
• Disadvantages:
• Disadvantages:
o High cardinality (many unique categories) can lead to issues like the
curse of dimensionality.
Basic Types of Data in Machine Learning
3. Ordinal Data:
• Examples: Education level (High School < Bachelor's < Master's < Ph.D.), Likert
scale ratings (Strongly Disagree < Disagree < Neutral < Agree < Strongly Agree).
• Advantages:
• Disadvantages:
o Not all machine learning algorithms can handle ordinal data directly.
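For algorithms that need numeric input, one common workaround is to map each ordinal category to its rank. A minimal sketch, using the education-level example above; the names `EDUCATION_ORDER` and `ordinal_encode` are hypothetical helpers defined here.

```python
# Categories listed from lowest to highest, as in the example above.
EDUCATION_ORDER = ["High School", "Bachelor's", "Master's", "Ph.D."]


def ordinal_encode(values, ordered_categories):
    """Replace each category with its position in the given ordering."""
    rank = {category: i for i, category in enumerate(ordered_categories)}
    return [rank[v] for v in values]


encoded = ordinal_encode(["Master's", "High School", "Ph.D."], EDUCATION_ORDER)
# encoded == [2, 0, 3]
```

The integer codes preserve the order but also imply equal spacing between levels, which the original data does not guarantee.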
• Advantages:
• Disadvantages:
• Preprocessing steps like tokenization and stemming are necessary, which can
introduce noise.
5. Image Data: Examples: Photographs, medical images, satellite images.
• Advantages:
o Rich visual information suitable for tasks like object detection, image
classification, and image segmentation.
o Deep learning models like CNNs can automatically extract hierarchical
features.
• Disadvantages:
• Examples: Stock prices over time, temperature readings, and sensor data.
• Advantages:
• Disadvantages:
• Advantages:
• Advantages:
4. Dimensionality Reduction:
• Disadvantages:
• Advantages:
• Disadvantages:
o Results may vary based on the choice of distance metric and clustering
algorithm.
• Example: Using K-means clustering to segment customers based on their
purchasing behavior.
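The customer-segmentation example can be sketched with a small K-means implementation. This is an illustration only: the customer data (spend, visits) is hypothetical, the distance metric is squared Euclidean, and the centroids are initialized naively from the first k points, which a production implementation would not do.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, iterations=20):
    """Alternate between assigning points and updating centroids."""
    centroids = [tuple(p) for p in points[:k]]  # naive init: first k points
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            new_centroids.append(mean_point(members) if members else centroids[i])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


# Hypothetical customers: (monthly spend in hundreds, store visits per month).
customers = [(1, 2), (8, 9), (2, 1), (9, 8), (1, 1), (9, 9)]
centroids, labels = kmeans(customers, k=2)
# labels == [0, 1, 0, 1, 0, 1]: low spenders in cluster 0, high spenders in cluster 1
```

Changing `squared_distance` to another metric, or choosing different starting centroids, can change the resulting segments, which is the sensitivity noted above.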
Data Quality & Remediation
• Data quality refers to the reliability, accuracy, consistency, completeness, and
relevancy of data.
• Data remediation involves the process of identifying and correcting data quality
issues to ensure that data is accurate, reliable, and suitable for analysis or
decision-making.
• We can handle this by imputing missing values using techniques like mean,
median, or mode imputation, or by using advanced imputation methods like
K-nearest neighbors (KNN) or predictive models.
• Advantages:
o Imputed values may not accurately represent the true underlying data
distribution.
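Mean, median, and mode imputation can be sketched with Python's standard `statistics` module. The `impute` helper and the `ages` column are hypothetical, and missing values are assumed to be represented as `None`.

```python
from statistics import mean, median, mode


def impute(values, strategy="mean"):
    """Fill None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]


# Hypothetical column with two missing entries.
ages = [25, None, 30, 30, None, 45]

mean_filled = impute(ages, "mean")      # missing entries become 32.5
median_filled = impute(ages, "median")  # missing entries become 30.0
mode_filled = impute(ages, "mode")      # missing entries become 30
```

Every missing entry receives the same value, which reduces the column's variance; this is one reason imputed values may not reflect the true distribution.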
• Disadvantages:
• Disadvantages:
• Disadvantages:
5. Dimensionality Reduction:
• Disadvantages: