01 Intro To ML Wo Videos
Machine Learning
Olivia Pfeiler
[email protected]
In this course you will learn …
How will you learn?
Overview of classes
Date | Start | End | Topic | Location | Lecturer
04.10.2024 | 4:50 pm | 8:10 pm | History of ML/AI & basic ML terms | EDV1 (Villach) / VCR | Pfeiler
11.10.2024 | – | – | Blended learning (algebra theory); complete data camp class, deadline Nov 1, 2024 | – | Pfeiler / self-study
18.10.2024 | – | – | Blended learning (data pre-processing); complete data camp class, deadline Nov 7, 2024 | – | Pfeiler / self-study
08.11.2024 | 4:50 pm | 8:10 pm | Recap data preprocessing; learning | EDV1 (Villach) / VCR | Pfeiler
15.11.2024 | 4:50 pm | 8:10 pm | Decision Trees & decision metrics | EDV1 (Villach) / VCR | Pfeiler
22.11.2024 | 4:50 pm | 8:10 pm | K-means & resampling methods | VCR | Pfeiler
06.12.2024 | 4:50 pm | 8:10 pm | Feature engineering & feature selection | EDV4 (Villach) / VCR | Pfeiler
11.12.2024 | 5:40 pm | 9:00 pm | ML lab 1 (exercises & homework) | EDV2 (Villach) / VCR | Plankensteiner
13.12.2024 | 4:50 pm | 8:10 pm | Regression (linear & non-linear) & information about student projects | EDV1 (Villach) / VCR | Pfeiler
How to pass the class?
• Attendance in the classroom or online is not part of the final grade
• Homework
• Linear algebra exercises (self-study + data camp class, deadline Nov. 1st, 2024): 5% of final grade
• Data pre-processing (self-study + data camp class, deadline Nov. 7th, 2024): 5% of final grade
• ML exercise 1 (with K. Plankensteiner, submission Jan. 24): 5% of final grade
• ML exercise 2 (with K. Plankensteiner, submission Jan. 24): 5% of final grade
• Final exam (date after Jan. 10, 2025; please align on possible dates): 60% of final grade
Bibliography
Plan for today
Topics
• AI & ML – why is there a hype?
• AI & ML – a quick tour through the history
• Machine Learning – general explanation
• The Machine Learning Life Cycle
Procedure
• Two 90-min sessions with one 20-min break
AI & Machine Learning hype
Buzzword Bingo: Decision Tree, Neural Network, Supervised Learning, Python, Support Vector Machines, Data Science, K-means, GPU, Deep Learning, Algorithm, Feature Engineering, Data Pre-processing, Big Data
AI & Machine Learning in the media
Source: https://www.quytech.com/blog/wp-content/uploads/2021/05/AI-spending.png
Why is AI/ML so popular right now?
• ML has matured considerably over the last decade and has changed rapidly in recent years
• e.g. the statistical and probabilistic underpinnings of the methods
• the tools to implement ML methods have matured as well
• Abundant data
• the amount of data collected and stored is growing rapidly (“information overload”)
• diverse data sources (broad applicability): email, social networks, RSS, podcasts, websites, …
• Abundant computation
• computation power is easily accessible and cheap
• it makes it possible to process “abundant data” with powerful machine learning algorithms
Will the hype continue?
AI winters & AI hypes
[Figure: timeline of AI winters and hype phases; annotation at approx. 2007: “Data explosion” / “information overload”]
Source: https://hackernoon.com
Hype Cycle
Source: https://www.gartner.com/en/documents/3887767
12 Oct 2023 restricted Copyright © Infineon Technologies AG 2023. All rights reserved.
Gartner Hype Cycle for AI – Status 2023
AI & ML – A quick tour through the history
AI & Machine Learning history
Source: https://atos.net
<1960 “I repeat”
• 1950: Alan Turing published a paper entitled “Computing Machinery and Intelligence”
• Turing Test or “Imitation Game”: a simple test of whether a machine can show intelligent behaviour indistinguishable from that of a human, i.e. whether machines can think
• 1959: Arthur Samuel defines the term machine learning: “ML is the field of study that gives computers the ability to learn without being explicitly programmed”
Source: www.wikipedia.org
1960 – 2010 “I imitate”
Source: www.britannica.com
2010 – 2018 “I learn”
Source: www.infoworld.com
2018 – 2022 “I learn to learn”
> 2022 “I contribute”
Source: https://www.openai.com/
Machine Learning – general explanation
AI, Machine Learning & Deep Learning
• AI: mimics the intelligence and behavioural patterns of humans
• ML: computers learn from data without a complex set of hand-written rules
• DL: ML inspired by our brain’s network of neurons
• Data Science: mathematics, statistics, visualization, EDA (exploratory data analysis)
What is Machine Learning?
• ML is the study and construction of algorithms that can learn from and make predictions on data (i.e. it allows computers to learn)
• The basic principles of ML are closely related to
• Statistical and mathematical theory
• Numerical optimization
• Learning algorithms
• Two main goals
• Make predictions
• Understand systems better
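Samuel's "learn without being explicitly programmed" can be illustrated with a toy example. Here the decision rule is learned from labelled examples rather than written by hand. This is a minimal sketch that assumes made-up 1-D measurements `x` with 0/1 labels. The names `learn_threshold` and `predict` are hypothetical. The "learning algorithm" is simply an exhaustive search for the threshold with the fewest training errors.

```python
def learn_threshold(xs, labels):
    """Learn t for the rule 'predict 1 if x >= t' by minimising training errors."""
    # Candidate thresholds: every observed value, plus one above the maximum
    # (the latter corresponds to 'always predict 0').
    candidates = sorted(set(xs)) + [max(xs) + 1]
    best_t, best_errors = None, None
    for t in candidates:
        # Count training examples the rule 'x >= t' gets wrong
        errors = sum((x >= t) != bool(y) for x, y in zip(xs, labels))
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


def predict(t, x):
    """Apply the learned rule to a new input."""
    return 1 if x >= t else 0


# Hypothetical labelled examples; the threshold is never written down by hand
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = [0, 0, 0, 1, 1, 1]

t = learn_threshold(xs, labels)
print(t)                                  # 4.0
print(predict(t, 3.5), predict(t, 4.5))   # 0 1
```

When new labelled data arrive, rerunning `learn_threshold` updates the rule without anyone rewriting it. Both goals from the list above show up here: the learned `t` makes predictions, and inspecting it also says something about the system.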
Data Analytics vs. Machine Learning
Elements of ML
• Different types of learning: supervised vs. unsupervised, active vs. passive, …
• Models: theoretical assumptions explaining relationships in the data
• Algorithms: routines to get model parameters & make predictions
[Diagram: the three elements Learning, Models & Algorithms]
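The three elements can be kept separate in code. This minimal sketch makes several assumptions. The model is linear, y ≈ w·x + b. The algorithm is plain gradient descent on the mean squared error, chosen only for illustration. The learning is supervised, on made-up data generated by y = 2x + 1. The names `model` and `fit`, the learning rate and the number of epochs are all hypothetical choices.

```python
def model(params, x):
    """Model: the theoretical assumption y ≈ w * x + b."""
    w, b = params
    return w * x + b


def fit(xs, ys, lr=0.01, epochs=5000):
    """Algorithm: gradient descent on the mean squared error to find (w, b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current parameters on the training data
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(residual^2)
        grad_w = 2 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Learning (supervised): labelled examples from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

params = fit(xs, ys)
print([round(p, 3) for p in params])   # [2.0, 1.0]
print(round(model(params, 10.0), 3))   # 21.0
```

Swapping the algorithm changes how the parameters are found but leaves the model untouched. A closed-form least-squares solution, for example, would reach the same `(w, b)` for this model.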
Some advantages of ML
• Decision rules are learnt from data → new relations may become visible
• Objective and repeatable decision making → no subjective reasoning
• ML models are flexible, e.g. because predictors can be combined (non-)linearly
• ML models can improve over time (with better data quality, larger data sets, more computing power, …)
• Basic ML methods are implemented and easy to use in common software (MATLAB, Python, R, …)
Some things to consider when using ML
• ML can only learn from the information contained in the data → high-quality, reliable training data are needed
• The 80% (data preparation) vs. 20% (data evaluation) rule holds, BUT only for the Proof of Concept (PoC); even more resources are needed for deployment
• Large/complex ML models may overfit → loss of prediction power / generalization
• The selection of an ML method is “subjective”
• Basic ML models often have a “linear nature”
• Complex models need extensive computing power for training
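The overfitting point can be illustrated with a toy experiment. This sketch is an illustration under stated assumptions, not a benchmark. A 1-nearest-neighbour "memoriser" stands in for an overly flexible model, and a least-squares straight line stands in for a simple model with a "linear nature". The data are made up: training labels follow y = 2x + 1 plus alternating ±1 noise, and held-out points carry noise-free labels. All function names are hypothetical.

```python
def fit_line(xs, ys):
    """Simple model: least-squares straight line y = w * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = mean_y - w * mean_x

    def predict(x):
        return w * x + b

    return predict


def fit_memoriser(xs, ys):
    """Overly flexible stand-in: 1-nearest neighbour, reproduces every training point."""
    pairs = list(zip(xs, ys))

    def predict(x):
        # Return the label of the closest training input
        return min(pairs, key=lambda p: abs(p[0] - x))[1]

    return predict


def mse(predict, xs, ys):
    """Mean squared error of a predictor on a data set."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: true relation y = 2x + 1, training labels carry +/-1 noise
x_train = [0, 1, 2, 3, 4, 5]
noise = [1, -1, 1, -1, 1, -1]
y_train = [2 * x + 1 + e for x, e in zip(x_train, noise)]
# Held-out inputs between the training points, with noise-free labels
x_test = [0.4, 1.4, 2.4, 3.4, 4.4]
y_test = [2 * x + 1 for x in x_test]

line = fit_line(x_train, y_train)
mem = fit_memoriser(x_train, y_train)
for name, predictor in [("line", line), ("memoriser", mem)]:
    train_err = mse(predictor, x_train, y_train)
    test_err = mse(predictor, x_test, y_test)
    print(f"{name}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
# line: train MSE 0.914, test MSE 0.059
# memoriser: train MSE 0.000, test MSE 1.320
```

The memoriser is perfect on the training data but clearly worse on the held-out points. This is the "loss of prediction power / generalization" from the list above. Only evaluation on data that was not used for training exposes it.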
The Machine Learning Life Cycle
[Diagram: six steps arranged as a cycle; the topics addressed in this class are marked]
1) Understand Business & Process
2) Understand Data
3) Gather & Prepare Data
4) Evaluate & Test Models
5) Select & Optimize Model
6) Deploy & maintain Models
CRISP-DM: CRoss-Industry Standard Process for Data Mining
CRISP-DM – alternative presentation
Source: https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining
ML life cycle – Roles and Responsibilities
[Diagram: ML life cycle steps 1) Understand Business & Process, 2) Understand Data, 3) Gather & Prepare Data, with the roles Domain Expert and Data Scientist]
In the PAST …
• The data scientist, supported by a domain expert, was responsible for the whole ML life cycle
BUT ML projects …
• are complex, similar to SW projects
• need diverse competencies to be successful
• and a Proof of Concept (PoC) is just the beginning
ML life cycle – Roles and Responsibilities
[Diagram: ML life cycle (CRISP-DM: CRoss-Industry Standard Process for Data Mining) with the roles Domain Expert, Data Engineer, Data Scientist and ML Engineer]
TODAY …
• Successful DS teams consist of people with different expertise working together
• Domain Expert: understands the problem, the business needs and the data
• Data Engineer: develops and manages the pipeline for raw data collection and pre-processing to feed ML models
• Data Scientist: analyses data, develops & evaluates ML/DL models
• ML Engineer: develops pipelines to deploy and maintain ML models
Managing the ML life cycle – MLOps
• Compound of “Machine Learning” and “operations”
• Practice to manage the whole ML lifecycle in order to optimize both the governance and the scalability
[Diagram: the ML life cycle]
Blended learning assignments
1. Recap on algebra theory
• Slides & link to online tutorials available in Moodle
• ToDo: complete data camp class, deadline Nov 1, 2024
• Knowing the content is essential to understand how ML algorithms work
Questions?