CSE 446 Machine Learning: Instructor: Pedro Domingos
Traditional Programming
Data + Program → Computer → Output
Machine Learning
Data + Output → Computer → Program
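To make the contrast concrete, here is a minimal Python sketch. The temperature task, the threshold-rule representation, and every function name are hypothetical illustrations, not taken from the course. In the traditional case a person writes the rule. In the learning case the computer receives data plus the desired outputs and returns the rule as a program.

```python
from typing import Callable, List


# Traditional programming (hypothetical example): a person writes the program,
# and the computer applies it to data to produce output.
def handwritten_rule(temperature_c: float) -> str:
    return "hot" if temperature_c > 25.0 else "not hot"


# Machine learning (hypothetical example): the computer gets data plus the
# desired outputs and produces a program. Here the "program" is a threshold
# rule whose threshold is chosen to minimize errors on the training examples.
def learn_threshold_rule(xs: List[float], labels: List[str]) -> Callable[[float], str]:
    # Candidate thresholds: one below every input, plus each observed input.
    candidates = [min(xs) - 1.0] + sorted(set(xs))

    def training_errors(t: float) -> int:
        # Count examples where "x > t" disagrees with the label "hot".
        return sum((x > t) != (y == "hot") for x, y in zip(xs, labels))

    best_t = min(candidates, key=training_errors)
    return lambda x: "hot" if x > best_t else "not hot"


temps = [10.0, 15.0, 20.0, 28.0, 30.0, 35.0]
labels = ["not hot", "not hot", "not hot", "hot", "hot", "hot"]
program = learn_threshold_rule(temps, labels)  # the output of learning is a program
print(program(22.0), program(18.0))
```

On this toy data the learned threshold lands at 20.0, so `program(22.0)` returns `"hot"`, while the hand-written rule with its fixed 25.0 cutoff would return `"not hot"` for the same input.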
Magic?
No, more like gardening
• Seeds = Algorithms
• Nutrients = Data
• Gardener = You
• Plants = Programs
Sample Applications
• Web search
• Computational biology
• Finance
• E-commerce
• Space exploration
• Robotics
• Information extraction
• Social networks
• Debugging
• [Your favorite area]
ML in a Nutshell
• Tens of thousands of machine learning algorithms
• Hundreds of new ones every year
• Every machine learning algorithm has three components:
– Representation
– Evaluation
– Optimization
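A minimal sketch of how the three components separate in code, assuming a deliberately tiny setup chosen only for illustration: the representation is the set of lines y = w·x + b, the evaluation is mean squared error, and the optimization is batch gradient descent. The function names, learning rate, and step count are illustrative assumptions.

```python
from typing import List, Tuple

Params = Tuple[float, float]


# Representation: the hypothesis space is all lines y = w * x + b.
def predict(params: Params, x: float) -> float:
    w, b = params
    return w * x + b


# Evaluation: mean squared error of a hypothesis on the training data.
def mean_squared_error(params: Params, xs: List[float], ys: List[float]) -> float:
    return sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Optimization: batch gradient descent on the (convex) mean squared error.
def fit(xs: List[float], ys: List[float], lr: float = 0.05, steps: int = 2000) -> Params:
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the MSE with respect to w and b.
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2.0 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated by y = 2x + 1
params = fit(xs, ys)
print(params, mean_squared_error(params, xs, ys))
```

With these noise-free points the fit converges very close to w = 2, b = 1. Replacing any one component, such as a different hypothesis space or a different loss, yields a different learner while the other two roles stay in place.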
Representation
• Decision trees
• Sets of rules / Logic programs
• Instances
• Graphical models (Bayes/Markov nets)
• Neural networks
• Support vector machines
• Model ensembles
• Etc.
Evaluation
• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / Utility
• Margin
• Entropy
• K-L divergence
• Etc.
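Several of the evaluation measures listed above are short enough to write directly. This sketch assumes binary labels encoded as 1 (positive) and 0 (negative) for accuracy, precision, and recall, and discrete probability distributions given as lists for entropy and K-L divergence, measured in bits (log base 2; natural log is an equally common convention). The function names are illustrative.

```python
import math
from typing import Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Fraction of predictions that match the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[float, float]:
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    # Precision: of the predicted positives, how many are real (0.0 if undefined).
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the real positives, how many were found (0.0 if undefined).
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


def entropy(p: Sequence[float]) -> float:
    # H(p) = -sum_i p_i log2 p_i, skipping zero-probability outcomes.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # D(p || q) = sum_i p_i log2(p_i / q_i); assumes q_i > 0 wherever p_i > 0.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(accuracy(y_true, y_pred))          # 4 of 6 correct
print(precision_recall(y_true, y_pred))  # 2 true positives, 1 false positive, 1 false negative
print(entropy([0.5, 0.5]))               # a fair coin: 1 bit
print(kl_divergence([0.5, 0.5], [0.25, 0.75]))
```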
Optimization
• Combinatorial optimization
– E.g.: Greedy search
• Convex optimization
– E.g.: Gradient descent
• Constrained optimization
– E.g.: Linear programming
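As an illustration of combinatorial optimization by greedy search, here is a sketch that builds a conjunctive rule one condition at a time: it always adds the condition that most improves training accuracy and stops when no single addition helps. The boolean weather features, the data, and the function names are made up for this example. Like any greedy search, it can stop at a local optimum.

```python
from typing import Dict, List, Set, Tuple

Example = Tuple[Dict[str, bool], bool]


def rule_accuracy(rule: Set[str], data: List[Example]) -> float:
    # A conjunctive rule predicts True only when every feature in it is True
    # (so the empty rule predicts True for every example).
    correct = sum(all(x[f] for f in rule) == y for x, y in data)
    return correct / len(data)


def greedy_rule_search(data: List[Example]) -> Set[str]:
    features = set(data[0][0])
    rule: Set[str] = set()
    best = rule_accuracy(rule, data)
    while True:
        # Score every one-condition extension of the current rule.
        candidates = [(rule_accuracy(rule | {f}, data), f) for f in sorted(features - rule)]
        if not candidates:
            return rule
        score, feature = max(candidates)  # ties go to the later feature name
        if score <= best:
            return rule  # no single addition improves accuracy: stop here
        rule = rule | {feature}
        best = score


data: List[Example] = [
    ({"sunny": True, "warm": True, "windy": False}, True),
    ({"sunny": True, "warm": True, "windy": True}, True),
    ({"sunny": True, "warm": False, "windy": False}, False),
    ({"sunny": False, "warm": True, "windy": False}, False),
    ({"sunny": False, "warm": False, "windy": True}, False),
]
print(sorted(greedy_rule_search(data)))
```

Gradient descent plays the analogous role for continuous, convex objectives such as squared error.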
Types of Learning
• Supervised (inductive) learning
– Training data includes desired outputs
• Unsupervised learning
– Training data does not include desired outputs
• Semi-supervised learning
– Training data includes a few desired outputs
• Reinforcement learning
– Rewards from a sequence of actions
Inductive Learning
• Given examples of a function (X, F(X))
• Predict function F(X) for new examples X
– Discrete F(X): Classification
– Continuous F(X): Regression
– F(X) = Probability(X): Probability estimation
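The three kinds of F(X) can be illustrated with one simple learner. The sketch below uses k-nearest neighbors on one-dimensional inputs, chosen here only because it is short; it is not presented as the course's method, and the data and function names are hypothetical. Given examples (X, F(X)), it predicts a label (classification), a number (regression), or a probability (probability estimation) for a new X.

```python
from collections import Counter
from typing import List, Tuple, TypeVar

Y = TypeVar("Y")


def nearest(train: List[Tuple[float, Y]], x: float, k: int) -> List[Tuple[float, Y]]:
    # The k training pairs whose input is closest to x (ties keep input order).
    return sorted(train, key=lambda pair: abs(pair[0] - x))[:k]


# Classification: F(X) is discrete, so predict the most common label among neighbors.
def classify(train: List[Tuple[float, str]], x: float, k: int = 3) -> str:
    labels = [y for _, y in nearest(train, x, k)]
    return Counter(labels).most_common(1)[0][0]


# Regression: F(X) is continuous, so predict the mean of the neighbors' values.
def regress(train: List[Tuple[float, float]], x: float, k: int = 3) -> float:
    values = [y for _, y in nearest(train, x, k)]
    return sum(values) / len(values)


# Probability estimation: estimate P(label = positive | x) as the fraction of
# neighbors carrying that label.
def estimate_probability(train: List[Tuple[float, str]], x: float,
                         positive: str, k: int = 3) -> float:
    labels = [y for _, y in nearest(train, x, k)]
    return sum(y == positive for y in labels) / len(labels)


class_train = [(1.0, "A"), (2.0, "A"), (3.0, "A"), (7.0, "B"), (8.0, "B"), (9.0, "B")]
reg_train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

print(classify(class_train, 2.5))                   # neighbors 2.0, 3.0, 1.0 are all "A"
print(regress(reg_train, 2.4))                      # mean of 4.0, 6.0, 2.0
print(estimate_probability(class_train, 5.2, "B"))  # neighbors 7.0, 3.0, 8.0
```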
What We’ll Cover
• Supervised learning
– Decision tree induction
– Rule induction
– Instance-based learning
– Bayesian learning
– Neural networks
– Support vector machines
– Model ensembles
– Learning theory
• Unsupervised learning
– Clustering
– Dimensionality reduction
ML in Practice
• Understanding domain, prior knowledge, and goals
• Data integration, selection, cleaning, pre-processing, etc.
• Learning models
• Interpreting results
• Consolidating and deploying discovered knowledge
• Loop