Unit 6: AI
Naïve Bayes classification is based on Bayes' theorem:
P(C∣X) = P(X∣C) × P(C) / P(X)
Where:
• P(C∣X): Posterior probability of class C given predictors X.
• P(X∣C): Likelihood of predictors X given class C.
• P(C): Prior probability of class C.
• P(X): Evidence (constant across all classes).
In Naïve Bayes, the conditional independence assumption simplifies P(X∣C)
to a product of the individual feature probabilities:
P(X∣C)=P(X1∣C)×P(X2∣C)×⋯×P(Xn∣C)
Given Dataset
Feature 1   Feature 2   Class
High        Yes         A
Low         No          B
Medium      Yes         A
High        No          B
Test Instance
Feature 1 = High, Feature 2 = Yes
We calculate the posterior probability for each class A and B, and classify
the test instance to the class with the highest posterior probability.
Step 1: Prior Probabilities
P(A) = 2/4 = 0.5, P(B) = 2/4 = 0.5
Step 2: Likelihoods
P(High∣A) = 1/2, P(Yes∣A) = 2/2 = 1
P(High∣B) = 1/2, P(Yes∣B) = 0/2 = 0
Step 3: Posterior Scores (the evidence P(X) is common to both classes and can be ignored)
P(A∣X) = P(High∣A) × P(Yes∣A) × P(A) = 0.5 × 1 × 0.5 = 0.25
P(B∣X) = P(High∣B) × P(Yes∣B) × P(B) = 0.5 × 0 × 0.5 = 0.0
Step 4: Classification
Since P(A∣X) = 0.25 and P(B∣X) = 0.0, the test instance is classified as Class A.
Conclusion
Using Naïve Bayes classification, the test instance with Feature 1 = High
and Feature 2 = Yes belongs to Class A.
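As a quick check, the same calculation can be reproduced in Python. This is a minimal sketch (not part of the original answer) that recomputes the priors, likelihoods, and unnormalized posteriors by hand, without smoothing.

```python
# Minimal sketch (not from the original notes): recompute the Naive Bayes
# posteriors for the worked example by hand, without smoothing.
from collections import Counter

# (Feature 1, Feature 2) -> Class, as in the Given Dataset
data = [
    (("High", "Yes"), "A"),
    (("Low", "No"), "B"),
    (("Medium", "Yes"), "A"),
    (("High", "No"), "B"),
]
test = ("High", "Yes")

class_counts = Counter(label for _, label in data)
total = len(data)

scores = {}
for cls, count in class_counts.items():
    prior = count / total                    # P(C)
    likelihood = 1.0
    for i, value in enumerate(test):         # product of P(Xi | C)
        matches = sum(1 for features, label in data
                      if label == cls and features[i] == value)
        likelihood *= matches / count
    scores[cls] = prior * likelihood         # unnormalized posterior

print(scores)                        # {'A': 0.25, 'B': 0.0}
print(max(scores, key=scores.get))   # 'A'
```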
1. Working of SVM
Support Vector Machines (SVM) are supervised learning algorithms used
for both classification and regression tasks. The goal of SVM is to find the
optimal hyperplane that best separates data points belonging to different
classes in the feature space.
• Hyperplane: A hyperplane is a decision boundary that divides the
feature space into regions, each corresponding to a class label. In a
2D space, it is a line; in 3D, it becomes a plane, and in higher
dimensions, it generalizes to a hyperplane.
• Support Vectors: These are the data points closest to the
hyperplane. They have the largest influence on the position and
orientation of the hyperplane.
• Margin: The margin is the distance between the hyperplane and the
closest support vectors. SVM aims to maximize this margin, ensuring
better generalization.
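The following sketch is illustrative only (the toy dataset is assumed, not from the notes): it fits a linear SVM with scikit-learn and inspects the support vectors, the hyperplane parameters, and the margin width.

```python
# Illustrative only (assumed toy data): fit a linear SVM and inspect the
# support vectors, hyperplane parameters, and margin width.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

print(clf.support_vectors_)           # the points closest to the hyperplane
print(clf.coef_, clf.intercept_)      # w and b of the hyperplane w.x + b = 0
print(2 / np.linalg.norm(clf.coef_))  # margin width = 2 / ||w||
```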
2. Challenges with Linearly Non-Separable Data
When the data points are not linearly separable, SVM introduces:
• Slack Variables: These allow some points to be misclassified,
introducing flexibility in the model.
• Regularization Parameter (C): Controls the trade-off between
maximizing the margin and minimizing classification errors.
3. The Kernel Trick
The kernel trick is a mathematical technique used to transform data into a
higher-dimensional space where a hyperplane can separate it linearly.
Instead of explicitly mapping the data, kernels calculate the dot product of
data points in the transformed space, avoiding computational complexity.
Common Kernels:
• Linear Kernel: Used when data is linearly separable.
• Polynomial Kernel: Maps data to polynomial features.
• Radial Basis Function (RBF) Kernel (Gaussian Kernel): Handles
complex decision boundaries by mapping data to an infinite-
dimensional space.
• Sigmoid Kernel: Similar to a neural network activation function,
used in specific cases.
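A brief comparison of these kernels on a synthetic, non-linearly separable dataset (the data and scores below are illustrative, not part of the notes):

```python
# Illustrative comparison of the common kernels on a synthetic,
# non-linearly separable dataset (concentric circles).
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))  # RBF typically scores highest here
```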
4. SVM for Classification
• SVM finds the hyperplane that best separates classes.
• For multi-class classification, strategies like One-vs-One or One-vs-
Rest are used.
• Example: Classifying emails as spam or not spam.
5. SVM for Regression (SVR)
• In regression tasks, SVM tries to find a hyperplane that predicts a
continuous output value.
• Instead of maximizing margin, SVR introduces an epsilon-tube
within which prediction errors are ignored.
• It minimizes the deviation of predicted values outside this tube.
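A minimal SVR sketch, assuming a small synthetic dataset, showing how the epsilon parameter defines the tube within which errors are ignored:

```python
# Minimal SVR sketch on assumed synthetic data: errors smaller than
# epsilon fall inside the tube and are ignored by the loss.
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.arange(0, 10, 0.5).reshape(-1, 1)
y = 2 * X.ravel() + rng.normal(scale=0.3, size=X.shape[0])

# epsilon sets the tube width; C trades off flatness against violations
svr = SVR(kernel="rbf", C=10.0, epsilon=0.2).fit(X, y)
print(svr.predict([[4.0]]))  # roughly 8 for this y = 2x (plus noise) data
```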
6. Advantages of SVM
• Effective in high-dimensional spaces.
• Works well for both linearly separable and non-linearly separable
data.
• Robust against overfitting, especially with proper regularization.
7. Limitations of SVM
• Computationally intensive for large datasets.
• The choice of kernel and of hyperparameters (such as C) significantly
affects performance.
8. Applications
• Classification: Face recognition, text categorization, and
bioinformatics.
• Regression: Predicting house prices, stock trends, or weather data.
Q4 Draw and explain a decision tree for the following data:
Sunny Hot No
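The full Q4 table is not reproduced in these notes, so the tree itself cannot be reconstructed here. Purely as an illustration, the sketch below fits a decision tree (entropy criterion) to an assumed play-tennis-style sample; the column names and rows are hypothetical.

```python
# Hypothetical illustration only: the question's full table is not in these
# notes, so this uses an assumed play-tennis-style sample.
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.DataFrame({
    "Outlook": ["Sunny", "Sunny", "Overcast", "Rain", "Rain"],
    "Temp":    ["Hot",   "Hot",   "Hot",      "Mild", "Cool"],
    "Play":    ["No",    "No",    "Yes",      "Yes",  "Yes"],
})

# Encode the categorical features, then fit an entropy-based tree
X = OrdinalEncoder().fit_transform(df[["Outlook", "Temp"]])
tree = DecisionTreeClassifier(criterion="entropy").fit(X, df["Play"])

# Print the learned splits (thresholds refer to the encoded categories)
print(export_text(tree, feature_names=["Outlook", "Temp"]))
```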
Importance of Cross-Validation
1. Assessing Model Performance:
Cross-validation provides a reliable estimate of how well a model will
perform on unseen data by simulating its behavior on various
portions of the dataset.
2. Avoiding Overfitting:
By testing the model on data it has not seen during training, cross-
validation ensures that the model does not memorize the training
data, helping detect overfitting.
3. Optimizing Model Parameters:
It helps in hyperparameter tuning by identifying the best settings for a
model that maximize its predictive power.
4. Comparing Models:
Cross-validation provides a fair comparison of different models by
using the same validation strategy.
5. Effective Use of Data:
By utilizing the entire dataset for both training and validation at
different iterations, cross-validation ensures efficient use of limited
data.
k-Fold Cross-Validation
Definition:
In k-fold cross-validation, the dataset is randomly divided into k equal-
sized folds or subsets. The process is as follows:
1. Use k−1 folds for training the model.
2. Use the remaining 1 fold for testing (validation).
3. Repeat the process k times, with each fold being used exactly once
as the validation set.
4. Calculate the average performance metric (e.g., accuracy, precision)
across all k iterations to evaluate the model.
Steps in k-Fold Cross-Validation:
• Divide the dataset into k subsets.
• Train the model on k−1 folds and test on the remaining fold.
• Repeat for each fold.
• Aggregate the performance results.
Formula for Evaluating Performance:
Overall Performance = (1/k) × (M1 + M2 + ⋯ + Mk)
where Mi is the performance metric (e.g., accuracy or precision) obtained on the i-th fold.
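As an illustrative sketch (assuming scikit-learn and k = 5), cross_val_score performs exactly this procedure and returns one score per fold:

```python
# Illustrative sketch (scikit-learn, k = 5 assumed): cross_val_score runs
# the k-fold procedure and returns one score per fold.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)  # accuracy on each of the 5 folds
print(scores)
print(scores.mean())  # averaged performance, as in the formula above
```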
Q6 Explain precision, recall, F1-score, accuracy, and the area under the
curve (AUC). Given the following confusion matrix, calculate all these
metrics:
             Predicted Positive   Predicted Negative
Actual Pos          50                   10
Actual Neg           5                   35
Answer
1. Precision
Precision measures the proportion of correctly predicted positive
observations out of all predicted positives.
• Formula: Precision = TP / (TP + FP)
• Indicates: How many of the positive predictions were actually
correct.
2. Recall (Sensitivity)
Recall measures the proportion of actual positive observations that were
correctly predicted.
• Formula: Recall = TP / (TP + FN)
• Indicates: How many of the actual positives were identified
correctly.
3. F1-Score
The F1-score is the harmonic mean of precision and recall, providing a
single metric to evaluate the balance between them.
• Formula: F1 = 2 × (Precision × Recall) / (Precision + Recall)
• Indicates: How well the model balances false positives and false negatives.
4. Accuracy
Accuracy measures the proportion of correctly predicted observations out
of all observations.
• Formula: Accuracy = (TP + TN) / (TP + TN + FP + FN)
• Indicates: The overall correctness of the model.
5. Area Under the Curve (AUC)
AUC is the area under the ROC curve, which plots the true positive rate (TPR)
against the false positive rate (FPR) at different classification thresholds.
• Indicates: How well the model separates the two classes; values close to 1
indicate strong separation.
Confusion Matrix
             Predicted Positive   Predicted Negative
Actual Pos       TP = 50              FN = 10
Actual Neg       FP = 5               TN = 35
Calculations
1. Precision = TP / (TP + FP) = 50 / 55 ≈ 0.909 (90.9%)
2. Recall = TP / (TP + FN) = 50 / 60 ≈ 0.833 (83.3%)
3. F1-Score = 2 × (0.909 × 0.833) / (0.909 + 0.833) ≈ 0.869 (86.9%)
4. Accuracy = (TP + TN) / (TP + TN + FP + FN) = 85 / 100 = 0.85 (85%)
5. AUC
To calculate the AUC, we use the true positive rate (TPR = Recall ≈ 0.833) and
the false positive rate (FPR = FP / (FP + TN) = 5 / 40 = 0.125).
AUC is typically derived from the ROC curve using these values. For this
confusion matrix, a high AUC is expected due to the strong TPR and low FPR.
Summary of Metrics
Metric Value
Precision 90.9%
Recall 83.3%
F1-Score 86.9%
Accuracy 85%
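The same metrics can be re-derived directly from the four confusion-matrix counts; this short sketch is only a numerical check of the calculations above:

```python
# Numerical check of the calculations above, directly from the counts.
TP, FN = 50, 10
FP, TN = 5, 35

precision = TP / (TP + FP)                          # 50/55  ~ 0.909
recall = TP / (TP + FN)                             # 50/60  ~ 0.833
f1 = 2 * precision * recall / (precision + recall)  # ~ 0.8696 (86.9%)
accuracy = (TP + TN) / (TP + TN + FP + FN)          # 85/100 = 0.85
fpr = FP / (FP + TN)                                # 5/40   = 0.125 (for the ROC)

print(precision, recall, f1, accuracy, fpr)
```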
Data:
x1   x2   y
1    2    5
2    4    10
3    6    15
1. Set Up the Regression Equation
The multiple linear regression model is:
y = b0 + b1x1 + b2x2
2. Apply the Least Squares Method
Substituting the data into the least squares normal equations and solving, we find:
b0 = 0, b1 = 1, b2 = 2
3. Final Regression Equation
y = x1 + 2x2
Conclusion
The regression equation y = x1 + 2x2 effectively models the given data using
the least squares method.
The least squares method thus provides an optimized model to predict y based
on x1 and x2, and the derived equation can be used for forecasting or for
analyzing relationships between the variables.
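As an illustrative check (not part of the original answer), NumPy's least-squares solver reproduces these coefficients. Note that x2 = 2·x1 in this table, so the system is rank-deficient and lstsq returns the minimum-norm solution, which here coincides with b0 = 0, b1 = 1, b2 = 2.

```python
# Illustrative check with NumPy's least-squares solver. Because x2 = 2*x1,
# the system is rank-deficient and lstsq returns the minimum-norm solution,
# which here matches b0 = 0, b1 = 1, b2 = 2.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([2.0, 4.0, 6.0])
y = np.array([5.0, 10.0, 15.0])

A = np.column_stack([np.ones_like(x1), x1, x2])  # design matrix [1, x1, x2]
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

print(coeffs)      # approximately [0. 1. 2.]
print(A @ coeffs)  # reproduces y: [ 5. 10. 15.]
```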
LASSO Regression
LASSO (L1) regularization adds a penalty on the absolute values of the
coefficients to the linear regression cost function:
J(θ) = (1/2m) ∑ (hθ(x(i)) − y(i))² + λ ∑∣θ∣
where:
• m is the number of data points,
• hθ(x(i)) is the hypothesis function hθ(x)=θ0+θ1x(i) for linear regression,
• λ is the regularization parameter, and
• ∑∣θ∣ is the LASSO regularization term, the sum of the absolute values of
the coefficients θ.
Steps to Solve
1. Hypothesis Function
The hypothesis for linear regression is:
hθ(x) = θ0 + θ1x
2. LASSO Regularization
LASSO adds the term λ∑∣θ∣ to the cost function. This term induces sparsity
by shrinking some θj values to zero.
3. Derivative and Update Rule
To minimize J(θ), take partial derivatives with respect to θ0 and θ1 and apply
(sub)gradient descent with learning rate α:
θ0 := θ0 − α (1/m) ∑ (hθ(x(i)) − y(i))
θ1 := θ1 − α [ (1/m) ∑ (hθ(x(i)) − y(i)) x(i) + λ sign(θ1) ]
Here sign(θ1) is the subgradient of ∣θ1∣; the intercept θ0 is typically not penalized.
Given Data (with λ = 0.1):
x   y
1   2
2   4
3   6
Result
After applying LASSO regularization with λ = 0.1, the penalty reduces the
magnitude of θ1 and can set the coefficients of irrelevant features exactly to
zero. This leads to a sparse solution in which only significant predictors
remain in the model. For the given data, the final values of θ0 and θ1 would be
computed iteratively using the update rules above.
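As a hedged sketch of this result, scikit-learn's Lasso can be fitted to the same (x, y) table; its alpha parameter plays the role of λ = 0.1, though its objective scaling differs slightly from the cost function above, so the exact coefficient values are approximate.

```python
# Hedged sketch: scikit-learn's Lasso on the (x, y) table, with alpha
# playing the role of lambda = 0.1 (objective scaling differs slightly
# from the notes' cost function, so values are approximate).
import numpy as np
from sklearn.linear_model import Lasso

X = np.array([[1], [2], [3]])
y = np.array([2, 4, 6])  # y = 2x exactly

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.intercept_, lasso.coef_)  # theta0 and theta1; theta1 is shrunk slightly below 2
```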