UNIT-2
# Regression modelling
A regression model provides a function that describes the
relationship between one or more independent variables and
a response, dependent, or target variable.
For example, the relationship between height and weight may be
described by a linear regression model. Regression analysis is the basis
for many types of prediction and for determining how input variables
affect a target variable. When you hear about studies in the news on fuel
efficiency, the causes of pollution, or the effects of screen time on
learning, there is often a regression model being used to support their
claims.
Types of Regression
1. Linear
A linear regression is a model where the relationship between inputs and outputs is a
straight line. This is the easiest to conceptualize and even observe in the real world.
Even when a relationship isn’t very linear, our brains try to see the pattern and attach
a rudimentary linear model to that relationship.
Using a linear regression model, we can estimate the relationship between the
number of emails sent and response rates. In other words, if the linear model fits our
observations well enough, then we can estimate that the more emails we send, the
more responses we will get.
When making a claim like this, whether it is related to exercise, happiness, health, or
any number of claims, there is usually a regression model behind the scenes to
support the claim.
In addition, the model fit can be described using the mean squared error (MSE),
which gives us a single number showing exactly how well the linear model fits.
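A minimal sketch of this idea in Python; the emails-sent/responses numbers below are invented purely for illustration:

```python
# Fit a straight line to invented email-campaign data and report the MSE.
import numpy as np

emails_sent = np.array([100, 200, 300, 400, 500], dtype=float)
responses = np.array([12, 25, 31, 47, 55], dtype=float)

# Fit: responses ≈ slope * emails_sent + intercept
slope, intercept = np.polyfit(emails_sent, responses, deg=1)
predicted = slope * emails_sent + intercept

# Mean squared error: a single number describing how well the line fits
mse = np.mean((responses - predicted) ** 2)
print(f"slope={slope:.4f}, intercept={intercept:.2f}, MSE={mse:.2f}")
```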
2. Multiple
Multiple regression indicates that more than one input variable may
affect the outcome, or target variable. For our email campaign example, you may
include an additional variable with the number of emails sent in the last month.
By looking at both input variables, a clearer picture starts to emerge about what
drives users to respond to a campaign and how to optimize email timing and
frequency. While conceptualizing the model becomes more complex with more
inputs, the relationship may continue to be linear.
For these models, it is important to understand exactly what effect each input has
and how they combine to produce the final target variable results.
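A short sketch of a two-input linear model, again with invented data; the second column stands in for the hypothetical emails sent in the previous month:

```python
# Solve a two-input least-squares regression and read off each input's effect.
import numpy as np

X = np.array([[100, 80], [200, 150], [300, 310],
              [400, 280], [500, 420]], dtype=float)  # this month, last month
y = np.array([12, 25, 31, 47, 55], dtype=float)      # responses

# Add an intercept column and solve the least-squares problem
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
print("intercept:", coef[0])
print("effect per email this month:", coef[1])
print("effect per email last month:", coef[2])
```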
3. Non-Linear
Non-linear regression models the relationship between inputs and the target
with a curved function rather than a straight line. The goal of the model is
to make the sum of the squares as small as possible. The sum of squares is a
measure that tracks how far the Y observations vary from the nonlinear
(curved) function that is used to predict Y.
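A hedged sketch of this using SciPy's curve_fit, which minimises the sum of squares; the exponential model and the data values here are assumptions for illustration:

```python
# Fit a curved (exponential) function to invented data by non-linear
# least squares, then report the remaining sum of squares.
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * np.exp(b * x)   # the curved function used to predict y

x = np.array([0, 1, 2, 3, 4], dtype=float)
y = np.array([1.1, 1.9, 4.2, 8.7, 17.5])

# curve_fit chooses a, b to minimise the sum of squared residuals
params, _ = curve_fit(model, x, y, p0=(1.0, 1.0))
residuals = y - model(x, *params)
print("fitted a, b:", params, "sum of squares:", np.sum(residuals ** 2))
```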
4. Stepwise
While the other items we have talked about until now are specific types of models,
stepwise regression is more of a technique. If a model involves many potential
inputs, the analyst may start with the most directly correlated input variable to build a
model. Once that is accomplished, the next step is to make the model more
accurate.
To do that, additional input variables can be added to the model one at a time in
order of significance of the results. Using our email marketing example, the initial
model may be based on just the number of emails sent. Then we would add
something like the average age of the email recipient. After that, we would add the
average number of emails each recipient has received from us. Each additional
variable would add a small amount of additional accuracy to the model.
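A rough sketch of forward stepwise selection under these assumptions: synthetic data, hypothetical feature names borrowed from the email example, and squared error as the selection criterion:

```python
# Forward stepwise selection: add one input at a time, each time keeping
# the variable that most reduces the squared error of the fit.
import numpy as np

def sse(X, y):
    # Least-squares fit with intercept; return the sum of squared errors
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return np.sum((y - X1 @ coef) ** 2)

rng = np.random.default_rng(0)
n = 50
features = {
    "emails_sent": rng.normal(size=n),
    "avg_age": rng.normal(size=n),
    "emails_received": rng.normal(size=n),
}
y = 3 * features["emails_sent"] + 0.5 * features["avg_age"] + rng.normal(size=n)

selected, remaining = [], list(features)
while remaining:
    # Pick the remaining variable that gives the lowest error when added
    best = min(remaining, key=lambda f: sse(
        np.column_stack([features[v] for v in selected + [f]]), y))
    selected.append(best)
    remaining.remove(best)
    print("added", best, "-> SSE:",
          sse(np.column_stack([features[v] for v in selected]), y))
```

Each printed step shows the diminishing error as variables are added in order of significance; a real analysis would stop adding variables once the improvement becomes negligible.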
Multivariate analysis
Multivariate analysis involves multiple variables contributing to a single
outcome, and the majority of problems in the real world are multivariate.
For example, we cannot predict the weather in any year from the season
alone; multiple factors such as pollution, humidity, and precipitation also
play a part.
Interdependence Technique
Interdependence techniques analyze relationships in which the variables
cannot be classified as either dependent or independent.
# Bayesian Modeling
Bayesian networks are a type of probabilistic graphical model
that uses Bayesian inference for probability computations. Bayesian
networks aim to model conditional dependence, and therefore
causation, by representing conditional dependence by edges in a
directed graph. Through these relationships, one can efficiently
conduct inference on the random variables in the graph through the
use of factors.
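A minimal sketch of such inference on a two-node network (Rain → WetGrass); the probabilities are invented for illustration:

```python
# Inference in a tiny Bayesian network: the edge Rain -> WetGrass encodes
# conditional dependence, and the factors below let us answer queries.
P_rain = {True: 0.2, False: 0.8}
P_wet_given_rain = {True: {True: 0.9, False: 0.1},
                    False: {True: 0.2, False: 0.8}}

# Query: P(Rain=True | WetGrass=True) via Bayes' rule
joint_true = P_rain[True] * P_wet_given_rain[True][True]     # 0.18
joint_false = P_rain[False] * P_wet_given_rain[False][True]  # 0.16
posterior = joint_true / (joint_true + joint_false)
print(f"P(Rain | WetGrass) = {posterior:.3f}")  # ≈ 0.529
```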
# Support Vector Machines (SVM)
The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes, so that we can easily put a new data
point into the correct category in the future. This best decision boundary is called a
hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These
extreme cases are called support vectors, and hence the algorithm is termed a Support
Vector Machine. The two different categories are thus classified using a decision
boundary, or hyperplane.
Example: SVM can be understood with the example we used for the KNN
classifier. Suppose we see a strange cat that also has some features of dogs. If we
want a model that can accurately identify whether it is a cat or a dog, such a model
can be created using the SVM algorithm. We first train our model with many
images of cats and dogs so that it can learn the different features of cats and dogs,
and then we test it with this strange creature. The support vectors define a decision
boundary between the two classes (cat and dog) using the extreme cases of each,
and on the basis of these support vectors, the model will classify the creature as a
cat.
The SVM algorithm can be used for face detection, image classification, text
categorization, and more.
Types of SVM
SVM can be of two types:
• Linear SVM: Linear SVM is used for linearly separable data. If a
dataset can be classified into two classes using a single straight line, the
data is termed linearly separable, and the classifier used is called a
linear SVM classifier.
• Non-linear SVM: Non-linear SVM is used for non-linearly separable data.
If a dataset cannot be classified using a straight line, the data is termed
non-linear, and the classifier used is called a non-linear SVM
classifier.
The dimensions of the hyperplane depend on the number of features in the dataset:
if there are 2 features, the hyperplane is a straight line, and if there are 3
features, the hyperplane is a two-dimensional plane. We always create the
hyperplane with the maximum margin, i.e. the maximum distance between the
hyperplane and the nearest data points.
Support Vectors:
The data points or vectors that are closest to the hyperplane and that affect its
position are termed support vectors. Since these vectors support the
hyperplane, they are called support vectors.
Since this is 2-D space, a single straight line can easily separate the two
classes, but there can be multiple lines that separate them.
The SVM algorithm helps to find the best line or decision boundary; this best
boundary or region is called the hyperplane. SVM finds the points from both
classes that lie closest to the boundary; these points are called support
vectors. The distance between the vectors and the hyperplane is called the
margin, and the goal of SVM is to maximize this margin. The hyperplane with
the maximum margin is called the optimal hyperplane.
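A small sketch of a linear SVM on toy 2-D data, assuming scikit-learn is available; the points and class labels are made up:

```python
# Train a maximum-margin linear SVM and inspect its support vectors.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])  # two linearly separable classes

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print("support vectors:\n", clf.support_vectors_)

# For a linear SVM the margin width is 2 / ||w||
w = clf.coef_[0]
print("margin:", 2 / np.linalg.norm(w))
```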
Non-Linear SVM:
If data is linearly arranged, we can separate it using a straight line, but non-
linear data cannot be separated by any single straight line.
To separate such data points, we need to add one more dimension. For linear
data we used two dimensions, x and y; for non-linear data we add a third
dimension z, calculated as:
z = x² + y²
With the third dimension added, points near the origin take small z values and
points far away take large ones, so SVM can divide the two classes with a flat
plane in the 3-D space, parallel to the x-y plane. Converted back to 2-D space
with z = 1, this separating plane becomes a circle.
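A minimal sketch of this lifting trick; the inner/outer points are invented:

```python
# Lift 2-D points into 3-D with z = x^2 + y^2: points inside a circle get
# small z values, points outside get large ones, so the plane z = 1
# separates the two classes.
import numpy as np

inner = np.array([[0.2, 0.1], [-0.3, 0.2], [0.1, -0.4]])   # class 0
outer = np.array([[1.5, 1.2], [-1.8, 0.9], [1.1, -1.6]])   # class 1

def lift(points):
    z = (points ** 2).sum(axis=1)          # z = x^2 + y^2
    return np.column_stack([points, z])

print(lift(inner))  # all z values well below 1 ...
print(lift(outer))  # ... all z values well above 1
```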
Kernel Methods
Kernel methods are a family of algorithms used for pattern analysis. These
methods are employed in SVM (Support Vector Machines) to handle data that
is not linearly separable. Consider a hyperplane separating green dots from
blue ones: a hyperplane is one dimension less than the ambient space, so the
points live in the ambient space, but the object that divides or classifies the
space is one dimension lower. When there is no good straight line able to
classify the red and the green dots because the points are randomly
distributed, the kernel function comes in: it takes the points to a higher
dimension, solves the problem there, and returns the output. Think of it this
way: the green dots may be enclosed in some perimeter area while the red
ones lie outside it. Lifted into a three-dimensional space, our classifier, i.e.
the hyperplane, is no longer a straight line but a two-dimensional plane
which cuts the area.
So, with x = (2, 3, 4) and y = (3, 4, 5), and the mapping
f(v) = (v1v1, v1v2, v1v3, v2v1, v2v2, v2v3, v3v1, v3v2, v3v3):
f(2, 3, 4) = (4, 6, 8, 6, 9, 12, 8, 12, 16)
f(3, 4, 5) = (9, 12, 15, 12, 16, 20, 15, 20, 25)
f(x) · f(y) = 36 + 72 + 120 + 72 + 144 + 240 + 120 + 240 + 400 = 1444
And, with the kernel K(x, y) = (x · y)²:
K(x, y) = (6 + 12 + 20)² = 38² = 1444
As we find out, f(x) · f(y) and K(x, y) give the same result, but computing f
required mapping the 3 dimensions into 9 dimensions, while using the kernel it
was much easier.
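This arithmetic can be checked in a few lines of NumPy (f builds the 9-dimensional mapping as an outer product):

```python
# Verify that the explicit 9-D mapping and the kernel K(x, y) = (x . y)^2
# give the same number.
import numpy as np

def f(v):
    # explicit feature map: all pairwise products v_i * v_j
    return np.outer(v, v).ravel()

x = np.array([2, 3, 4])
y = np.array([3, 4, 5])

print(f(x) @ f(y))        # 1444, via the 9-dimensional mapping
print(np.dot(x, y) ** 2)  # 1444, via the kernel, with no mapping at all
```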
Let us see some of the kernel functions, or types, that are used in
SVM:
1. Linear Kernel
Let us say that we have two vectors named x1 and x2; then the linear
kernel is defined by:
K(x1, x2) = x1 · x2
2. Polynomial Kernel
A polynomial kernel is defined by the following equation:
K(x1, x2) = (x1 · x2 + 1)^d
where d is the degree of the polynomial.
3. Gaussian Kernel
This kernel is an example of a radial basis function kernel. It is defined by:
K(x1, x2) = exp(−‖x1 − x2‖² / (2σ²))
The given sigma plays a very important role in the performance of the
kernel.
4. Exponential Kernel
This is in close relation with the previous kernel, i.e. the Gaussian kernel,
with only the square of the norm removed:
K(x1, x2) = exp(−‖x1 − x2‖ / (2σ²))
5. Laplacian Kernel
This type of kernel is less prone to changes and is essentially equal to the
exponential kernel with a simplified parameter:
K(x1, x2) = exp(−‖x1 − x2‖ / σ)
6. Sigmoid Kernel
The activation function for the sigmoid kernel is the bipolar sigmoid
(hyperbolic tangent) function:
K(x1, x2) = tanh(κ x1 · x2 + c)
This kernel is very much used and popular among support vector
machines.
7. ANOVA Kernel
This kernel performs well in multidimensional regression problems, just
like the Gaussian and Laplacian kernels.
There are a lot more types of kernel methods; we have discussed the most
commonly used ones here. Which kernel to use depends purely on the type of
problem.
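As a hedged sketch, the most common of these kernels written out as plain NumPy functions; sigma, the degree d, and the constant c are free parameters, chosen arbitrarily here:

```python
# Standard textbook forms of the kernels discussed above.
import numpy as np

def linear_kernel(x1, x2):
    return np.dot(x1, x2)

def polynomial_kernel(x1, x2, d=2, c=1.0):
    return (np.dot(x1, x2) + c) ** d

def gaussian_kernel(x1, x2, sigma=1.0):
    return np.exp(-np.linalg.norm(x1 - x2) ** 2 / (2 * sigma ** 2))

def laplacian_kernel(x1, x2, sigma=1.0):
    return np.exp(-np.linalg.norm(x1 - x2, ord=1) / sigma)

x, y = np.array([1.0, 2.0]), np.array([2.0, 0.5])
for k in (linear_kernel, polynomial_kernel, gaussian_kernel, laplacian_kernel):
    print(k.__name__, k(x, y))
```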
# Time Series Analysis
What sets time series data apart from other data is that the analysis can
show how variables change over time. In other words, time is a crucial
variable because it shows how the data adjusts over the course of the data
points as well as the final results. It provides an additional source of
information and a set order of dependencies between the data.
Or
A time series is nothing but a sequence of data points that occur in
successive order over a given period of time.
Objectives:
• Forecasting
• Segmentation
• Classification
• Descriptive analysis
• Intervention analysis
Limitations:
• Similar to other models, missing values are not supported by TSA.
• The data points must be linear in their relationship.
• Data transformations are mandatory, so it is a little expensive.
• Models mostly work on uni-variate data.
Examples:
• Weather data
• Rainfall measurements
• Temperature readings
• Quarterly sales
• Stock prices
• Industry forecasts
• Interest rates
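A minimal forecasting sketch over invented quarterly sales figures, using a simple moving average (just one of many possible TSA techniques):

```python
# Forecast the next quarter as the mean of the last four quarters, and
# show how the rolling mean itself changes over time.
import numpy as np

sales = np.array([112, 118, 132, 129, 121, 135, 148, 148], dtype=float)

window = 4  # average over the last four quarters
forecast_next = sales[-window:].mean()
print("next-quarter forecast:", forecast_next)

# Rolling means over the whole series
rolling = np.convolve(sales, np.ones(window) / window, mode="valid")
print("rolling means:", rolling)
```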
Rule Induction
Rule induction is a data mining process of deducing if-then rules from a data set. These
symbolic decision rules explain an inherent relationship between the attributes and class
labels in the data set. Many real-life experiences are based on intuitive rule induction. For
example, we can proclaim a rule that states “if it is 8 a.m. on a weekday, then highway traffic
will be heavy” and “if it is 8 p.m. on a Sunday, then the traffic will be light.” These rules are
not necessarily right all the time. 8 a.m. weekday traffic may be light during a holiday
season. But, in general, these rules hold true and are deduced from real-life experience based
on our everyday observations. Rule induction provides a powerful classification approach
that can be easily understood by general users. It is used in predictive analytics to classify
unknown data. Rule induction is also used to describe the patterns in the data. The easiest way to
extract rules from a data set is from a decision tree that is developed on the same data set.
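A short sketch of this decision-tree route, assuming scikit-learn; the traffic data is invented to mirror the 8 a.m. weekday example:

```python
# Induce if-then rules from a decision tree trained on toy traffic data.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [hour_of_day, is_weekday (1/0)]; label: heavy traffic (1/0)
X = [[8, 1], [9, 1], [8, 0], [20, 0], [20, 1], [14, 1], [2, 1], [8, 1]]
y = [1, 1, 0, 0, 1, 0, 0, 1]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
# export_text prints the fitted tree as nested if-then rules
print(export_text(tree, feature_names=["hour", "weekday"]))
```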