Dimensionality Reduction


Dimensionality reduction is the process of reducing the number of features (or dimensions) in a dataset while retaining as much information as possible.


This can be done for a variety of reasons, such as:
 to reduce the complexity of a model,
 to improve the performance of a learning algorithm, or
 to make it easier to visualize the data.
There are several techniques for dimensionality reduction, including
 principal component analysis (PCA),
 singular value decomposition (SVD), and
 linear discriminant analysis (LDA).
Each technique uses a different method to project the data onto a lower-dimensional space
while preserving important information.
In other words, it is a process of transforming high-dimensional data into a lower-
dimensional space that still preserves the essence of the original data.
In machine learning, high-dimensional data refers to data with a large number of features or
variables. The curse of dimensionality is a common problem in machine learning, where the
performance of the model deteriorates as the number of features increases. This is because
the complexity of the model increases with the number of features, and it becomes more
difficult to find a good solution.
In addition, high-dimensional data can also lead to overfitting, where the model fits the
training data too closely and does not generalize well to new data.
Dimensionality reduction can help to mitigate these problems by reducing the complexity
of the model and improving its generalization performance.
There are two main approaches to dimensionality reduction:
feature selection and feature extraction.

Feature selection involves selecting a subset of the original features that are most relevant
to the problem at hand. The goal is to reduce the dimensionality of the dataset while
retaining the most important features.
There are several methods for feature selection, including
 filter methods,
 wrapper methods, and
 embedded methods.
Filter methods rank the features based on their relevance to the target variable; wrapper methods use the model's performance as the criterion for selecting features; and embedded methods combine feature selection with the model training process.
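As a rough illustration of these three approaches, here is a minimal sketch using scikit-learn on a synthetic dataset; the particular estimators and parameter values (ANOVA F-scores for the filter, recursive feature elimination for the wrapper, and an L1-penalized logistic regression for the embedded method) are illustrative assumptions rather than recommendations.

```python
# Illustrative sketch: filter, wrapper, and embedded feature selection with scikit-learn.
# The dataset is synthetic and the estimator/parameter choices are assumptions.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)

# Filter method: rank features by a univariate statistic (ANOVA F-score here).
filter_sel = SelectKBest(score_func=f_classif, k=5).fit(X, y)

# Wrapper method: recursive feature elimination driven by a model's performance.
wrapper_sel = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)

# Embedded method: an L1-regularized model selects features during training.
embedded_sel = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
).fit(X, y)

for name, selector in [("filter", filter_sel), ("wrapper", wrapper_sel), ("embedded", embedded_sel)]:
    print(name, "keeps features:", selector.get_support(indices=True))
```

In practice, the choice among the three often comes down to how expensive the wrapper's repeated model fits are relative to the size of the dataset.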

Feature extraction involves creating new features by combining or transforming the original features. The goal is to create a set of features that captures the essence of the original data in a lower-dimensional space.
There are several methods for feature extraction, including
 principal component analysis (PCA),
 linear discriminant analysis (LDA), and
 t-distributed stochastic neighbor embedding (t-SNE).
PCA is a popular technique that projects the original features onto a lower-dimensional
space while preserving as much of the variance as possible.
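As a small, hedged sketch of feature extraction, the example below (assuming scikit-learn is available) projects the four-feature Iris dataset into two dimensions with PCA and t-SNE; the dataset and parameter values are arbitrary choices made only for illustration.

```python
# Illustrative sketch: projecting the 4-feature Iris data down to 2 dimensions
# with PCA (linear) and t-SNE (non-linear); parameter values are arbitrary examples.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_iris(return_X_y=True)

X_pca = PCA(n_components=2).fit_transform(X)                    # variance-preserving projection
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)  # neighborhood-preserving embedding

print(X.shape, "->", X_pca.shape, "and", X_tsne.shape)          # (150, 4) -> (150, 2) and (150, 2)
```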
Why is Dimensionality Reduction important in Machine Learning and Predictive
Modelling?

An intuitive example of dimensionality reduction can be discussed through a simple e-mail classification problem, where we need to classify whether the e-mail is spam or not. This can involve a large number of features, such as whether or not the e-mail has a generic title, the content of the e-mail, whether the e-mail uses a template, etc. However, some of these features may overlap.
A 3-D classification problem can be hard to visualize, whereas a 2-D one can be mapped to a simple 2-dimensional space, and a 1-D problem to a simple line. For example, a 3-D feature space can be split into two 2-D feature spaces, and later, if the features are found to be correlated, the number of features can be reduced even further.

Components of Dimensionality Reduction


There are two components of dimensionality reduction:
 Feature selection: In this, we try to find a subset of the original set of variables, or features, to get a smaller subset which can be used to model the problem. It usually involves three approaches:
1. Filter
2. Wrapper
3. Embedded
 Feature extraction: This reduces the data in a high-dimensional space to a lower-dimensional space, i.e., a space with fewer dimensions.
Methods of Dimensionality Reduction
The various methods used for dimensionality reduction include:
 Principal Component Analysis (PCA)
 Linear Discriminant Analysis (LDA)
 Generalized Discriminant Analysis (GDA)

Dimensionality reduction may be both linear and non-linear, depending upon the method
used. The prime linear method, called Principal Component Analysis, or PCA, is discussed
below.
Principal Component Analysis
This method was introduced by Karl Pearson. It works on the condition that while the data
in a higher dimensional space is mapped to data in a lower dimension space, the variance of
the data in the lower dimensional space should be maximum.

It involves the following steps:


 Construct the covariance matrix of the data.
 Compute the eigenvectors of this matrix.
 Eigenvectors corresponding to the largest eigenvalues are used to reconstruct a large
fraction of variance of the original data.

Hence, we are left with fewer eigenvectors, and some information may have been lost in the process. However, the most important variance should be retained by the remaining eigenvectors.
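A minimal NumPy sketch of these eigen-decomposition steps, under the assumption of a small random data matrix and an arbitrary choice of two retained components, might look like this:

```python
# A minimal NumPy sketch of the eigen-decomposition steps described above
# (the data matrix X and the choice of 2 retained components are assumptions).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))           # 100 samples, 5 features
Xc = X - X.mean(axis=0)                 # centre the data

cov = np.cov(Xc, rowvar=False)          # covariance matrix of the features
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues/eigenvectors (ascending order)

order = np.argsort(eigvals)[::-1]       # largest eigenvalues first
components = eigvecs[:, order[:2]]      # eigenvectors of the 2 largest eigenvalues

X_reduced = Xc @ components             # project onto the lower-dimensional space
print(X_reduced.shape)                  # (100, 2)
```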
Advantages of Dimensionality Reduction
 It helps in data compression, and hence reduces storage space.
 It reduces computation time.
 It also helps remove redundant features, if any.
 Improved Visualization: High dimensional data is difficult to visualize, and
dimensionality reduction techniques can help in visualizing the data in 2D or 3D, which
can help in better understanding and analysis.
 Overfitting Prevention: High dimensional data may lead to overfitting in machine
learning models, which can lead to poor generalization performance. Dimensionality
reduction can help in reducing the complexity of the data, and hence prevent overfitting.
 Feature Extraction: Dimensionality reduction can help in extracting important features
from high dimensional data, which can be useful in feature selection for machine
learning models.
 Data Preprocessing: Dimensionality reduction can be used as a preprocessing step
before applying machine learning algorithms to reduce the dimensionality of the data
and hence improve the performance of the model.
 Improved Performance: Dimensionality reduction can help in improving the
performance of machine learning models by reducing the complexity of the data, and
hence reducing the noise and irrelevant information in the data.
Disadvantages of Dimensionality Reduction
 It may lead to some amount of data loss.
 PCA tends to find linear correlations between variables, which is sometimes
undesirable.
 PCA fails in cases where mean and covariance are not enough to define datasets.
 We may not know how many principal components to keep; in practice, some rules of thumb are applied.
 Interpretability: The reduced dimensions may not be easily interpretable, and it may be
difficult to understand the relationship between the original features and the reduced
dimensions.
 Overfitting: In some cases, dimensionality reduction may lead to overfitting, especially
when the number of components is chosen based on the training data.
 Sensitivity to outliers: Some dimensionality reduction techniques are sensitive to
outliers, which can result in a biased representation of the data.
 Computational complexity: Some dimensionality reduction techniques, such as
manifold learning, can be computationally intensive, especially when dealing with large
datasets.

Singular Value Decomposition is one of the important concepts in linear algebra.

To understand the meaning of singular value decomposition (SVD), one must be aware of related concepts such as matrices, types of matrices, transformations of a matrix, etc. Because this concept is connected to various concepts of linear algebra, it can be challenging to learn the singular value decomposition of a matrix. In this section, you will learn the definition of singular value decomposition and see how a matrix is decomposed.
What is Singular Value Decomposition?
The Singular Value Decomposition of a matrix is a factorization of the matrix into three matrices. Thus, the singular value decomposition of matrix A can be expressed in terms of the factorization of A into the product of three matrices as A = UDVᵀ. Here, the columns of U and V are orthonormal, and the matrix D is diagonal with non-negative real entries.
Singular Value Decomposition of a Matrix
Mathematically, the singular value decomposition of a matrix can be explained as follows:
Consider a matrix A of order m×n. This can be decomposed as:
A = UDVᵀ
where
U is m×n with orthonormal columns, which are eigenvectors of AAᵀ (since AAᵀ = UDVᵀVDUᵀ = UD²Uᵀ),
V is n×n and orthogonal, so its columns are eigenvectors of AᵀA (since AᵀA = VDUᵀUDVᵀ = VD²Vᵀ), and
D is n×n and diagonal, with non-negative real entries called the singular values.
Let D = diag(σ1, σ2, …, σn), ordered such that σ1 ≥ σ2 ≥ … ≥ σn.
If σ is a singular value of A, then σ² is an eigenvalue of AᵀA.
Also, let U = (u1 u2 … un) and V = (v1 v2 … vn). Therefore,
A = σ1u1v1ᵀ + σ2u2v2ᵀ + … + σnunvnᵀ
Here, the sum can be taken from 1 to r, where r is the rank of matrix A, since the remaining singular values are zero.
Go through the example given below to better understand the process of singular value decomposition of a matrix.
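As a small numerical illustration (the 3×2 matrix A below is an arbitrary choice), NumPy's built-in SVD routine can be used to verify the factorization and the rank-r sum formula above.

```python
# A small numerical example of SVD with NumPy; the matrix A is an arbitrary 3x2 example.
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD: U is 3x2, s holds 2 singular values
D = np.diag(s)

print(np.allclose(A, U @ D @ Vt))                  # True: A = U D V^T
print(np.allclose(s**2, np.linalg.eigvalsh(A.T @ A)[::-1]))  # squared singular values are
                                                             # the eigenvalues of A^T A

A1 = s[0] * np.outer(U[:, 0], Vt[0])               # leading rank-1 term of the sum formula above
print(np.round(A1, 2))
```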
Singular Value Decomposition Applications
Some of the applications of singular value decomposition are listed below:
 SVD has some fascinating algebraic characteristics and conveys relevant geometrical and theoretical insights regarding linear transformations.
 SVD has some critical applications in data science too.
 Mathematical applications of the SVD include calculating the matrix approximation, the rank of a matrix, and so on.
 The SVD is also greatly useful in science and engineering.
 It has applications in statistics, for example, least-squares fitting of data and process control.
Principal Component Analysis (PCA)
As the number of dimensions increases, the number of possible combinations of features
increases exponentially, which makes it computationally difficult to obtain a representative
sample of the data and it becomes expensive to perform tasks such as clustering or
classification.
Additionally, some machine learning algorithms can be sensitive to the number of
dimensions, requiring more data to achieve the same level of accuracy as lower-dimensional
data.
To address the curse of dimensionality, Feature engineering techniques are used which
include feature selection and feature extraction.
Dimensionality reduction is a type of feature extraction technique that aims to reduce the
number of input features while retaining as much of the original information as possible.
What is Principal Component Analysis (PCA)?
The Principal Component Analysis (PCA) technique was introduced by the mathematician Karl
Pearson in 1901. It works on the condition that while the data in a higher dimensional space
is mapped to data in a lower dimension space, the variance of the data in the lower
dimensional space should be maximum.
Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal
transformation that converts a set of correlated variables to a set of uncorrelated variables.
PCA is the most widely used tool in exploratory data analysis and in machine learning for
predictive models.
Principal Component Analysis (PCA) is an unsupervised learning technique used
to examine the interrelations among a set of variables. It is also known as a general factor
analysis where regression determines a line of best fit.
The main goal of Principal Component Analysis (PCA) is to reduce the dimensionality of a
dataset while preserving the most important patterns or relationships between the variables
without any prior knowledge of the target variables.
Principal Component Analysis (PCA) is used to reduce the dimensionality of a data set by finding a new set of variables, smaller than the original set of variables, that retains most of the sample’s information and is useful for the regression and classification of data.
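As a hedged sketch of this use, PCA can be placed in front of a classifier as a preprocessing step; the digits dataset, the choice of 20 components, and the logistic regression classifier below are illustrative assumptions rather than recommendations.

```python
# Hedged sketch: PCA as a preprocessing step before a classifier.
# The digits dataset, 20 components, and logistic regression are illustrative choices.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)                # 64 pixel features per image

model = make_pipeline(StandardScaler(),
                      PCA(n_components=20),        # keep 20 of the 64 dimensions
                      LogisticRegression(max_iter=2000))

print(cross_val_score(model, X, y, cv=5).mean())   # cross-validated accuracy on the reduced data
```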

Principal Component Analysis (PCA) is a technique for dimensionality reduction that identifies a set of orthogonal axes, called principal components, that capture the maximum variance in the data.
The principal components are linear combinations of the original variables in the dataset and
are ordered in decreasing order of importance.
The total variance captured by all the principal components is equal to the total variance in
the original dataset.

The first principal component captures the most variation in the data, while the second principal component captures the maximum variance that is orthogonal to the first principal component, and so on.
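These variance properties can be checked directly with scikit-learn's PCA; the Iris dataset below is just an example input.

```python
# Checking the variance properties above with scikit-learn's PCA (Iris is just an example input).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA().fit(X)                           # keep all components

print(pca.explained_variance_ratio_)         # decreasing share of variance per component
print(pca.explained_variance_ratio_.sum())   # ~1.0: all of the original variance is captured
```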
Principal Component Analysis can be used for a variety of purposes, including data
visualization, feature selection, and data compression.
In data visualization, PCA can be used to plot high-dimensional data in two or three
dimensions, making it easier to interpret.
In feature selection, PCA can be used to identify the most important variables in a dataset. In
data compression, PCA can be used to reduce the size of a dataset without losing important
information.
In Principal Component Analysis, it is assumed that the information is carried in the variance of the features; that is, the higher the variation in a feature, the more information that feature carries.
Overall, PCA is a powerful tool for data analysis and can help to simplify complex datasets,
making them easier to understand and work with.
Principal Component Analysis
Principal Component Analysis is an unsupervised learning algorithm that is used for dimensionality reduction in machine learning.
It is a statistical process that converts the observations of correlated features into a set of
linearly uncorrelated features with the help of orthogonal transformation. These new
transformed features are called the Principal Components.
It is one of the popular tools used for exploratory data analysis and predictive modeling. It is a technique for drawing out strong patterns from a given dataset by reducing the number of dimensions.
PCA generally tries to find the lower-dimensional surface to project the high-dimensional
data.
PCA works by considering the variance of each attribute, because an attribute with high variance indicates a good split between the classes, and hence it reduces the dimensionality.
Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing the power allocation in various communication channels. It is a feature extraction technique, so it retains the important variables and drops the least important ones.
The PCA algorithm is based on some mathematical concepts such as:
 Variance and Covariance
 Eigenvalues and Eigenvectors
Some common terms used in PCA algorithm:
Dimensionality: It is the number of features or variables present in the given dataset. More
easily, it is the number of columns present in the dataset.
Correlation: It signifies how strongly two variables are related to each other; if one changes, the other variable also changes. The correlation value ranges from -1 to +1. Here, -1 occurs if the variables are inversely proportional to each other, and +1 indicates that the variables are directly proportional to each other.
Orthogonal: It means that the variables are not correlated with each other, and hence the correlation between the pair of variables is zero.
Eigenvectors: Given a square matrix M and a non-zero vector v, v is an eigenvector of M if Mv is a scalar multiple of v (a small check follows this list of terms).
Covariance Matrix: A matrix containing the covariance between the pair of variables is
called the Covariance Matrix.
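As mentioned above, the eigenvector definition can be verified with a small NumPy check; the symmetric 2×2 matrix M below is an arbitrary example.

```python
# A small check of the eigenvector definition: M v is a scalar multiple of v.
# The matrix M below is an arbitrary symmetric 2x2 example.
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eig(M)

v = eigvecs[:, 0]                           # first eigenvector
print(np.allclose(M @ v, eigvals[0] * v))   # True: M v = lambda v
```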
Principal Components in PCA
As described above, the transformed new features, or the output of PCA, are the Principal Components. The number of these PCs is either equal to or less than the number of original features present in the dataset.
Some properties of these principal components are given below:
 The principal component must be the linear combination of the original features.
 These components are orthogonal, i.e., the correlation between a pair of variables is
zero.
 The importance of each component decreases when going from 1 to n; it means the 1st PC has the most importance, and the nth PC will have the least importance.
Steps for PCA algorithm
Getting the dataset
Firstly, we need to take the input dataset and divide it into two subparts X and Y, where X is
the training set, and Y is the validation set.
Representing the data in a structure
Now we will represent our dataset in a structure, such as a two-dimensional matrix of the independent variable X. Here, each row corresponds to a data item, and each column corresponds to a feature. The number of columns is the dimensionality of the dataset.
Standardizing the data
In this step, we will standardize our dataset. Within a particular column, features with higher variance would otherwise be treated as more important than features with lower variance. If the importance of a feature should be independent of its variance, then we divide each data item in a column by the standard deviation of the column. We will name the resulting matrix Z.
Calculating the Covariance of Z
To calculate the covariance of Z, we take the matrix Z and transpose it. After transposing, we multiply it by Z. The output matrix is the covariance matrix of Z.
Calculating the Eigenvalues and Eigenvectors
Now we need to calculate the eigenvalues and eigenvectors of the resultant covariance matrix of Z. The eigenvectors of the covariance matrix are the directions of the axes with high information, and the corresponding eigenvalues measure the amount of variance along each of these directions.
Sorting the Eigenvectors
In this step, we take all the eigenvalues and sort them in decreasing order, which means from largest to smallest, and simultaneously sort the eigenvectors accordingly into a matrix P of eigenvectors. The resultant matrix is named P*.
Calculating the new features, or Principal Components
Here we calculate the new features. To do this, we multiply the matrix Z by P*. In the resultant matrix Z*, each observation is a linear combination of the original features, and each column of the Z* matrix is independent of the others.
Removing less important features from the new dataset
The new feature set is now available, so here we decide what to keep and what to remove: we keep only the relevant or important features in the new dataset and remove the unimportant ones.
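Putting the steps above together, here is a minimal NumPy walk-through; the synthetic data matrix and the choice of two retained components are assumptions made only for illustration.

```python
# A minimal NumPy walk-through of the steps above (synthetic data; keeping 2
# components is an arbitrary choice for illustration).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))                  # steps 1-2: data as rows (items) x columns (features)

Z = (X - X.mean(axis=0)) / X.std(axis=0)       # step 3: standardize each column

cov = (Z.T @ Z) / (Z.shape[0] - 1)             # step 4: covariance matrix of Z

eigvals, eigvecs = np.linalg.eigh(cov)         # step 5: eigenvalues and eigenvectors

order = np.argsort(eigvals)[::-1]              # step 6: sort from largest to smallest eigenvalue
P_star = eigvecs[:, order]

Z_star = Z @ P_star                            # step 7: new features (principal components)

Z_reduced = Z_star[:, :2]                      # step 8: keep only the leading components
print(Z_reduced.shape)                         # (150, 2)
```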
Applications of Principal Component Analysis
 PCA is mainly used as the dimensionality reduction technique in various AI
applications such as computer vision, image compression, etc.
 It can also be used for finding hidden patterns when the data has high dimensions. Some fields where PCA is used are finance, data mining, psychology, etc.
