

ARTIFICIAL INTELLIGENCE

MEE3033

KERNEL PRINCIPAL COMPONENT ANALYSIS (PCA) METHOD

GROUP 3

LECTURER'S NAME:

ASSOCIATE PROFESSOR DR. AOS A. Z. ANSAEF AL-JUBOORI

STUDENT’S NAME MATRIC NUMBER


MUHAMMAD NUR FITRI BIN NORAFFENDI D20191089401

UN JUN JIE D20181081696

ADZEEM AQMALHAKIM BIN ZABRI D20191088404

NUR ADILA MIRHAH BINTI ISMAIL D20191089484

NUR SYAZWANI BINTI WAHI ANUAR D20191089475

KIASATINA HASYIMAH BINTI NASIL D20191089476


TABLE OF CONTENTS

NO TOPIC

1 INTRODUCTION

2 SIGNIFICANCE OF KERNEL PCA

3 MECHANIZATION SEQUENCE

4 WHO CAN APPLY KERNEL PCA

5 WEAKNESSES

6 STRONG POINTS

7 LIST OF APPLICATIONS

8 REFERENCES
1.0 INTRODUCTION

Kernel Principal Component Analysis (KPCA) is a non-linear dimensionality reduction method. It uses kernel techniques to extend Principal Component Analysis (PCA), which is a linear dimensionality reduction technique. PCA is an unsupervised method that relies on an orthogonal transformation of a higher-dimensional data space into a lower-dimensional subspace; it was first presented by Karl Pearson as a statistical method for finding lines or planes of best fit in the context of regression. It has since been cited and reused in a wide variety of settings, most notably in machine learning for dimension reduction and for denoising in image reconstruction. Because some of the features in the original space may not be necessary for projecting the data onto the reduced subspace, PCA is also employed for feature extraction. Kernel PCA is the best-known non-linear extension of this classical technique.
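As a minimal illustration of the difference (a sketch only, assuming the scikit-learn library is available; the dataset and parameter values are our own and not taken from this report), linear PCA and kernel PCA can be compared on a toy dataset that is not linearly separable:

# Sketch: linear PCA vs. kernel PCA on concentric circles (assumes scikit-learn).
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Linear PCA: an orthogonal transformation of the original space
X_pca = PCA(n_components=2).fit_transform(X)

# Kernel PCA: the same idea applied after an implicit nonlinear (RBF) mapping
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

print(X_pca.shape, X_kpca.shape)  # both (400, 2)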

2.0 SIGNIFICANCE OF KERNEL PCA

The significance of the kernel PCA approach is that, unlike other common analytic techniques, it inherently considers combinations of predictive features while optimising dimensionality reduction, as other kernel methods do. Large accuracy gains can often be realised by generalising over relevant feature combinations in natural language problems (e.g., Kudo and Matsumoto, 2003). Another advantage of KPCA for the word sense disambiguation (WSD) task is that the dimensionality of the input data is generally very large, a condition in which kernel methods excel. Nonlinear principal components (Diamantaras and Kung, 1996) may be defined as follows. Suppose we are given a training set of M pairs (x_t, c_t), where the observed vectors x_t ∈ R^n in an n-dimensional input space X represent the context of the target word being disambiguated and the correct class c_t represents the sense of the word, for t = 1, ..., M. Suppose Φ is a nonlinear mapping from the input space R^n to the feature space F. Without loss of generality, we assume the M mapped vectors are centered in the feature space, i.e., Σ_{t=1}^{M} Φ(x_t) = 0; uncentered vectors can easily be converted to centered vectors (Schölkopf et al., 1998).
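For completeness, the nonlinear principal components that follow from these definitions can be stated as in Schölkopf et al. (1998); the notation below matches the paragraph above and is a restatement of the standard derivation rather than a new result:

\[
\bar{C} \;=\; \frac{1}{M}\sum_{t=1}^{M}\Phi(x_t)\,\Phi(x_t)^{\top},
\qquad
\lambda\, v \;=\; \bar{C}\, v .
\]

Every eigenvector with a nonzero eigenvalue lies in the span of the mapped training data, v = \sum_{t=1}^{M} \alpha_t \Phi(x_t), which reduces the problem to an eigenproblem on the M x M kernel matrix,

\[
M\lambda\,\alpha \;=\; K\alpha,
\qquad
K_{ij} \;=\; \Phi(x_i)\cdot\Phi(x_j) \;=\; k(x_i, x_j),
\]

so the j-th nonlinear principal component of a point x is obtained without ever computing \Phi explicitly:

\[
v^{(j)}\cdot\Phi(x) \;=\; \sum_{t=1}^{M}\alpha^{(j)}_t\, k(x_t, x).
\]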

Furthermore, kernel PCA is significant in that it minimises the reconstruction error in feature space for a centered dataset Φ(X): defining U to be the left singular vectors of Φ(X) is equivalent to calculating the eigenvectors of the centered dot-product (kernel) matrix, just as in PCA. In terms of non-linear structure, KPCA can capture nonlinear structure in the data when a nonlinear kernel is used. In terms of interpretability, no feature weights exist for KPCA with nonlinear kernels because the mapping is nonlinear; the method is also nonparametric. In terms of inverse mapping, KPCA does not inherently provide one, although it is possible to estimate an approximate inverse using additional methods, at the cost of extra complexity and computational resources. In terms of hyperparameters, KPCA requires choosing the number of dimensions as well as the kernel function and any associated parameters (e.g., the bandwidth of an RBF kernel or the degree of a polynomial kernel). This choice, and the method or criterion used to make it, is problem dependent, and one typically needs to re-fit KPCA multiple times to compare different kernel and parameter choices. Lastly, in terms of computational cost, various strategies exist for scaling up KPCA, but they require making approximations.
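To make the "eigenvectors of the centered dot product matrix" step concrete, the following is a minimal NumPy sketch of kernel PCA with an RBF kernel (all names and parameter values here are our own illustrative choices, not taken from the cited works):

import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise squared Euclidean distances, then the Gaussian (RBF) kernel
    d2 = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def kernel_pca(X, n_components=2, gamma=1.0):
    M = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    # Center the kernel matrix in feature space: K~ = K - 1M K - K 1M + 1M K 1M
    one_M = np.full((M, M), 1.0 / M)
    Kc = K - one_M @ K - K @ one_M + one_M @ K @ one_M
    # Eigendecomposition of the symmetric centered kernel matrix (as in PCA)
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]       # largest eigenvalues first
    eigvals, eigvecs = eigvals[idx], eigvecs[:, idx]
    # Normalise the expansion coefficients so the feature-space eigenvectors have unit norm
    alphas = eigvecs / np.sqrt(np.maximum(eigvals, 1e-12))
    # Projections of the training points onto the nonlinear principal components
    return Kc @ alphas

X = np.random.RandomState(0).randn(100, 2)
print(kernel_pca(X, n_components=2, gamma=0.5).shape)    # (100, 2)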

3.0 MECHANIZATION SEQUENCE

Sometimes the data structure is nonlinear, but it can be made linearly separable by projecting the data into a higher-dimensional space.

The first step is therefore to map the data into the higher-dimensional space.

After plotting the transformed data, we can see that it has become linearly separable, and we can now extract the principal components in the new space using the kernel trick.

For example, suppose the original data structure is two-dimensional. To obtain a three-dimensional representation we apply a transformation function, and in the resulting three-dimensional space the classes can be separated by a linear boundary, as shown in the sketch below.

We must remember, however, that this boundary is still nonlinear in the original two-dimensional data space.
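A minimal sketch of this idea, assuming the commonly used degree-2 polynomial map φ(x1, x2) = (x1², x2², √2·x1·x2), which is our own illustrative choice rather than a mapping prescribed by this report:

import numpy as np

# Two concentric circles: not linearly separable in the original 2-D space
rng = np.random.RandomState(0)
theta = rng.uniform(0, 2 * np.pi, 200)
inner = np.c_[0.5 * np.cos(theta[:100]), 0.5 * np.sin(theta[:100])]   # class 0
outer = np.c_[1.5 * np.cos(theta[100:]), 1.5 * np.sin(theta[100:])]   # class 1
X = np.vstack([inner, outer])

def phi(X):
    # Explicit transformation into the 3-D feature space
    return np.c_[X[:, 0] ** 2, X[:, 1] ** 2, np.sqrt(2) * X[:, 0] * X[:, 1]]

Z = phi(X)
# In the new space z1 + z2 equals the squared radius, so the plane z1 + z2 = 1
# separates the two classes perfectly, even though no line separates them in 2-D.
print((Z[:100, 0] + Z[:100, 1] < 1).all(), (Z[100:, 0] + Z[100:, 1] > 1).all())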

4.0 WHO CAN APPLY KERNEL PCA

Kernel PCA uses a kernel function to project a dataset into a higher-dimensional feature space, where it is linearly separable; this is similar to the idea behind Support Vector Machines. Using a kernel, the originally linear operations of PCA are performed in a reproducing kernel Hilbert space with a non-linear mapping. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces that are related to the input space by some nonlinear map. The result is a set of data points in a non-linearly transformed space.
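As a small check of the kernel trick assumed here (a sketch; the degree-2 polynomial kernel and the example vectors are our own illustrative choices), the kernel value computed in the input space equals the dot product of the explicitly mapped vectors, so the nonlinear map never has to be evaluated when only inner products are needed:

import numpy as np

def phi(v):
    # Explicit degree-2 feature map for a 2-D input
    return np.array([v[0] ** 2, v[1] ** 2, np.sqrt(2) * v[0] * v[1]])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

lhs = np.dot(x, y) ** 2          # polynomial kernel k(x, y) = (x . y)^2 in input space
rhs = np.dot(phi(x), phi(y))     # the same quantity as a dot product in feature space
print(np.isclose(lhs, rhs))      # True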

Application areas of kernel methods are diverse and include geostatistics, 3D reconstruction, bioinformatics, chemoinformatics, information extraction and handwriting recognition. Geostatistics is a class of statistics used to analyze and predict the values associated with spatial or spatiotemporal phenomena. It incorporates the spatial coordinates of the data within the analyses. The mining industry uses geostatistics for several aspects of a project: initially to quantify mineral resources and evaluate the project's economic feasibility, then on a daily basis in order to decide which material is routed to the plant and which is waste, using updated information as it becomes available.

In computer vision and computer graphics, 3D reconstruction is the process of capturing the shape and appearance of real objects, and it has long been a difficult research goal. Using 3D reconstruction one can determine any object's 3D profile, as well as the 3D coordinates of any point on that profile. The 3D reconstruction of objects is a general scientific problem and a core technology of a wide variety of fields, such as Computer Aided Geometric Design (CAGD), computer graphics, computer animation, computer vision, medical imaging, virtual reality, and digital media. For instance, the lesion information of patients can be presented in 3D on the computer, which offers a new and accurate approach to diagnosis and thus has vital clinical value.

Handwriting recognition is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices. The image of the written text may be sensed "off line" from a piece of paper by optical scanning. Handwriting recognition is typically used by police, forensic analysts, or lawyers to gather evidence of criminal offences or to uncover important information.

5.0 WEAKNESSES

A standard KPCA has some drawbacks that limit its practical application when handling large or online datasets. First, using a nonlinear kernel may give worse performance because of overfitting. Second, it is time-consuming: evaluating KPCA on large-scale datasets is expensive, and the computational complexity of extracting principal components is higher than for a standard PCA. Another limitation is that KPCA does not inherently provide an inverse mapping; although it is possible to estimate one using additional methods, this comes at the cost of extra complexity and computational resources. KPCA may therefore also carry a higher computational cost overall: PCA generally has lower memory and runtime requirements and can be scaled to massive datasets, whereas the various strategies that exist for scaling up KPCA require making approximations. In the testing stage, the resulting kernel principal components are defined implicitly through linear expansions over the training data, so all of the training data must be saved after training. For a massive dataset, this translates into high storage costs and increases the computational burden whenever the kernel principal components (KPCs) are used. Finally, KPCA operates in a batch manner, which makes it impractical for many real-world applications where samples are collected progressively online: each time new data arrive, KPCA has to be re-run from scratch, as illustrated in the sketch below.
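The following sketch (assuming scikit-learn; the dataset and parameter values are illustrative only) shows two of these weaknesses in practice, the approximate inverse mapping and the need to refit the whole model when new samples arrive:

import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

X, _ = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10,
                 fit_inverse_transform=True, alpha=0.1)
Z = kpca.fit_transform(X)

# The "inverse" is a learned approximation (a pre-image estimate), not an exact mapping
X_back = kpca.inverse_transform(Z)
print("mean reconstruction error:", np.mean((X - X_back) ** 2))

# Batch limitation: incorporating new samples means refitting on all data from scratch
X_new, _ = make_circles(n_samples=50, factor=0.3, noise=0.05, random_state=1)
kpca_refit = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit(np.vstack([X, X_new]))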

6.0 STRONG POINTS

The first undeniable strong point of KPCA is its handling of nonlinear structure. Compared to PCA, KPCA can capture the higher-order statistical information contained in the data, thus producing nonlinear subspaces that give better feature extraction performance. Another strong point of KPCA is the flexibility offered by its hyperparameters: KPCA lets us choose the kernel function and any associated parameters, for example the bandwidth of an RBF kernel or the degree of a polynomial kernel. The choice, and the method or criterion used to make it, depends on the problem, and one typically refits KPCA multiple times to compare different kernel or parameter choices. Furthermore, unlike other common analysis techniques, and like other kernel methods, it inherently takes combinations of predictive features into account when optimizing dimensionality reduction. The reason KPCA can apply stronger generalization biases is its implicit consideration of combinations of feature information in the data distribution of the high-dimensional training vectors; in a simplified illustrative example with just five input dimensions the effect is already visible, and it is stronger in more realistic, high-dimensional vector spaces. Finally, since KPCA computes the transform purely from unsupervised training vector data, and extracts generalizations that are subsequently utilized during supervised classification, it is quite possible to combine large amounts of unsupervised data with reasonably smaller amounts of supervised data, as sketched below.
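A minimal sketch of this semi-supervised use (assuming scikit-learn; the synthetic data, labels and parameter values are hypothetical and only illustrate the workflow): kernel PCA is fitted on a large unlabelled pool, and the learned nonlinear components are then used to transform a much smaller labelled set before training a supervised classifier.

import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X_unlabelled = rng.randn(2000, 5)                     # large pool of unsupervised vectors
X_labelled = rng.randn(100, 5)                        # much smaller supervised set
y_labelled = (X_labelled[:, 0] * X_labelled[:, 1] > 0).astype(int)

# Transform learned purely from unlabelled data, then reused for supervised training
kpca = KernelPCA(n_components=3, kernel="rbf", gamma=0.5).fit(X_unlabelled)
clf = LogisticRegression().fit(kpca.transform(X_labelled), y_labelled)
print(clf.score(kpca.transform(X_labelled), y_labelled))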

7.0 LIST OF APPLICATIONS

Principal Component Analysis (PCA) is a well-known dimensionality reduction and multivariate analysis technique. Data compression, image processing, visualisation, exploratory data analysis, pattern identification, and time series prediction are just a few of its many applications. PCA's appeal stems from three key characteristics (illustrated in the sketch after this paragraph). First, it is the best linear technique, in terms of mean squared error, for compressing a set of high-dimensional vectors into a set of lower-dimensional vectors and subsequently reconstructing the original set. Second, the model parameters can be derived directly from the data, for example by diagonalizing the sample covariance matrix. Third, given the model parameters, compression and decompression are simple processes that require only matrix multiplications. A multi-dimensional hyperspace is often difficult to visualize. The main objectives of unsupervised learning methods are to reduce dimensionality, to score all observations based on a composite index, and to cluster similar observations together based on multivariate attributes. Summarizing multivariate attributes by two or three variables that can be displayed graphically with minimal loss of information is useful in knowledge discovery. Because it is hard to visualize a multi-dimensional space, PCA is mainly used to reduce the dimensionality of d multivariate attributes to two or three dimensions.
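A minimal NumPy sketch of the three characteristics just listed (our own illustration with synthetic data): the model parameters come from diagonalizing the sample covariance matrix, and compression and reconstruction are plain matrix multiplications.

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(500, 10)
X_centered = X - X.mean(axis=0)

# Diagonalize the sample covariance matrix to obtain the principal directions
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
W = eigvecs[:, np.argsort(eigvals)[::-1][:3]]     # keep the top 3 components

Z = X_centered @ W            # compression: 10-D -> 3-D (matrix multiplication)
X_hat = Z @ W.T               # decompression: 3-D -> 10-D approximation
print("mean squared reconstruction error:", np.mean((X_centered - X_hat) ** 2))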

As noted above, application areas of kernel methods are diverse and also include kriging and inverse distance weighting alongside geostatistics, 3D reconstruction, bioinformatics, chemoinformatics, information extraction and handwriting recognition. Kernel PCA has been demonstrated to be useful for novelty detection and image de-noising. In kernel PCA for novelty detection, the reconstruction error in feature space is used as a measure of novelty. A MATLAB demo (kpca.zip [8 KB], available from heikohoffmann.de) computes this reconstruction error for a given 2-D point distribution and shows the resulting decision boundary enclosing all data points. The zip archive contains three m-files: kpcabound.m, recerr.m, and kernel.m; the last two are helper functions for the demo program kpcabound. kpcabound takes as input the data set, the kernel parameter (here, the sigma of a Gaussian function), the number of eigenvectors to be extracted, and the number of points outside the decision boundary.
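The MATLAB demo itself is not reproduced here; as a rough Python analogue (our own sketch of the same reconstruction-error idea, with illustrative data and parameters), the feature-space reconstruction error can be computed from the centered kernel matrix and used as a novelty score:

import numpy as np

def rbf(A, B, gamma=2.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

rng = np.random.RandomState(0)
X = rng.randn(200, 2) * 0.5                      # 2-D training distribution
M, q = X.shape[0], 5                             # q = number of eigenvectors extracted

K = rbf(X, X)
one = np.full((M, M), 1.0 / M)
Kc = K - one @ K - K @ one + one @ K @ one       # centered kernel matrix
vals, vecs = np.linalg.eigh(Kc)
idx = np.argsort(vals)[::-1][:q]
alphas = vecs[:, idx] / np.sqrt(vals[idx])       # unit-norm feature-space eigenvectors

def novelty(x):
    # Feature-space reconstruction error of a single test point x
    k = rbf(x[None, :], X)[0]                       # k(x, x_i) against the training set
    kxx_c = 1.0 - 2 * k.mean() + K.mean()           # centered k(x, x); RBF gives k(x, x) = 1
    kc = k - k.mean() - K.mean(axis=0) + K.mean()   # centered cross-kernel values
    beta = kc @ alphas                              # projections onto the q components
    return kxx_c - np.sum(beta ** 2)                # what the components fail to capture

print(novelty(np.zeros(2)), novelty(np.array([3.0, 3.0])))  # inlier vs. clear outlier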

Wu, Su and Carpuat (2004) report the first application of kernel PCA to a true natural language processing task. They show that a KPCA-based model can significantly outperform state-of-the-art results from both naive Bayes and maximum entropy models. The fact that the KPCA-based model outperforms the SVM-based model indicates that kernel methods other than SVMs deserve more attention. Given the theoretical advantages of KPCA, it is hoped that this work will encourage broader recognition, and further exploration, of the potential of KPCA modelling within NLP research. Given the positive results, the authors plan next to combine large amounts of unsupervised data with smaller amounts of supervised data such as the Senseval lexical sample. One of the promising advantages of KPCA is that it computes the transform purely from unsupervised training vector data, so vast amounts of cheap unannotated data can be used to augment the model presented in that work.

8.0 REFERENCES

Almaki, M. (2019, February 16). Kernel principal component analysis (KPCA). OpenGenus IQ: Computing Expertise & Legacy. Retrieved from https://iq.opengenus.org/kernal-principal-component-analysis/

Anonymous. (2019). Introduction to kernel PCA. GeeksforGeeks. Retrieved from https://www.geeksforgeeks.org/ml-introduction-to-kernel-pca/

Jade, A., Srikanth, B., Valadi, J., Kulkarni, B., Jog, J., & Priya, L. (2003). Feature extraction and denoising using kernel PCA. Chemical Engineering Science, 58, 4441-4448. doi:10.1016/S0009-2509(03)00340-3. Retrieved from https://scihub.yncjkj.com/10.1016/s0009-2509(03)00340-3

Kernel PCA for novelty detection. (2021). Heikohoffmann.de. Retrieved from https://www.heikohoffmann.de/kpca.html

Wang, W., Zhang, M., Wang, D., & Jiang, Y. (2017). Kernel PCA feature extraction and the SVM classification algorithm for multiple-status, through-wall, human being detection. EURASIP Journal on Wireless Communications and Networking, 2017(1). Retrieved from https://doi.org/10.1186/s13638-017-0931-2

Wu, D., Su, W., & Carpuat, M. (2004). A kernel PCA method for superior word sense disambiguation. Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL '04). Retrieved from https://doi.org/10.3115/1218955.1219036

Wu, W., Massart, D. L., & de Jong, S. (1997). The kernel PCA algorithms for wide data. Part I: Theory and algorithms. Chemometrics and Intelligent Laboratory Systems, 36(2), 165-172. Retrieved from https://scihub.yncjkj.com/10.1016/s0169-7439(97)00010-5
