Report: Kernel PCA Method
MEE3033
GROUP 3
LECTURER'S NAME:
NO TOPIC
1 INTRODUCTION
2 SIGNIFICANCE OF KERNEL
3 MECHANIZATION SEQUENCE
4 WHO CAN APPLY KERNEL
5 WEAKNESS
6 STRONG POINT
7 LIST OF APPLICATIONS
8 REFERENCES
1.0 INTRODUCTION
2.0 SIGNIFICANCE OF KERNEL
The significance of the Kernel PCA approach is that, unlike many common analytic
techniques, it inherently takes combinations of predictive features into account when optimising
dimensionality reduction, as other kernel methods do. Large accuracy gains can typically be
realised by generalising over relevant feature combinations, particularly for natural language
problems (e.g., Kudo and Matsumoto, 2003). Another advantage of KPCA for the word sense
disambiguation (WSD) task is that the dimensionality of the input data is generally very large, a
condition in which kernel methods excel. Nonlinear principal components (Diamantaras and Kung,
1996) may be defined as follows. Suppose we are given a training set of M pairs (xt, ct), where the
observed vectors xt ∈ Rn in an n-dimensional input space X represent the context of the target
word being disambiguated, and the correct class ct represents the sense of the word, for t = 1, ...,
M. Suppose Φ is a nonlinear mapping from the input space Rn to the feature space F. Without
loss of generality we assume that the M mapped vectors are centered in the feature space, i.e.,
∑_{t=1}^{M} Φ(xt) = 0; uncentered vectors can easily be converted to centered vectors
(Schölkopf et al., 1998).
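As an illustration of that centering step, the following is a minimal NumPy sketch (our own, not part of the original report): it centers an M x M kernel matrix directly, which is equivalent to assuming ∑_{t=1}^{M} Φ(xt) = 0 without ever computing Φ explicitly. The function name center_kernel_matrix is an assumption made here for illustration.

import numpy as np

def center_kernel_matrix(K):
    # Center an M x M kernel matrix in feature space (Schölkopf et al., 1998):
    # K' = K - 1_M K - K 1_M + 1_M K 1_M, where 1_M has every entry equal to 1/M.
    M = K.shape[0]
    one_M = np.ones((M, M)) / M
    return K - one_M @ K - K @ one_M + one_M @ K @ one_M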
3.0 MECHANIZATION SEQUENCE
Sometimes the structure of the data is nonlinear, but it can be made linearly separable by
projecting it into a higher-dimensional space.
The first step is to map the data into the higher-dimensional space. After plotting the mapped
data, we can see that it is now linearly separable.
We can then extract the principal components in the new space using the kernel trick.
Keep in mind, however, that the extracted components are still nonlinear when viewed in the
original (two-dimensional) data space. The whole sequence is sketched in code below.
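The sequence above can be written end-to-end in a few lines of Python. This is only an illustrative sketch under assumptions not stated in the report: an RBF (Gaussian) kernel, scikit-learn's make_circles toy data, and an arbitrary bandwidth gamma = 10.

import numpy as np
from scipy.spatial.distance import cdist
from sklearn.datasets import make_circles

# Toy 2-D data that is not linearly separable in the input space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Step 1: build the kernel (Gram) matrix; the RBF kernel plays the role of the
# implicit nonlinear map into the high-dimensional feature space.
gamma = 10.0
K = np.exp(-gamma * cdist(X, X, "sqeuclidean"))

# Step 2: center the kernel matrix in feature space.
M = K.shape[0]
one_M = np.ones((M, M)) / M
K_c = K - one_M @ K - K @ one_M + one_M @ K @ one_M

# Step 3: eigendecompose the centered kernel matrix (the kernel trick: no explicit
# feature-space coordinates are ever formed).
eigvals, eigvecs = np.linalg.eigh(K_c)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: project the training points onto the first two kernel principal
# components; in this new representation the two rings become much closer to
# linearly separable, even though the components are nonlinear in the original space.
X_kpca = eigvecs[:, :2] * np.sqrt(eigvals[:2])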
4.0 WHO CAN APPLY KERNEL
Kernel PCA uses a kernel function to project a dataset into a higher-dimensional feature
space, where it can become linearly separable. This is similar in spirit to Support Vector Machines.
Using a kernel, the originally linear operations of PCA are carried out in a reproducing kernel
Hilbert space with a nonlinear mapping. Through integral operator kernel functions, one can
efficiently compute principal components in high-dimensional feature spaces that are related to
the input space by some nonlinear map. The result is the set of data points represented in a
nonlinearly transformed space.
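For concreteness, the contrast between linear PCA and kernel PCA can be shown with scikit-learn. This is a sketch of our own, not part of the original report; the RBF kernel and the gamma value are illustrative assumptions.

from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, _ = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Linear PCA: linear operations carried out directly in the input space.
X_pca = PCA(n_components=2).fit_transform(X)

# Kernel PCA: the same linear operations, carried out implicitly in the
# reproducing kernel Hilbert space induced by the RBF kernel.
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0).fit_transform(X)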
5.0 WEAKNESS
Standard KPCA has several drawbacks that limit its practical application to large or online
datasets. First, using a nonlinear kernel may give worse performance than linear PCA because of
overfitting. Second, it is computationally expensive: evaluating KPCA on large-scale datasets is
very time-consuming, and extracting principal components takes longer than with standard PCA.
Another limitation is that KPCA does not inherently provide an inverse mapping; one can be
estimated with additional methods, but only at the cost of extra complexity and computation.
PCA generally has lower memory and runtime requirements than KPCA and can be scaled to
massive datasets; various strategies exist for scaling up KPCA, but they require making
approximations. In the testing stage, the kernel principal components are defined implicitly as
linear expansions over the training data, so all the training data must be retained after training.
For a massive dataset, this translates into high storage costs and increases the computational
burden whenever the kernel principal components (KPCs) are used. Finally, because KPCA
operates in a batch manner, it is impractical for many real-world applications in which samples
are collected progressively online: each time new data arrive, KPCA has to be recomputed from
scratch.
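A rough back-of-the-envelope calculation (our own illustration, not from the report) shows why storage becomes a problem: the M x M kernel matrix alone grows quadratically with the number of training samples, independent of the input dimensionality.

# Memory needed just for the M x M Gram matrix in double precision (8 bytes/entry).
for M in (1_000, 10_000, 100_000):
    gram_gb = M * M * 8 / 1e9
    print(f"M = {M:>7,d} training samples -> Gram matrix ~ {gram_gb:.2f} GB")
# M =   1,000 training samples -> Gram matrix ~ 0.01 GB
# M =  10,000 training samples -> Gram matrix ~ 0.80 GB
# M = 100,000 training samples -> Gram matrix ~ 80.00 GB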
6.0 STRONG POINT
The first strong point of KPCA is its ability to capture nonlinear structure. Compared to PCA,
KPCA can capture the higher-order statistical information contained in the data, producing
nonlinear subspaces that give better feature extraction performance. A second strong point is the
flexibility afforded by its hyperparameters: KPCA lets one choose the kernel function and any
associated parameters, for example the bandwidth of an RBF kernel or the degree of a polynomial
kernel. The choice, and the criteria used to make it, depend on the problem; typically one refits
KPCA multiple times to compare different kernel or parameter choices. Furthermore, unlike other
common analysis techniques, and as with other kernel methods, KPCA inherently takes
combinations of predictive features into account when optimizing dimensionality reduction.
Moreover, KPCA can apply stronger generalization biases because it implicitly considers
combinations of feature information in the data distribution of the high-dimensional training
vectors; in a simplified illustrative example with only a few input dimensions the effect is modest,
but it is stronger in more realistic high-dimensional vector spaces. Finally, since KPCA computes
the transform purely from unsupervised training vector data, and extracts generalizations that are
subsequently used during supervised classification, it is quite possible to combine large amounts
of unsupervised data with a reasonably smaller amount of supervised data.
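As a sketch of that refitting loop (our own example, assuming scikit-learn's KernelPCA with fit_inverse_transform=True; reconstruction error in input space is only one possible comparison criterion, and the gamma grid is arbitrary):

import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

X, _ = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# KPCA offers no single built-in model-selection score, so each candidate kernel
# parameter is tried in turn and the refitted models are compared afterwards.
for gamma in (0.1, 1.0, 10.0, 50.0):
    kpca = KernelPCA(n_components=2, kernel="rbf", gamma=gamma,
                     fit_inverse_transform=True)
    X_back = kpca.inverse_transform(kpca.fit_transform(X))
    err = np.mean(np.sum((X - X_back) ** 2, axis=1))
    print(f"gamma = {gamma:5.1f}: mean reconstruction error = {err:.4f}")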
7.0 LIST OF APPLICATIONS
Application areas of kernel methods are diverse and include geostatistics, kriging, inverse
distance weighting, 3D reconstruction, bioinformatics, chemoinformatics, information extraction
and handwriting recognition. Kernel PCA in particular has been demonstrated to be useful for
novelty detection and image de-noising. In kernel PCA for novelty detection, the reconstruction
error in feature space is used as the measure of novelty. A MATLAB demo (the file kpca.zip
[8 KB], available from heikohoffmann.de) computes this reconstruction error for a given 2-D
point distribution and shows the resulting decision boundary enclosing the data points. The zip
archive contains three m-files: kpcabound.m, recerr.m, and kernel.m; the last two are helper
functions for the demo program kpcabound. kpcabound takes as input the data set, the kernel
parameter (here, the sigma of a Gaussian kernel), the number of eigenvectors to be extracted,
and the number of points allowed outside the decision boundary.
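The demo itself is MATLAB; a rough Python analogue of the same idea is sketched below (our own illustration, not the kpcabound code). Note one simplification: it scores novelty by reconstruction error in the input space via scikit-learn's approximate pre-image, rather than directly in feature space as kpcabound does, and the data, gamma, and number of components are all assumptions.

import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))             # the "normal" 2-D point distribution
X_test = np.array([[0.1, -0.2], [6.0, 6.0]])    # an inlier and an obvious outlier

# Retain a handful of kernel principal components and score new points by how
# poorly they are reconstructed from those components (large error => novel).
kpca = KernelPCA(n_components=10, kernel="rbf", gamma=0.5,
                 fit_inverse_transform=True).fit(X_train)
X_back = kpca.inverse_transform(kpca.transform(X_test))
novelty_score = np.sum((X_test - X_back) ** 2, axis=1)
print(novelty_score)   # the outlier is expected to score far higher than the inlier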
Wu, Su and Carpuat (2004) describe the first application of Kernel PCA to a true natural
language processing task. They show that a KPCA-based model can significantly outperform
state-of-the-art results from both naïve Bayes and maximum entropy models. The fact that their
KPCA-based model also outperforms an SVM-based model indicates that kernel methods other
than SVMs deserve more attention. Given the theoretical advantages of KPCA, they hope this
work will encourage broader recognition, and further exploration, of the potential of KPCA
modelling within NLP research. Given the positive results, they plan next to combine large
amounts of unsupervised data with smaller amounts of supervised data such as the Senseval
lexical sample. One of the promising advantages of KPCA is that it computes the transform
purely from unsupervised training vector data, so vast amounts of cheap unannotated data can be
used to augment such a model.
8.0 REFERENCES
Almaki, M. (2019, February 16). Kernel Principal Component Analysis (KPCA). OpenGenus IQ:
Computing Expertise & Legacy. Retrieved from
https://iq.opengenus.org/kernal-principal-component-analysis/.
Jade, A., Srikanth, B., Valadi, J., Kulkarni, B., Jog, J., & Priya, L. (2003). Feature extraction
and denoising using kernel PCA. Chemical Engineering Science, 58, 4441-4448.
doi:10.1016/S0009-2509(03)00340-3. Retrieved from
https://scihub.yncjkj.com/10.1016/s0009-2509(03)00340-3
Kernel PCA for novelty detection. (2021). Heikohoffmann.de. Retrieved from
https://www.heikohoffmann.de/kpca.html
Wang, W., Zhang, M., Wang, D., & Jiang, Y. (2017). Kernel PCA feature extraction and the
SVM classification algorithm for multiple-status, through-wall, human being detection.
EURASIP Journal on Wireless Communications and Networking, 2017(1).
Retrieved from https://doi.org/10.1186/s13638-017-0931-2
Wu, D., Su, W., & Carpuat, M. (2004). A kernel PCA method for superior word sense
disambiguation. Proceedings of the 42nd Annual Meeting on Association for
Computational Linguistics - ACL ’04. Retrieved from
https://doi.org/10.3115/1218955.1219036
Wu, W., Massart, D. L., & de Jong, S. (1997). The kernel PCA algorithms for wide data. Part I:
Theory and algorithms. Chemometrics and Intelligent Laboratory Systems, 36(2),
165-172. Retrieved from https://scihub.yncjkj.com/10.1016/s0169-7439(97)00010-5