Questions tagged [kernel-trick]
Kernel methods are used in machine learning to generalize linear techniques to nonlinear situations, especially SVMs, PCA, and GPs. Not to be confused with [kernel-smoothing], for kernel density estimation (KDE) and kernel regression.
757 questions
0
votes
0
answers
11
views
When running a Support Vector Machine, how do I formulate the linear transformation that flips the decision hyperplane in the non-augmented dimension?
We know that when running a support vector machine, we actually use the "kernel trick" to compute the decision hyperplane (boundary) as if we do so in the kernel-augmented dimension, but not ...
0
votes
0
answers
92
views
Proof: The Gaussian Kernel as an Inner Product in Infinite-Dimensional Feature Space
Prove that the Gaussian kernel on $\mathbb{R}^d$ for a positive integer $d$:
\begin{equation}
k(x, x') = \exp(-\gamma \|x - x'\|^2) \tag{1}
\end{equation}
for $\gamma > 0$, can be expressed as ...
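For reference, the usual route to this result (sketched here for $d = 1$; the general case follows coordinate-wise) is to factor the kernel and Taylor-expand the cross term:
$$
k(x, x') = e^{-\gamma x^2}\, e^{-\gamma x'^2}\, e^{2\gamma x x'} = \sum_{k=0}^{\infty} \left(e^{-\gamma x^2}\sqrt{\tfrac{(2\gamma)^k}{k!}}\, x^k\right)\left(e^{-\gamma x'^2}\sqrt{\tfrac{(2\gamma)^k}{k!}}\, x'^k\right),
$$
which exhibits $k(x, x') = \langle \phi(x), \phi(x')\rangle_{\ell^2}$ with the infinite-dimensional feature map $\phi_k(x) = e^{-\gamma x^2}\sqrt{(2\gamma)^k/k!}\, x^k$.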
0
votes
0
answers
152
views
Prove that a matrix constructed from a Gaussian RBF is PSD
I have a radial basis function $k(x, y) = \exp(-{(x-y)}^T M {(x-y)})$ where $M$ is a symmetric PSD matrix.
I know that $k(\cdot)$ is a kernel itself: Prove that multiplication with positive ...
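Not a proof, but a quick numerical sanity check of the claim (the point set, dimension, and seed below are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric PSD matrix M, as in the question.
A = rng.standard_normal((3, 3))
M = A @ A.T

# Gram matrix K_ij = exp(-(x_i - x_j)^T M (x_i - x_j)) on random points.
X = rng.standard_normal((50, 3))
diffs = X[:, None, :] - X[None, :, :]                      # shape (n, n, 3)
K = np.exp(-np.einsum('ijk,kl,ijl->ij', diffs, M, diffs))

# If K is PSD, the smallest eigenvalue is >= 0 up to round-off.
print(np.linalg.eigvalsh(K).min())
```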
0
votes
0
answers
7
views
Has natural language processing/generation been attempted with kernel methods?
I am curious as to whether there has been much success in the past with applying kernel methods to perform natural language processing/generation?
Rather than just numerical evidence, I'm particularly ...
3
votes
0
answers
34
views
Convergence of kernel mean embeddings
Let $k(\cdot,\cdot)$ be a bounded kernel and $\mathcal{H}$ its associated RKHS. Define the kernel mean embedding $\mu=\int k(\cdot,x) \, dP_X(x)$ and let $\hat{\mu}=\frac{1}{n}\sum k(x_i,\cdot)$ be ...
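For context, the standard first step (a sketch, assuming $x_1,\dots,x_n \overset{\text{iid}}{\sim} P_X$ and $\sup_x k(x,x) \le B$) is to expand the expected squared RKHS error:
$$
\mathbb{E}\,\|\hat{\mu} - \mu\|_{\mathcal{H}}^2 = \frac{1}{n}\left(\mathbb{E}\,k(X, X) - \mathbb{E}\,k(X, X')\right) \le \frac{2B}{n},
$$
where $X, X'$ are independent draws from $P_X$; Markov's inequality then gives $\|\hat{\mu} - \mu\|_{\mathcal{H}} = O_p(n^{-1/2})$, and high-probability versions follow from McDiarmid-type concentration.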
0
votes
0
answers
19
views
How to get the second stationary point condition corresponding to intercept when using the augmented weight vector and augmented design matrix in SVM?
Below is the formulation I got for the SVM when using the classifier equation $w \cdot x + b = 0$.
I want to know why I am not getting the second stationarity condition, i.e. the summation over $i$ from 1 to $n$ of (...
3
votes
2
answers
131
views
In the context of Kernel regression, why do we define the feature map as equal to the Kernel $\varphi(x)=k(\cdot ,x)$?
I have a notational confusion I am trying to clear up. In the context of Kernel regression the following relationship between the kernel and the feature map is defined:
Consider a positive-definite ...
1
vote
0
answers
40
views
Derivation of dual formulation of support vector regression
I'm trying to derive the dual formulation of epsilon-insensitive support vector regression. I think my derivation is correct, but I can't match it up to a result for the dual that I've seen given in ...
1
vote
0
answers
37
views
What is the best way to use Gaussian Processes to approximate highly non-stationary functions?
Gaussian process regression has trouble approximating functions with "kinks". So, what is the most widely used method to deal with this problem? I have found many proposed methods, including ...
3
votes
0
answers
64
views
What are the "tricks" in machine learning? [closed]
I have come across a few different "tricks" in machine learning methodology, which I list below along with my rudimentary understanding of each.
The Kernel Trick:
This is used in Support Vector ...
0
votes
1
answer
32
views
SVM Kernel to compare histograms as input vectors
In lecture 7 of CS229 by Andrew Ng he mentions at the very end a specific Kernel that allows an SVM to "classify" how similar two histograms are, such as the demographics of 2 countries. He ...
0
votes
0
answers
26
views
Using MMD for Feature Selection with Linear Regression: Valid Approach?
I'm using Maximum Mean Discrepancy (MMD) for feature selection (i.e., to select the features that minimize the dissimilarity between the training and testing datasets). I'm aware that MMD introduces ...
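For reference, a minimal sketch of the biased empirical estimator $\widehat{\mathrm{MMD}}^2 = \overline{K}_{xx} + \overline{K}_{yy} - 2\overline{K}_{xy}$ applied per feature (the data, kernel bandwidth, and per-feature scoring loop below are invented, not the asker's pipeline):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def mmd2_biased(X, Y, gamma=1.0):
    """Biased empirical MMD^2 between samples X and Y under an RBF kernel."""
    Kxx = rbf_kernel(X, X, gamma=gamma)
    Kyy = rbf_kernel(Y, Y, gamma=gamma)
    Kxy = rbf_kernel(X, Y, gamma=gamma)
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()

rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(200, 5)), rng.normal(size=(100, 5))

# Score each feature by the train/test discrepancy of its marginal; smaller = more similar.
scores = [mmd2_biased(X_train[:, [j]], X_test[:, [j]]) for j in range(X_train.shape[1])]
print(scores)
```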
0
votes
0
answers
43
views
Covariance inversion for Gaussian process
Background
Let $x=f(u_x)\in\mathbb{R}$ and let $y=[f(u_y^1)\cdots f(u_y^{N})]\in\mathbb{R}^N$ for some function $f:u \in \mathbb{R}\mapsto \mathbb{R}$.
Given $y$, $u_x$, $u_{y}^1,\dots, u_{y}^{N}$, I ...
2
votes
0
answers
24
views
How to find K in kernel trick?
How does one go about finding the kernel when using the so-called "kernel trick"? Here is an example from Quora:
Simple example: $x = (x_1, x_2, x_3)$; $y = (y_1, y_2, y_3)$. Then for the function f(...
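The truncated example is presumably the familiar quadratic one; a worked reconstruction (mine, not necessarily the Quora post verbatim): take
$$
f(x) = (x_1x_1,\ x_1x_2,\ x_1x_3,\ x_2x_1,\ x_2x_2,\ x_2x_3,\ x_3x_1,\ x_3x_2,\ x_3x_3) \in \mathbb{R}^9,
$$
so that $\langle f(x), f(y)\rangle = \sum_{i,j} x_i x_j y_i y_j = \big(\sum_i x_i y_i\big)^2 = (x^\top y)^2$. The "trick" is that the right-hand side, $K(x, y) = (x^\top y)^2$, can be evaluated in the original 3-dimensional space without ever forming $f$.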
0
votes
0
answers
9
views
Computing Test Loss in Kernel Ridge Regression
In Kernel Ridge regression we have the standard loss function $$L(\beta) = \|Y-K\beta\|_2^2 + \alpha \beta^T K \beta$$
Here, $K$ is the kernel (gram) matrix.
If I compute $\beta$ on a training set, so ...
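One point this usually hinges on: $\alpha\beta^\top K\beta$ is a training-time regularizer, and the test loss is ordinarily just the data-fit term computed with the cross-kernel $K(X_{\text{test}}, X_{\text{train}})$. A minimal sketch (data, $\gamma$, $\alpha$ invented):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X_tr, X_te = rng.normal(size=(80, 2)), rng.normal(size=(20, 2))
y_tr = np.sin(X_tr[:, 0]) + 0.1 * rng.normal(size=80)
y_te = np.sin(X_te[:, 0])

alpha, gamma = 1e-2, 0.5
K_tr = rbf_kernel(X_tr, X_tr, gamma=gamma)               # training Gram matrix
beta = np.linalg.solve(K_tr + alpha * np.eye(80), y_tr)  # minimizer of the penalized loss

# Test predictions use the cross-kernel, and the reported loss has no penalty term.
K_cross = rbf_kernel(X_te, X_tr, gamma=gamma)
test_mse = np.mean((y_te - K_cross @ beta) ** 2)
print(test_mse)
```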
0
votes
0
answers
14
views
Estimation of bivariate function with one variable being constricted
Suppose the following classical supervised regression setting,
$$y_{i} = f(x_{i}) + \epsilon_{i}, \quad i=1,\cdots,n,$$
where $\epsilon_{i}$ are i.i.d. zero mean Gaussian noise.
The above regression ...
0
votes
0
answers
23
views
Can I find the explicit feature map that generates exponent of a kernel?
Let's say I have a kernel $K$, and another kernel of the form:
$$
K' = e^K
$$
Now I know how to prove $K'$ is a kernel; I can do it using the Taylor expansion of $e^x$ around $0$,
but let's say if I want ...
2
votes
1
answer
42
views
Does the solution to ridge regression still minimize the cost function when lambda is <= 0?
This was a homework problem where I was asked to find explicit expression that minimises the cost function.
I found the solution as :
$\hat{\theta} = (X^TX + \lambda I)^{-1}X^Ty$
Now the problem ...
0
votes
0
answers
10
views
What is the normalized winning frequency in a kernel self-organizing map (SOM)?
In the k-means based kernel SOM, proposed by MacDonald and Fyfe (2000), the update of the mean is based on a soft learning algorithm
$$m_i(t + 1) = m_i(t) + \Lambda[\varphi(x) - m_i(t)]$$
where $\Lambda$ is the normalized ...
0
votes
0
answers
28
views
Theoretical question: why is RBF the 'best' kernel?
I am trying to understand why the RBF kernel is usually used in many research papers doing kernel tricks. To reduce the scope, we can focus on linear regression (thus effectively, increasing the ...
0
votes
0
answers
23
views
normalized dual activation function for neural tangent kernel
Let $\phi$ be an activation function. In this lecture note, the author assumes that the dual activation function, denoted $\check{\phi}$, is normalized such that $\check{\phi}(1)=1$. How can it be ...
2
votes
0
answers
61
views
How is the weight vector calculated when using the kernel trick for ridge regression?
I'm trying to understand how kernelized ridge regression works, and how we manage to first transform, and subsequently learn on, higher-dimensional features without explicitly having to calculate them.
...
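The standard resolution, sketched: the weight vector is never formed explicitly. Writing $\Phi$ for the matrix with rows $\phi(x_i)^\top$ and $K = \Phi\Phi^\top$, the ridge solution can be rewritten as
$$
w = \Phi^\top \alpha, \qquad \alpha = (K + \lambda I)^{-1} y,
$$
so a prediction only needs kernel evaluations, $f(x) = \phi(x)^\top w = \sum_{i=1}^n \alpha_i\, k(x_i, x)$, and $w$ itself stays implicit in the (possibly infinite-dimensional) feature space.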
0
votes
0
answers
58
views
How to use random kitchen sinks for $\sigma \neq 1$?
The RBF kernel is given by
$$
k(x,y) = \exp\left(-\frac{\| x - y \|_2^2}{2 \sigma^2}\right)
$$
where $\sigma$ is the length-scale parameter. I want to use the random kitchen sinks method to create a ...
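For what it's worth, the usual adjustment is that for this kernel Bochner's theorem gives a spectral density $\mathcal{N}(0, \sigma^{-2} I)$, so the random frequencies are simply standard normals divided by $\sigma$. A minimal sketch (the function and parameter names are mine):

```python
import numpy as np

def rff_features(X, n_features=500, sigma=2.0, seed=0):
    """Random Fourier features approximating k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, n_features)) / sigma        # frequencies ~ N(0, sigma^{-2} I)
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
Z = rff_features(X, sigma=2.0)
K_approx = Z @ Z.T
K_exact = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / (2 * 2.0 ** 2))
print(np.abs(K_approx - K_exact).max())                     # shrinks as n_features grows
```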
1
vote
0
answers
30
views
RKHS inclusion relationship of the Erf network's NTK
In the referenced paper, it is stated that for ReLU networks, the Reproducing Kernel Hilbert Space (RKHS) of the Neural Tangent Kernels (NTK) remains unchanged regardless of the model's depth. I am ...
8
votes
2
answers
192
views
Under what kernels and/or conditions does $k(x, x) = k(x, X) k(X, X)^{-1} k(X, x)$?
This question is motivated by a question I'm facing in vector-valued kernel methods (also known as Gaussian Processes and co-kriging).
Suppose I have $N$ data $X := \{x_n\}_{n=1}^N$, where each $x_n ...
3
votes
0
answers
38
views
Exchanging integrals with inner products with kernel mean embeddings
I am doing some reading on kernel mean embeddings. In particular I am reading the survey paper by Muandet et al. On page 27 (Section 3.1) the authors begin a gentle introduction to kernel mean ...
3
votes
1
answer
147
views
Dual form of the least squares solution (ridge regression)
I was reading this introductory material and on the 5th page, it describes the dual form of the least-square solution (with ridge regression) as $$A(aI + A^\top A)^{-1} = (aI + AA^\top)^{-1}A$$ for a $...
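The identity follows from a one-line "push-through" argument (a sketch, assuming $a > 0$ so both inverses exist):
$$
(aI + AA^\top)A = aA + AA^\top A = A(aI + A^\top A),
$$
and multiplying on the left by $(aI + AA^\top)^{-1}$ and on the right by $(aI + A^\top A)^{-1}$ gives $A(aI + A^\top A)^{-1} = (aI + AA^\top)^{-1}A$.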
3
votes
0
answers
130
views
Clarifying the difference between various regression methods called "kernel" or "Bayesian"
I want to understand the pairwise relationship between four types of regression: Bayesian Linear Regression, Gaussian Process Regression, Kernel Regression (Nadaraya-Watson), and Kernel Ridge ...
0
votes
0
answers
27
views
Calculating the Orthogonal Distance to Kernel PCA subspace (with a new data)
I am studying Kernel PCA methods and now I'm trying to calculate orthogonal distances (OD) on the feature space. What I've found is, you can calculate ODs with a kernel trick if you are interested in ...
2
votes
1
answer
41
views
Interpreting the formula for Riemannian metric tensor
In Improving support vector machine classifiers by modifying kernel functions, the authors defined the Riemannian metric tensor for a kernel as follows:
$$
\begin{align}
g(\vec{x}) &= \text{det}|g_{ij}...
1
vote
0
answers
31
views
Why is the concept of RKHS useful in kernel ridge regression?
The way I have seen kernel ridge regression introduced is as follows. Given data $(X,Y)$ you want to fit a function $f$ from a RKHS $\mathcal{H}$ to minimise some empirical loss $\sum_i L(f(x_i), y_i)$...
1
vote
0
answers
133
views
Weighted sum of RBF kernels with different length scales
When applying Gaussian Processes to applied problems, the choice of length-scale parameter for the radial basis function (RBF, i.e. Gaussian) kernel makes a big difference. In practice, I have ...
3
votes
2
answers
156
views
Is it enough to prove that the Kernel matrix is positive semidefinite to know that the function is a kernel?
Is it enough to prove that the Kernel matrix is positive semidefinite to know that the function is a kernel? Or is it also necessary to prove that the matrix is symmetric?
1
vote
0
answers
120
views
Geometric intuition of kernel trick
I would like to understand better the geometry underlying the Kernel trick with the Gaussian Kernel. In particular my question is:
How can the Kernel trick be interpreted geometrically, in particular ...
1
vote
1
answer
60
views
How to project kernel PCA?
I have an $m\times n$ matrix $X$. To apply Kernel PCA to my $X$ matrix I need to pass it through a kernel function, $K = \Phi(X)$.
The problem here is that $K$ gets the size $m \times m$. If I'm doing ...
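A minimal sketch of the standard recipe (data, $\gamma$, and the number of components are invented), including the double centering and the projection of a new point:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                   # m x n data matrix, rows are samples
x_new = rng.normal(size=(1, 4))                 # a new point to project
m, n_components, gamma = X.shape[0], 2, 0.5

K = rbf_kernel(X, X, gamma=gamma)               # the m x m matrix the question calls K = Phi(X)
one = np.full((m, m), 1.0 / m)
K_c = K - one @ K - K @ one + one @ K @ one     # centering in feature space

eigvals, eigvecs = np.linalg.eigh(K_c)
idx = np.argsort(eigvals)[::-1][:n_components]
alphas = eigvecs[:, idx] / np.sqrt(eigvals[idx])  # scale so projections are onto unit-norm components

# Center the new point's kernel row against the training statistics, then project.
k_new = rbf_kernel(x_new, X, gamma=gamma)       # shape (1, m)
k_new_c = k_new - k_new.mean() - K.mean(axis=0) + K.mean()
print(k_new_c @ alphas)                         # coordinates in the kernel PCA subspace
```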
0
votes
1
answer
29
views
Result after applying kernel trick
I understand that when the data is not linearly separable, it has to be transformed into a higher-dimensional space to make it linearly separable. Applying the kernel trick can perform this without even computing ...
1
vote
0
answers
38
views
Is it possible to use the RBF sampler to construct a kernel and use it for prediction at a new data point?
I would like to construct a kernel from very large samples, which makes it impossible to construct the $N \times N$ kernel matrix. I can use the RBF sampler (random Fourier features) to make the dimension more ...
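In broad terms yes: the usual pattern is to fit the sampler once and reuse its transform on new points, after which any linear model in the random-feature space approximates the kernel method. A sketch with scikit-learn's RBFSampler (sizes and hyperparameters invented):

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50_000, 10))          # too many samples for an N x N kernel matrix
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=50_000)
X_new = rng.normal(size=(5, 10))

sampler = RBFSampler(gamma=0.5, n_components=300, random_state=0)
Z_train = sampler.fit_transform(X_train)         # fit the random features on the training data
model = Ridge(alpha=1.0).fit(Z_train, y_train)   # linear model in the approximate feature space

Z_new = sampler.transform(X_new)                 # reuse the same fitted features for new points
print(model.predict(Z_new))
```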
1
vote
0
answers
67
views
Why do we need $a:\mathcal{X} \to \mathbb{R}$ to be positive here?
This is exercise 6.1 from the book Foundations of Machine Learning:
Let $K: \mathcal{X}\times \mathcal{X} \to \mathbb{R}$ be a PDS kernel, and let $a:
\mathcal{X}\to \mathbb{R}$ be a positive ...
2
votes
0
answers
99
views
Rescaling matrix W in Random Fourier Features
I came across this beautiful idea of Random Fourier Features by Rahimi and Recht while working on optimising my GP model using Predictive Entropy Search.
I understand the overall idea of approximating ...
5
votes
1
answer
1k
views
Why does a valid Kernel only have to be positive semi-definite instead of positive definite?
I'm currently concerned with the topic of Gaussian Processes. To compute the covariance matrix of the conditional distribution, we have to compute $(K_{XX})^{-1}$, where $K_{XX}$ is a matrix of a ...
3
votes
0
answers
97
views
Understanding the ridge leverage scores sampling from an arXiv paper
I am trying to read the arXiv paper Distributed Adaptive Sampling for Kernel Matrix Approximation, Calandriello et al. 2017. I found a code implementation where they compute ridge leverage scores ...
1
vote
1
answer
673
views
Prove that 2nd order polynomial kernel is positive semi-definite
I'm trying to prove that the 2nd order polynomial kernel, $K(x_i, x_j) = (x_i^Tx_j + 1)^2$ is a valid kernel which satisfies the following conditions:
K is symmetric, that is, $K(x_i, x_j) = K(x_j, ...
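One direct route, sketched: expand the square and read off an explicit feature map, which yields symmetry and positive semidefiniteness at once. For $x, y \in \mathbb{R}^d$,
$$
(x^\top y + 1)^2 = \sum_{i,j} (x_i x_j)(y_i y_j) + \sum_i (\sqrt{2}\,x_i)(\sqrt{2}\,y_i) + 1 = \langle \phi(x), \phi(y)\rangle,
$$
with $\phi(x) = \big(\{x_i x_j\}_{i,j},\ \{\sqrt{2}\,x_i\}_i,\ 1\big)$. Any Gram matrix is then $K = \Phi\Phi^\top$, so $c^\top K c = \|\Phi^\top c\|^2 \ge 0$ for every $c$.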
1
vote
0
answers
305
views
How to properly implement a Matérn kernel function in R?
This definition is excerpted from Wikipedia: The Matérn covariance between measurements taken at two points separated by d distance units is given by
$$C_\nu(d) = \sigma^2\frac{2^{1-\nu}}{\Gamma(\nu)}\...
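The question asks specifically about R, but as a language-agnostic sketch of the same formula (here in Python with SciPy; the function and parameter names are mine), the main implementation wrinkle is $d = 0$, where the Bessel term has a removable singularity and $C_\nu(0) = \sigma^2$:

```python
import numpy as np
from scipy.special import gamma, kv   # kv: modified Bessel function of the second kind

def matern_cov(d, nu=1.5, sigma2=1.0, rho=1.0):
    """C_nu(d) = sigma^2 * 2^(1-nu)/Gamma(nu) * (sqrt(2 nu) d / rho)^nu * K_nu(sqrt(2 nu) d / rho)."""
    d = np.atleast_1d(np.asarray(d, dtype=float))
    out = np.full(d.shape, sigma2)                # limit as d -> 0 is sigma^2
    nz = d > 0
    s = np.sqrt(2.0 * nu) * d[nz] / rho
    out[nz] = sigma2 * (2.0 ** (1.0 - nu) / gamma(nu)) * s ** nu * kv(nu, s)
    return out

print(matern_cov([0.0, 0.5, 1.0, 2.0], nu=2.5))   # decays smoothly from sigma^2
```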
1
vote
0
answers
15
views
Given a psd matrix $Q$ and a kernel function $f(y_i, y_j)$, how do I find $Y \in \mathbb{R}^{n \times d}$ that best approximates $Q$? [duplicate]
The question is basically the title. I have a matrix $Q$ that I know is positive semi-definite. I now want to find the $Y$ that approximates this matrix under some kernel function $f(y_i, y_j)$. I ...
1
vote
1
answer
314
views
Non-stationary Random Fourier Features
Random Fourier Features (RFFs) were introduced by A. Rahimi and B. Recht in their 2007 publication Random Features for Large-Scale Kernel Machines. RFFs are based on Bochner's theorem, which applies ...
2
votes
0
answers
41
views
Identifiability of models on RKHS
I have just started learning about using reproducing kernel Hilbert spaces for regularisation in machine learning. I am looking for some examples of reproducing kernels that produce identifiable and ...
0
votes
0
answers
65
views
Feature maps of the chi-squared kernel
The additive chi-squared kernel for histograms is defined as
$$K(x,y)= \sum_{i=1}^n \frac{2x_i y_i}{x_i + y_i}$$ Is this kernel positive definite on histograms? And if so, is there a known expression ...
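On the second part: scikit-learn ships a sampled approximation to an explicit feature map for this kernel (the Vedaldi and Zisserman construction). A minimal sketch comparing it to the kernel above (data invented; inputs must be non-negative):

```python
import numpy as np
from sklearn.kernel_approximation import AdditiveChi2Sampler

rng = np.random.default_rng(0)
X = rng.random((6, 8))                            # histogram-like rows: non-negative entries
X /= X.sum(axis=1, keepdims=True)

# Exact kernel from the question: K(x, y) = sum_i 2 x_i y_i / (x_i + y_i).
num = 2.0 * X[:, None, :] * X[None, :, :]
den = X[:, None, :] + X[None, :, :]
K_exact = np.divide(num, den, out=np.zeros_like(num), where=den > 0).sum(-1)

# Approximate explicit feature map; the inner product of the features approximates K.
Z = AdditiveChi2Sampler(sample_steps=3).fit_transform(X)
K_approx = Z @ Z.T
print(np.abs(K_exact - K_approx).max())
```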
0
votes
0
answers
85
views
Method of evaluating the feature map of a polynomial kernel feature mapping
I'm attempting to implement an adaptive kernel Kalman filter following this paper https://arxiv.org/abs/2203.08300, but I'm struggling to find a method of evaluating the feature mapping for a ...
3
votes
0
answers
68
views
Is the transformation implied by a positive-type kernel well-defined?
I’ve been trying to get my head around the particularity of the Hilbert space that a positive-type (equiv. positive definite) kernel represents an inner product on, and was hoping for some help in ...
2
votes
0
answers
25
views
In Gaussian Process Regression, what kinds of information can you *not* put in the kernel as opposed to the mean?
For example, suppose you want to learn some structure for the mean and then you also have some kernel. Is it sometimes not possible to put most things in the kernel? For example, consider ...