
I have $x \in \mathbb{R}^{d\times 1}$ and $y \in \mathbb{R}^{p\times d}$, and I want to minimize the function $f(x,y) = g(x) + x^T y^T y x$, where $y^T y$ is a positive definite matrix and $g$ is a strongly convex function.

To this end, I use alternating gradient descent, i.e., at iteration $k+1$ the updates are: $$ x^{(k+1)} = x^{(k)} - \eta_x \left(\nabla g(x^{(k)}) + 2 (y^{(k)})^T y^{(k)} x^{(k)}\right) $$ $$ y^{(k+1)} = y^{(k)} - 2 \eta_y\, y^{(k)} x^{(k+1)} (x^{(k+1)})^T $$
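For concreteness, here is a minimal numerical sketch of these updates. It assumes, purely for illustration, $g(x) = \|x\|^2$ (strongly convex, with $\nabla g(x) = 2x$) and dimensions $d = 3$, $p = 5$ (note $p \geq d$ is needed for $y^T y$ to be positive definite); none of these choices come from the problem itself:

```python
import numpy as np

rng = np.random.default_rng(0)
d, p = 3, 5  # p >= d so that y^T y can be positive definite

def grad_g(x):
    # Assumed example: g(x) = ||x||^2, so grad g(x) = 2x.
    return 2.0 * x

def f(x, y):
    # f(x, y) = g(x) + x^T y^T y x = g(x) + ||y x||^2
    return x @ x + np.sum((y @ x) ** 2)

x = rng.standard_normal(d)
y = rng.standard_normal((p, d))
eta_x = 0.01  # small fixed step for the x-block

for k in range(500):
    # x-step: the gradient of f in x is grad g(x) + 2 y^T y x
    x = x - eta_x * (grad_g(x) + 2.0 * y.T @ (y @ x))
    # y-step: the gradient of f in y is 2 y x x^T; keep eta_y below
    # the threshold 1/||x||^2 derived in the attempt below
    eta_y = 0.25 / (x @ x)
    y = y - 2.0 * eta_y * np.outer(y @ x, x)

print(f(x, y))  # decreases towards inf f = 0 for this choice of g
```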

My understanding is that each step decreases the objective along the corresponding block of variables. That is why I want to show that the $y$-update satisfies $f(x^{(k+1)},y^{(k+1)}) \leq f(x^{(k+1)},y^{(k)})$.

Attempt:

Using the $y$-update and $f(x,y) = g(x) + \|y x\|^2$, the difference can be written as \begin{align} &f(x^{(k+1)},y^{(k+1)}) - f(x^{(k+1)},y^{(k)})\\ & = 4 \eta_y^2\, (x^{(k+1)})^T x^{(k+1)}\, (x^{(k+1)})^T (y^{(k)})^T y^{(k)} x^{(k+1)}\, (x^{(k+1)})^T x^{(k+1)} - 4 \eta_y\, (x^{(k+1)})^T (y^{(k)})^T y^{(k)} x^{(k+1)}\, (x^{(k+1)})^T x^{(k+1)}\\ &= 4 \eta_y^2 \|x^{(k+1)}\|^4 \|y^{(k)} x^{(k+1)}\|^2 - 4 \eta_y \|y^{(k)} x^{(k+1)}\|^2 \|x^{(k+1)}\|^2\\ &= 4 \eta_y \|y^{(k)} x^{(k+1)}\|^2 \|x^{(k+1)}\|^2 \left(\eta_y \|x^{(k+1)}\|^2 - 1\right). \end{align} The last factorization shows that the difference is non-positive precisely when $\eta_y \leq \frac{1}{\|x^{(k+1)}\|^2}$. Is this condition reasonable?
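A quick standalone numerical check (random data; the names `coupling` and `scale` are just illustrative) is consistent with this threshold: stepping with several values of $\eta_y$ around $1/\|x^{(k+1)}\|^2$, the difference is non-positive exactly up to the threshold.

```python
import numpy as np

rng = np.random.default_rng(1)
d, p = 3, 5
x = rng.standard_normal(d)            # stands in for x^(k+1)
y_old = rng.standard_normal((p, d))   # stands in for y^(k)

# Only the coupling term ||y x||^2 changes in the y-step; g(x) cancels
# in the difference f(x, y_new) - f(x, y_old).
def coupling(x, y):
    return np.sum((y @ x) ** 2)

for scale in (0.25, 0.5, 1.0, 1.5):   # eta_y as a multiple of 1/||x||^2
    eta_y = scale / (x @ x)
    y_new = y_old - 2.0 * eta_y * np.outer(y_old @ x, x)
    diff = coupling(x, y_new) - coupling(x, y_old)
    # non-positive (up to round-off) exactly when scale <= 1
    print(f"scale={scale}: diff={diff:+.3e}")
```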

Comments:
  • It has no minimum, you have $f > 0$ but $\inf f = 0$.
    – copper.hat, Dec 4, 2023 at 18:26
  • $y^T y$ is a value, did you mean $x^T y y^T x$? If so, then you just need to show that $f(x^{(k+1)}, y)$ is convex in $y$, and you also have to be careful about $\eta_y$: you would need some assumption such as smoothness to prove that.
    – P. Quinton, Dec 5, 2023 at 8:48
  • You can read this: epfl.ch/labs/anchp/wp-content/uploads/2018/05/part3a.pdf. The thing is that your descent step might overshoot and bring you to a higher point. Think of the function $f(x) = |x|$: if you are at the point $x_0 = -\eta/4$ and take the gradient step $x_0 - \eta \cdot \partial f(x_0)$, you end up at $x_1 = \frac{3}{4}\eta$, and $f(x_0) < f(x_1)$. Smoothness avoids that by saying that the function "looks similar enough" to a quadratic function.
    – P. Quinton, Dec 5, 2023 at 9:29
  • I also believe your function is smooth in $y$ for a fixed $x$, so you only need to prove it for $y$. You can do that by observing that $x^T y^T y x = \operatorname{Tr}(y x x^T y^T)$, and therefore, using a bit of matrix calculus, $\nabla_y f(x, y) = 2 y x x^T$. It is then easy to show that $\| \nabla_y f(x, a) - \nabla_y f(x, b) \|_{\text{F}} = 2\| (a-b) x x^T \|_{\text{F}} \leq 2 \| x x^T \|_{\text{F}} \cdot \| a-b \|_{\text{F}} = 2\, x^T x\, \| a-b \|_{\text{F}}$. Therefore your function is actually $2 x^T x$-smooth.
    – P. Quinton, Dec 5, 2023 at 14:04
  • For proving that $f(x^{(k+1)}, y^{(k+1)}) \leq f(x^{(k+1)}, y^{(k)})$, yes. The problem here is that your problem is biconvex and you are doing alternating biconvex optimization; this is not guaranteed to converge, and I think it may not in general have a minimum (but maybe yours does, I don't know).
    – P. Quinton, Dec 6, 2023 at 7:54
