Timeline for alternating gradient descent on quadratic function
Current License: CC BY-SA 4.0
13 events
when toggle format | what | by | license | comment | |
---|---|---|---|---|---|
Dec 6, 2023 at 14:53 | history | edited | Ki Chao | CC BY-SA 4.0 |
added 9 characters in body
|
Dec 6, 2023 at 7:54 | comment | added | P. Quinton | For proving that $f(x_{k+1}, y_{k+1})\leq f(x_{k+1}, y_k)$ yes. The problem here is that your problem is biconvex and you are doing alternating biconvex optimization, this is not guarantied to converge and I think it may not in general have a minimum (but maybe your does, I don't know). | |
Dec 5, 2023 at 19:55 | comment | added | Ki Chao | @P.Quinton: thanks for the further explanation. My issue is would it be fine that the Lipschitz constant in this case depend on the other variable $x$? | |
Dec 5, 2023 at 14:04 | comment | added | P. Quinton | I also believe your function is smooth in $y$ for a fixed $x$, so you only need to prove it for $y$. You can do that by observing that $x^T y^T y x=\operatorname{Tr}(y x x^T y^T)$ and therefore using a bit of matrix calculus $\nabla_y f(x, y) = 2 y x x^T$, it is then easy to show that $\| \nabla_y f(x, a) -\nabla_y f(x, b) \|_{\text{F}} = 2\| (a-b) x x^T \|_{\text{F}}\leq 2 \| x x^T \|_{\text{F}} \cdot \| a-b \|_{\text F}=2 x^T x \| a-b\|_{\text F}$. Therefore your function is actually $2 x^T x$-smooth. | |
Dec 5, 2023 at 10:57 | comment | added | Ki Chao | @P.Quinton: thanks for the reference and explanation. At least in my case, I think the function is smooth and convex in $y$. Do you imply I need it to be smooth also in $x$? | |
Dec 5, 2023 at 9:29 | comment | added | P. Quinton | you can read this : epfl.ch/labs/anchp/wp-content/uploads/2018/05/part3a.pdf The thing is that your descent step might overshoot bring you to a higher point, think of the $f(x)=|x|$ function if you are at the point $x_0=-\eta/4$, then doing the gradient step $x_0 - \eta c\dot \partial f(x_0)$, then you get at $x_1 = -3/4 \eta$ and $f(x_0) < f(x_1)$. Smoothness avoids that by saying that the function "looks similar enough" to a quadratic function. | |
Dec 5, 2023 at 9:08 | comment | added | Ki Chao | @P.Quinton: $y$ here is a matrix. Could you explain more about the smoothness assumptions? | |
Dec 5, 2023 at 8:48 | comment | added | P. Quinton | $y^T y$ is a value, did you mean $x^T y y^T x$ ? If so then you just need to show that $f(x^{k+1}, y)$ is convex in $y$ and you also have to be careful about $\eta_{y}$, you would need some assumption such as smoothness to prove that. | |
Dec 5, 2023 at 8:33 | history | edited | Ki Chao | CC BY-SA 4.0 |
added condition on $\eta_y$
|
Dec 4, 2023 at 19:14 | comment | added | Ki Chao | @copper.hat: I updated the question as I seem to forgot a term in the definition of $f$. | |
Dec 4, 2023 at 19:13 | history | edited | Ki Chao | CC BY-SA 4.0 |
updated the definition of $f$
|
Dec 4, 2023 at 18:26 | comment | added | copper.hat | It has no minimum, you have $f >0$ but $\inf f = 0$. | |
Dec 4, 2023 at 16:30 | history | asked | Ki Chao | CC BY-SA 4.0 |