
Good morning everyone,

I'm not a mathematician (I'm an economist), but I found myself struggling with a static optimization problem that seems difficult to solve with my limited knowledge of the topic. I would be very grateful if you could recommend some readings or materials for this type of problem.

I have an objective function $\mathcal{O} : \mathcal{M}_n(\mathbb{R}) \rightarrow \mathbb{R}^n$, whose variable is a matrix $\alpha \in \mathcal{M}_n(\mathbb{R})$ (equivalently, $\alpha \in \mathbb{R}^{n \times n}$). I use the following notation: $\alpha = (\alpha_{ij}) = (\alpha_{1}, \ldots, \alpha_{j}, \ldots, \alpha_{n})$, where $\alpha_j$ is the $j$-th column of $\alpha$. I denote by $\mathcal{O}_i$ the $i$-th component of $\mathcal{O}(\alpha)$.

For some parameters $(s, \gamma, \delta) \in \mathbb{R}^n \times \mathbb{R}^+ \times \mathbb{R}^+$, $$ \mathcal{O}_i = \sum^n_{j=1} s_j\alpha_{ij} - \sum^n_{j=1} \frac{\alpha_{ij}^\gamma}{1-\alpha_{ij}^\gamma} - \left(\sum^n_{j=1} \alpha_{ij}\right)^\delta$$

I want to find $\operatorname{argmax}_\alpha\mathcal{O}$. To do so, I computed the first-order condition, i.e. the partial derivative: $$\frac{\partial \mathcal{O}_i}{\partial \alpha_{ij}} = s_j - \frac{\gamma\alpha_{ij}^{\gamma-1}}{(1-\alpha_{ij}^\gamma)^2} - \delta \left(\sum^n_{k=1} \alpha_{ik}\right)^{\delta-1} $$
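To sanity-check this expression, here is a minimal finite-difference comparison (a sketch only; the values of $n$, $s$, $\gamma$, $\delta$ below are made up for illustration and are not my actual parameters):

```python
import numpy as np

# Made-up test values (hypothetical, only to check the formula)
rng = np.random.default_rng(0)
n = 4
s = rng.uniform(1.0, 3.0, size=n)
gamma, delta = 2.0, 3.0
row = rng.uniform(0.1, 0.6, size=n)   # one row of alpha, entries in (0, 1)

def O_i(r):
    """The i-th component of the objective, as a function of row i of alpha."""
    return s @ r - np.sum(r**gamma / (1 - r**gamma)) - np.sum(r)**delta

def dO_ij(r, j):
    """The claimed partial derivative of O_i with respect to alpha_ij."""
    return s[j] - gamma * r[j]**(gamma - 1) / (1 - r[j]**gamma)**2 \
           - delta * np.sum(r)**(delta - 1)

# Central finite difference in the alpha_ij direction
j, h = 2, 1e-6
rp, rm = row.copy(), row.copy()
rp[j] += h
rm[j] -= h
print(dO_ij(row, j), (O_i(rp) - O_i(rm)) / (2 * h))  # should agree closely
```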

Assume (to stick to the point of this question) that all existence conditions are satisfied, i.e. the parameters are such that the partial derivative exists and the problem has at least one solution.

In my optimization lectures, I was taught to compute the first-order conditions and solve the resulting system for the optimal values. Here I find myself unable to do so because of the term $\frac{\gamma\alpha_{ij}^{\gamma-1}}{(1-\alpha_{ij}^\gamma)^2}$. I tried to rewrite the problem in full matrix notation, but I run into the same term, and I am not sure whether matrix notation would actually help.

What would you do in my situation? Is there a methodology that I'm not aware of? Thank you in advance for any tips you can give me!

  • Observation: in the derivative you use $j$ as both a free variable and an index variable, which risks confusion. Question: are you able to say anything about the numerical values? For example, are the $\alpha$s much smaller than 1? Are $\gamma$ and $\delta$ likely to be small, or large? My guess is that you won't get an exact solution and will need to do something approximate or numerical. Final observation: the index $i$ doesn't really come into the discussion - you seem to have $n$ separate optimisations, one across each of the $\alpha_i$.
    – Blitzer
    Commented Jun 9 at 10:12
  • Thank you @Blitzer for the answer. There is indeed a typo in the derivative (the sum should not be over $j$; I will edit)! Ideally, we are considering that every $\alpha_{ij}$ should be in $(0,1)$. For the parameters $\gamma, \delta$, we are typically working with values in $[2, 18]$, but this interval has not been fixed for now. Do you have any material to suggest for approximate or numerical solutions?
    – Goug
    Commented Jun 9 at 10:27
  • I don't know if this is helpful (I am not at all an expert in this), but you could try the following. Assume that $X=\sum_k \alpha_{ik}$ is a constant and guess its value (say $X=n/2$). Then, setting the derivative to zero, you have a simple(ish) equation in $\alpha_{ij}$ which you can solve numerically for each value of $j$. Add up the $\alpha_{ij}$ to find a better value of $X$, and then repeat the exercise until it converges. (A code sketch of this scheme follows after these comments.)
    – Blitzer
    Commented Jun 9 at 10:52
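A minimal sketch of Blitzer's fixed-point scheme, assuming SciPy's `brentq` for the scalar root finding. The values of $n$, $s$, $\gamma$, $\delta$ are illustrative; a damped update on $X$ is added because the plain iteration can oscillate, and convergence is not guaranteed in general:

```python
import numpy as np
from scipy.optimize import brentq

# Illustrative parameters (hypothetical, not taken from the question)
n = 4
s = np.array([4.0, 5.0, 6.0, 4.5])
gamma, delta = 3.0, 2.0

def foc(a, sj, X):
    """First-order condition for one entry, with the row sum X held fixed."""
    return sj - gamma * a**(gamma - 1) / (1 - a**gamma)**2 \
              - delta * X**(delta - 1)

def solve_entry(sj, X):
    """For gamma >= 2 the FOC is strictly decreasing in a on (0, 1), so it has
    at most one root there; if foc(0+) <= 0 the candidate degenerates to 0."""
    lo, hi = 1e-9, 1.0 - 1e-9
    if foc(lo, sj, X) <= 0:
        return 0.0
    return brentq(foc, lo, hi, args=(sj, X))

# Fixed-point iteration on X = sum_k alpha_ik for a single row i
X = n / 2
for _ in range(200):
    row = np.array([solve_entry(sj, X) for sj in s])
    X_new = 0.5 * X + 0.5 * row.sum()   # damping stabilizes the update
    if abs(X_new - X) < 1e-12:
        break
    X = X_new

print("row of alpha:", row, " row sum X:", X)
```

Since each $\mathcal{O}_i$ depends only on row $i$ of $\alpha$, running this once per row covers the full problem.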

1 Answer


Define a few variables and their differentials $$\eqalign{ \def\d{\delta} \def\g{\gamma} \def\e{e_i} \def\EE{E_{ii}} \def\A{A_{ij}} \def\E{E_{ij}} \def\M{M_{ij}} \def\D{\operatorname{Diag}} \def\s{\oslash} \def\q{\quad} \def\qiq{\q\implies\q} \def\p{\partial} \def\o{{\tt1}} \def\grad#1{\frac{\p #1}{\p\A}} B &= A^\g &\qiq dB = \g A^{\g-\o}\odot dA &= \g A^{\g-\o}\odot\E \\ g &= A\o &\qiq dg=dA\:\o &= \E\o \:=\: \e\\ C &= B\s(J-B) &\qiq dC=H\odot dB &= M\odot\E \\ H &= (J+C)\s(J-B) \\ M &= \g A^{\g-\o}\odot H \\ }$$ where $(J,\o)$ are the all-ones matrix/vector, $\,(\odot,\s)$ denote elementwise multiplication and division, and the exponentiations are also elementwise.
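These identities are easy to check numerically; a minimal NumPy sketch with arbitrary test values, using the simplification $H = J\s(J-B)^{\odot 2}$ (which follows from the definitions above):

```python
import numpy as np

# Arbitrary test values (hypothetical; entries of A kept inside (0, 1))
rng = np.random.default_rng(1)
n, gamma = 4, 3.0
A = rng.uniform(0.1, 0.6, size=(n, n))
J = np.ones((n, n))

B = A**gamma
C = B / (J - B)
H = (J + C) / (J - B)              # simplifies to 1/(1 - B)^2 elementwise
M = gamma * A**(gamma - 1) * H

# Perturb a single entry A_ij and compare the change in C against M_ij
i, j, h = 0, 2, 1e-7
A2 = A.copy()
A2[i, j] += h
C2 = A2**gamma / (J - A2**gamma)
print(((C2 - C) / h)[i, j], M[i, j])   # the two numbers should agree closely
```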

Then recall that the self-gradient of a matrix $A$ with respect to one of its components $\A$ is the single-entry matrix $\E$ $$\eqalign{ \grad A = \E \:=\: \e\,e_j^T \\ }$$ so replacing the differential $dA$ by the single-entry matrix produces the expressions on the far RHS.

Rewrite the function using the above matrix variables and calculate its gradient wrt $\A$ $$\eqalign{ f &= As - g^\d - C\o \\ df &= dA\,s - \d g^{\d-\o}\odot dg - dC\,\o \\ \grad f &= \E\,s - \d g^{\d-\o}\odot\e - (M\odot\E)\,\o \\ &= \left(s_j - \d g^{\d-\o}_i - \M\right) \e \\ \grad{f_k} &= \left(s_j -\d g^{\d-\o}_i -\M\right) \d_{ik} \\ }$$ This agrees with the gradient that you calculated (except the $\d$ symbol is being overused).

Setting the gradient to zero means that the scalar term in parentheses must equal zero $$\eqalign{ s_j &= \M \;+\; \d g^{\d-\o}_i \q\q &\{ {\sf Index\ notation} \} \\ \o s^T &= M \;+\; \d g^{\d-\o}\o^T \q\q &\{ {\sf Matrix\ equation} \} \\ }$$ Solving this for the optimal $A$ matrix will be extremely difficult.
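In practice you would likely hand the vectorized residual $\o s^T - M - \d g^{\d-\o}\o^T$ to a generic nonlinear solver. A minimal sketch with illustrative parameter values; success depends on the initial guess keeping the iterates inside $(0,1)$:

```python
import numpy as np
from scipy.optimize import root

# Illustrative parameters (hypothetical, not taken from the question)
n = 4
s = np.array([4.0, 5.0, 6.0, 4.5])
gamma, delta = 3.0, 2.0

def residual(x):
    """Vectorized stationarity condition: 1 s^T - M - delta g^(delta-1) 1^T."""
    A = x.reshape(n, n)
    g = A.sum(axis=1, keepdims=True)                # g = A 1 (row sums)
    M = gamma * A**(gamma - 1) / (1 - A**gamma)**2  # all operations elementwise
    return (s[None, :] - M - delta * g**(delta - 1)).ravel()

sol = root(residual, x0=np.full(n * n, 0.3), method="hybr")
print(sol.success)
print(sol.x.reshape(n, n))
```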

Are you sure that the objective function isn't scalar-valued?
Something like $$ \tfrac12\big\|f\big\|^2 \q{\sf or}\q w^Tf $$
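If you do scalarize, the problem becomes a standard bound-constrained maximization. A minimal sketch with a hypothetical weight vector $w$ (equal weights here), minimizing $-w^Tf$ with L-BFGS-B:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative setup; w is a hypothetical weight vector (equal weights here)
n = 4
s = np.array([4.0, 5.0, 6.0, 4.5])
gamma, delta = 3.0, 2.0
w = np.full(n, 1.0 / n)

def O(A):
    """Vector objective: O_i = (A s)_i - sum_j A_ij^g/(1 - A_ij^g) - (A 1)_i^d."""
    return A @ s - (A**gamma / (1 - A**gamma)).sum(axis=1) - A.sum(axis=1)**delta

def neg_scalarized(x):
    return -w @ O(x.reshape(n, n))      # minimize the negative to maximize

# Keep every entry strictly inside (0, 1) to avoid the singularity at A^g = 1
bounds = [(1e-6, 1.0 - 1e-6)] * (n * n)
res = minimize(neg_scalarized, x0=np.full(n * n, 0.3),
               method="L-BFGS-B", bounds=bounds)
print(res.success)
print(res.x.reshape(n, n))
```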

  • Thank you for your time and for the quality of your answer. Can you provide some tips or reading suggestions for the last step, "Solving this for the optimal $A$ matrix"? I'm totally sure that the objective is not scalar-valued (it is related to a graph problem, and $\alpha$ needs to be a matrix).
    – Goug
    Commented Jun 10 at 9:01
  • But you are trying to "maximize" a vector function, which involves trade-offs. Let's say maximizing $\cal O_1$ causes $\cal O_2$ to become tiny, or vice versa. You will need to assign weights to the relative importance of each component of $\cal O.\;$ But at that point, you might as well gather all of those weighted components into a single scalar-valued cost function and optimize with respect to it.
    – greg
    Commented Jun 10 at 14:06
