49

I am trying to understand quantile regression, but one thing that puzzles me is the choice of the loss function.

$\rho_\tau(u) = u(\tau-1_{\{u<0\}})$

I know that the value of $u$ minimizing the expectation of $\rho_\tau(y-u)$ is the $\tau$-quantile of $y$, but what is the intuitive reason to start off with this function? I don't see the relation between minimizing this function and the quantile. Can somebody explain it to me?
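
That fact is easy to check numerically before looking for intuition. The sketch below is only an illustration, assuming NumPy, an arbitrary exponential sample, and a brute-force search over the sample points (the helper names are illustrative). It compares the minimizer of the average loss with `np.quantile`:

```python
import numpy as np


def pinball_loss(u, tau):
    """rho_tau(u) = u * (tau - 1{u < 0}), applied elementwise."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0).astype(float))


def minimize_average_pinball(y, tau):
    """Return the sample point with the smallest average pinball loss.

    The average loss is convex and piecewise linear in the candidate
    location, with kinks only at the data points, so some data point is
    always a minimizer; we simply try every one of them.
    """
    y = np.asarray(y, dtype=float)
    losses = [pinball_loss(y - m, tau).mean() for m in y]
    return y[int(np.argmin(losses))]


rng = np.random.default_rng(0)
y = rng.exponential(size=1001)  # a skewed sample, chosen arbitrarily
for tau in (0.1, 0.5, 0.9):
    print(tau, minimize_average_pinball(y, tau), np.quantile(y, tau))
```

With an odd sample size of 1001 and these values of $\tau$, the minimizer is a single order statistic, which is also what NumPy's default (linear interpolation) quantile returns, so the last two columns should agree.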


2 Answers

48

I understand this question as asking for insight into how one could come up with any loss function that produces a given quantile as a loss minimizer no matter what the underlying distribution might be. It would be unsatisfactory, then, just to repeat the analysis in Wikipedia or elsewhere that shows this particular loss function works.

Let's begin with something familiar and simple.

What you're talking about is finding a "location" $x^{*}$ relative to a distribution or set of data $F$. It is well known, for instance, that the mean $\bar x$ minimizes the expected squared residual; that is, it is a value for which

$$\mathcal{L}_F(\bar x)=\int_{\mathbb{R}} (x - \bar x)^2 dF(x)$$

is as small as possible. I have used this notation to remind us that $\mathcal{L}$ is derived from a loss, that it is determined by $F$, and, most importantly, that it depends on the number $\bar x$.
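
As a quick sanity check of that familiar fact, with a sample standing in for $F$ (so the integral becomes an average), a grid search over candidate centers lands next to the sample mean. This is a sketch under illustrative assumptions: NumPy, a gamma sample, and a grid of 2001 points.

```python
import numpy as np


def mean_squared_residual(x, center):
    """Empirical version of L_F(center): the average of (x - center)**2."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - center) ** 2)


rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, size=500)  # an arbitrary skewed sample
grid = np.linspace(x.min(), x.max(), 2001)
losses = np.array([mean_squared_residual(x, c) for c in grid])
best = grid[int(np.argmin(losses))]
print(best, x.mean())  # the best grid point is the one nearest the sample mean
```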

The standard way to show that $x^{*}$ minimizes a function begins by demonstrating that the function's value does not decrease when $x^{*}$ is changed by a little bit; for a differentiable function this forces the derivative at $x^{*}$ to be zero. A value where the derivative vanishes is called a critical point of the function.

What kind of loss function $\Lambda$ would result in a percentile $F^{-1}(\alpha)$ being a critical point? The loss for that value would be

$$\mathcal{L}_F(F^{-1}(\alpha)) = \int_{\mathbb{R}} \Lambda(x-F^{-1}(\alpha))dF(x)=\int_0^1\Lambda\left(F^{-1}(u)-F^{-1}(\alpha)\right)du.$$
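
The second equality is the substitution $x = F^{-1}(u)$: if $U$ is uniform on $(0,1)$, then $F^{-1}(U)$ has distribution $F$. A numerical check of that identity for one continuous case, assuming (for illustration only) that $F$ is the unit-rate exponential, $\alpha = 0.3$, and $\Lambda$ is the question's $\rho_\alpha$:

```python
import numpy as np

alpha = 0.3
q = -np.log1p(-alpha)  # F^{-1}(alpha) for Exp(1), where F(x) = 1 - exp(-x)


def rho(u, a):
    """Pinball loss, used here as one concrete choice of Lambda."""
    u = np.asarray(u, dtype=float)
    return u * (a - (u < 0).astype(float))


def F_inv(u):
    """Inverse CDF of the unit-rate exponential distribution."""
    return -np.log1p(-u)


# Right-hand form of the identity: integrate over u in (0, 1) with the midpoint rule.
n = 200_000
u = (np.arange(n) + 0.5) / n
rhs = rho(F_inv(u) - q, alpha).mean()

# Left-hand form, in closed form for Exp(1):
# E[(X - m)^+] = exp(-m) and E[(m - X)^+] = m - 1 + exp(-m).
lhs = alpha * np.exp(-q) + (1 - alpha) * (q - 1 + np.exp(-q))
print(lhs, rhs)
```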

For $F^{-1}(\alpha)$ to be a critical point, the derivative of $\mathcal{L}_F$ must be zero there. Since we're just trying to find some solution, we won't pause to see whether the manipulations are legitimate: we'll plan to check technical details (such as whether we really can differentiate $\Lambda$) at the end. Thus

$$\eqalign{0 &=\mathcal{L}_F^\prime(x^{*})= \mathcal{L}_F^\prime(F^{-1}(\alpha))= -\int_0^1 \Lambda^\prime\left(F^{-1}(u)-F^{-1}(\alpha)\right)du \\ &= -\int_0^{\alpha} \Lambda^\prime\left(F^{-1}(u)-F^{-1}(\alpha)\right)du -\int_{\alpha}^1 \Lambda^\prime\left(F^{-1}(u)-F^{-1}(\alpha)\right)du.\tag{1} }$$

In the first integral on the right-hand side of $(1)$, where $u \lt \alpha$, the argument of $\Lambda^\prime$ is negative, whereas in the second, where $u \gt \alpha$, it is positive. Other than that, we have little control over the values of these integrals because $F$ could be any distribution function. Consequently our only hope is to make $\Lambda^\prime$ depend only on the sign of its argument, and otherwise it must be constant.

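This implies $\Lambda$ will be piecewise linear, potentially with different slopes to the left and right of zero. Clearly it should be decreasing as zero is approached: it is, after all, a loss and not a gain. Moreover, rescaling $\Lambda$ by a positive constant will not change its properties, so we may feel free to set the left-hand slope to $-1$. Let $\tau \gt 0$ be the right-hand slope (here $\tau$ denotes a slope, not the question's $\tau$). The first integral in $(1)$ then contributes $-\int_0^\alpha(-1)\,du = \alpha$ and the second contributes $-\int_\alpha^1 \tau\,du = -\tau(1-\alpha)$, so $(1)$ simplifies to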

$$0 = \alpha - \tau (1 - \alpha),$$

whence the unique solution is, up to a positive multiple,

$$\Lambda(x) = \cases{-x, \ x \le 0 \\ \frac{\alpha}{1-\alpha}x, \ x \ge 0.}$$

Multiplying this (natural) solution by $1-\alpha$ to clear the denominator produces the loss function presented in the question, with $\alpha$ playing the role of the question's $\tau$.

Clearly all our manipulations are mathematically legitimate when $\Lambda$ has this form.
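
The last two steps can be checked mechanically. This sketch assumes NumPy and $\alpha = 0.25$ (both illustrative choices; the function names are illustrative too). It confirms that the right-hand slope $\alpha/(1-\alpha)$ satisfies $0 = \alpha - \tau(1-\alpha)$ and that $(1-\alpha)\Lambda$ coincides with the question's $\rho_\alpha$:

```python
import numpy as np


def derived_loss(x, alpha):
    """Lambda from the derivation: slope -1 left of zero, alpha/(1 - alpha) right of it."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= 0, -x, alpha / (1 - alpha) * x)


def rho(u, tau):
    """The question's loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0).astype(float))


alpha = 0.25
right_slope = alpha / (1 - alpha)
# Residual of the critical-point condition 0 = alpha - tau * (1 - alpha); should be ~0.
print(alpha - right_slope * (1 - alpha))

x = np.linspace(-3, 3, 13)
# Multiplying Lambda by (1 - alpha) should reproduce rho_alpha up to rounding.
print(np.allclose((1 - alpha) * derived_loss(x, alpha), rho(x, alpha)))
```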

  • 1
    I haven't seen this treatment of quantile regression anywhere before; it's quite clear. Do you have any references for this approach?
    – layn
    Commented Jun 10, 2020 at 13:05
  • 3
    @layn I can't put my hand on any references right now, but I'm sure any good discussion of modern nonparametric procedures will go well beyond this basic discussion. One useful keyword for searching might be "M-estimator."
    – whuber
    Commented Jun 10, 2020 at 13:09
  • Follow-up question: why is a piecewise quadratic loss (i.e. $(y(\tau-1_{\{y<0\}}))^2$) not a valid solution? Commented Jun 23, 2020 at 3:22
  • I guess what I'm asking is how you reached the conclusion that "Consequently our only hope is to make Λ′ depend only on the sign of its argument, and otherwise it must be constant." Commented Jun 23, 2020 at 3:23
  • @Rylan Because $F$ can be any distribution function, the argument $F^{-1}(u)-F^{-1}(\alpha)$ can be practically any function. Equation $(1)$ has to hold for all such functions. If, in either of the integrals on the right hand side of $(1),$ $\Lambda^\prime$ were to vary, then given an $F$ that satisfies the equation we could modify it slightly to construct another $F$ for which the equation fails.
    – whuber
    Commented Jun 23, 2020 at 11:55
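
On the follow-up comment about squaring the loss: squaring makes the slope of the loss grow with the size of the residual, so it no longer depends only on the residual's sign. The minimizer then depends on how far the data lie from it, not only on how many lie on each side; the minimizer of such an asymmetric squared loss is usually called an expectile rather than a quantile. The sketch below illustrates this at $\tau = 0.5$, where the squared version is just a multiple of the squared error, under illustrative assumptions (NumPy, an exponential sample, a grid search; helper names are illustrative):

```python
import numpy as np


def pinball(u, tau):
    """rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0).astype(float))


def squared_pinball(u, tau):
    """The piecewise quadratic alternative from the comment: rho_tau(u) ** 2."""
    return pinball(u, tau) ** 2


def argmin_on_grid(loss, y, tau, grid):
    """Grid point with the smallest average loss over the sample y."""
    risks = [loss(y - m, tau).mean() for m in grid]
    return grid[int(np.argmin(risks))]


rng = np.random.default_rng(2)
y = rng.exponential(size=2000)  # skewed, so its mean and median differ
grid = np.linspace(0.0, 3.0, 3001)
tau = 0.5

m_linear = argmin_on_grid(pinball, y, tau, grid)
m_squared = argmin_on_grid(squared_pinball, y, tau, grid)
print(m_linear, np.median(y))  # piecewise linear loss: near the median
print(m_squared, y.mean())     # squared version: near the mean instead
```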
34

The way this loss function is expressed is nice and compact, but I think it's easier to understand when rewritten as $$\rho_\tau(X-m) = (X-m)(\tau-1_{(X-m<0)}) = \begin{cases} \tau |X-m| & \text{if } X-m \ge 0, \\ (1 - \tau) |X-m| & \text{if } X-m < 0. \end{cases}$$

If you want to get an intuitive sense of why minimizing this loss function yields the $\tau$th quantile, it's helpful to consider a simple example. Let $X$ be a uniform random variable between 0 and 1. Let's also choose a concrete value for $\tau$, say, $0.25$.

So now the question is: why is this loss function minimized at $m=0.25$? For this uniform distribution, there is three times as much mass to the right of $m$ as there is to the left. And the loss function weights the values larger than $m$ at only a third of the weight given to values smaller than it. Thus it is intuitive that the scales balance when the $\tau$th quantile is used as the kink (the point where the slope changes) of the loss function.
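
The balance can be made exact for this uniform example. For $0 \le m \le 1$, $E[\rho_\tau(X-m)] = \tau(1-m)^2/2 + (1-\tau)m^2/2$, whose derivative $m-\tau$ vanishes only at $m=\tau$; equivalently, $(1-\tau)P(X<m) = \tau P(X>m)$ at the optimum. A minimal sketch that evaluates this closed form on a grid (NumPy and the grid resolution are illustrative choices):

```python
import numpy as np


def expected_pinball_uniform(m, tau):
    """E[rho_tau(X - m)] for X ~ Uniform(0, 1) and 0 <= m <= 1.

    Equals tau * E[(X - m)^+] + (1 - tau) * E[(m - X)^+]
          = tau * (1 - m)**2 / 2 + (1 - tau) * m**2 / 2.
    """
    return tau * (1 - m) ** 2 / 2 + (1 - tau) * m ** 2 / 2


tau = 0.25
m = np.linspace(0.0, 1.0, 10001)
m_best = m[int(np.argmin(expected_pinball_uniform(m, tau)))]

# At the optimum, the marginal cost of raising m, (1 - tau) * P(X < m),
# matches the marginal saving, tau * P(X > m).
cost_up = (1 - tau) * m_best
saving_up = tau * (1 - m_best)
print(m_best, cost_up, saving_up)
```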

  • 1
    Shouldn't it be the other way? Under-guessing will cost three times as much?
    – Edi Bice
    Commented Apr 1, 2019 at 15:01
  • Thanks for catching that. The formula is right but I initially worded it incorrectly in my explanation.
    – jjet
    Commented Aug 17, 2019 at 21:03
