The classical, well-known delta method states the following: if $\sqrt{n}(X_{n}-\theta)\overset{law}{\longrightarrow}N(0,\sigma^{2})$, then $\sqrt{n}(g(X_{n})-g(\theta))\overset{law}{\longrightarrow}N(0,\sigma^{2}(g'(\theta))^{2})$ for any function $g$ such that $g'(\theta)$ exists and is non-zero. The key step in proving this result is the expansion $g(X_{n})=g(\theta)+g'(\overline{\theta})(X_{n}-\theta)$ for some intermediate value $\overline{\theta}$ between $X_{n}$ and $\theta$. What exactly ensures the existence of such a $\overline{\theta}$? It should follow from Taylor's theorem, but I am not able to argue this rigorously.
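For concreteness, here is a minimal Monte Carlo sketch of the statement, with the arbitrary illustrative choices of $X_n$ being the sample mean of $n$ i.i.d. $\mathrm{Exp}(1)$ draws (so $\theta=\sigma^{2}=1$) and $g(x)=x^{2}$, so that the predicted limit is $N(0,4)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative choices: X_n is the mean of n i.i.d. Exp(1) draws,
# so theta = 1 and sigma^2 = 1; g(x) = x^2, so g'(theta) = 2 and the delta
# method predicts sqrt(n) * (g(X_n) - g(theta)) ~ N(0, 4) approximately.
n, reps = 1_000, 5_000
theta, sigma2 = 1.0, 1.0
g = lambda x: x ** 2

X_n = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
Z = np.sqrt(n) * (g(X_n) - g(theta))

print("empirical variance :", Z.var())                     # should be close to 4
print("predicted variance :", sigma2 * (2 * theta) ** 2)   # = 4
```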
-
Mean value theorem? – Chappers, Sep 18, 2017 at 21:42
-
@Chappers So the argument given in en.wikipedia.org/wiki/Delta_method#Proof_in_the_univariate_case is wrong? – Xarrus, Sep 18, 2017 at 21:55
-
Second sentence of that section starts "To begin, we use the mean value theorem [...]"? – Chappers, Sep 18, 2017 at 21:59
2 Answers
All that is needed is the differentiability of $g$ at the single point $\theta$; the mean value theorem and $\overline\theta$ are not needed. Recall that "$g$ is differentiable at $\theta$" means that the limit $\lim_{h\to0} (g(\theta+h)-g(\theta))/h$ exists; we give its value the name $g'(\theta)$. Now define the function $r$ by setting $r(h) = (g(\theta+h)-g(\theta))/h - g'(\theta)$ if $h\neq 0$ and $r(0)=0$. All we know about $r$ is, more or less, that $\lim_{h\to0} r(h)=0.$
But now we have a degree 1 "Taylor approximation": $$g(\theta+h) = g(\theta) + g'(\theta)h + hr(h)$$ just by unrolling the definition of $r$, where we know $r(h)\to0$ as $h\to0$. Apply this to $h=X_n-\theta$, and multiply by $\sqrt n$ to get $$ \sqrt n (g(X_n)-g(\theta)) = g'(\theta) \sqrt n(X_n-\theta) + \sqrt n (X_n-\theta) r( X_n-\theta).$$
Now Slutsky's theorem kicks in: $\sqrt n (X_n-\theta)$ is tight (or $O_P(1)$, if you will) and it multiplies $r(X_n-\theta)$, which converges in probability to $0$, so the product converges to $0$ in probability. Since $g'(\theta)\sqrt n(X_n-\theta)$ converges in law to $N(0,\sigma^2 g'(\theta)^2)$, one more application of Slutsky's theorem gives $\sqrt n(g(X_n)-g(\theta))\overset{law}{\longrightarrow}N(0,\sigma^2 g'(\theta)^2)$.
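To see this step numerically, here is a minimal sketch with the arbitrary choices $g(x)=x^2$, $\theta=1$, and $X_n$ the sample mean of $n$ i.i.d. $\mathrm{Exp}(1)$ draws: the term $g'(\theta)\sqrt n(X_n-\theta)$ keeps variance near $\sigma^2 g'(\theta)^2=4$, while the remainder $\sqrt n(X_n-\theta)\,r(X_n-\theta)$ collapses toward $0$ as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary illustrative choices: g(x) = x^2, theta = 1, and X_n the mean of
# n i.i.d. Exp(1) draws, so X_n ~ Gamma(shape=n, scale=1/n) exactly and sigma^2 = 1.
theta, dg = 1.0, 2.0                      # dg = g'(theta)
g = lambda x: x ** 2

def r(h):
    # Remainder from the definition of the derivative at theta (h != 0 almost surely).
    return (g(theta + h) - g(theta)) / h - dg

reps = 20_000
for n in (10, 100, 1_000, 10_000):
    X_n = rng.gamma(shape=n, scale=1.0 / n, size=reps)
    h = X_n - theta
    main = dg * np.sqrt(n) * h            # converges in law to N(0, 4)
    rem = np.sqrt(n) * h * r(h)           # O_P(1) times o_P(1): goes to 0 in probability
    print(n, "var(main) ~", round(main.var(), 2),
          "  99th pct of |rem| ~", round(np.quantile(np.abs(rem), 0.99), 4))
```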
The "secret sauce" is that the proof of a degree 1 Taylor approximation amounts to no more than a recitation of the definition of differentiability at the point of expansion. Unlike higher degree Taylor approximations, where stronger hypotheses and arguments are needed.
-
Is there anywhere in the delta method that says $X_n$ has to converge to $\theta$? (And what type of convergence is necessary?) – darkgbm, Jun 25, 2023 at 16:33
-
@darkgbm The standard delta-method hypothesis is that $\sqrt n (X_n-\theta)$ converges in distribution to a Gaussian. This already implies that $X_n$ converges to $\theta$ in probability. Of course, in applications one might know more: usually $X_n$ is a sample average of i.i.d. finite-variance summands, in which case $X_n\to\theta$ almost surely, and in $L^2$, as well. – Jun 25, 2023 at 23:00
-
To show that $X_n$ converges to $\theta$ in probability, would it suffice to define $Y_n = 1/\sqrt{n}$, which converges to $0$, multiply $\sqrt{n}(X_n - \theta)$ (and the Gaussian limit) by $Y_n$, and use Slutsky's theorem to conclude that $X_n - \theta$ converges in distribution to $0$; and since convergence in distribution to a constant implies convergence in probability, we are done? Or is there a more straightforward way? – darkgbm, Jun 27, 2023 at 2:14
Excellent point! Why is such a $\overline{\theta}$ even measurable? I congratulate you on your rigor. Here is a proof which avoids these difficulties:
Lemma: \begin{align*} &\text{i)}\hspace{10em} \sqrt{n}\big(g(X_n)-g(\theta)-g'(\theta)(X_n-\theta)\big) \overset{p}{\rightarrow} 0, \\ &\text{ii)}\hspace{10em} \sqrt{n}g'(\theta)(X_n-\theta) \overset{d}{\rightharpoonup} g'(\theta)N(0,\sigma^2)=N(0,g'(\theta)^2\sigma^2) \end{align*}
Before proving the lemma, note that it suffices, since Slutsky's theorem then yields:
$$\begin{aligned}\sqrt{n}\big(g(X_n)-g(\theta)\big) &= \sqrt{n}\big(g(X_n)-g(\theta)-g'(\theta)(X_n-\theta)\big)+\sqrt{n}\,g'(\theta)(X_n-\theta)\\ &\overset{d}{\rightharpoonup} 0+N\big(0,g'(\theta)^2\sigma^2\big)\end{aligned}$$
Now let us prove the Lemma.
Part $(ii)$ is immediate. One can prove it directly with cumulative distribution functions, or even more quickly with Slutsky's theorem, since $g'(\theta)\overset{p}{\rightarrow}g'(\theta)$ and, by hypothesis, $\sqrt{n}(X_n-\theta)\overset{d}{\rightharpoonup} N(0,\sigma^2)$. (Yes, I have succumbed to the charms of Slutsky's theorem!)
More interesting, perhaps, is part $(i)$. We know by the definition of the (Fréchet) derivative that $r(x-\theta)=g(x)-g(\theta)-(x-\theta)g'(\theta)$ goes to zero as $x$ approaches $\theta$, even after division by $x-\theta$. Hence it is more convenient to write:
$$\sqrt{n}\big(g(X_n)-g(\theta)-g'(\theta)(X_n-\theta)\big) =\sqrt{n}(X_n-\theta)\frac{r(X_n-\theta)}{X_n-\theta}$$
Because $1/\sqrt{n}\overset{p}{\rightarrow} 0$ and $\sqrt{n}(X_n-\theta)\overset{d}{\rightharpoonup}N(0,\sigma^2)$, Slutsky's theorem gives $(X_n-\theta)\overset{d}{\rightharpoonup} 0$ and hence $(X_n-\theta)\overset{p}{\rightarrow} 0$. (If you have never proven that convergence in distribution to a constant implies convergence in probability to that constant, prove it!)
Now fix $\varepsilon>0$ and choose $\delta=\delta(\varepsilon)$ such that $|r(x)|/|x|<\varepsilon$ whenever $0<|x|<\delta$. Given $\varepsilon'>0$, for $n$ large enough (depending on $\varepsilon'$ and $\delta$) we have $\mathbb{P}(|X_n-\theta|\geq\delta)\leq\varepsilon'$, and therefore:
$$\mathbb{P}\left(\frac{|r(X_n-\theta)|}{|X_n-\theta|}>\varepsilon\right)\leq \mathbb{P}\left(\left\{\frac{|r(X_n-\theta)|}{|X_n-\theta|}>\varepsilon\right\}\cap \{|X_n-\theta|<\delta\}\right)+\mathbb{P}\left( |X_n-\theta|\geq \delta\right)\leq 0+\varepsilon'$$
Hence, $r(X_n-\theta)/(X_n-\theta)\overset{p}{\rightarrow}0$. Thus, by a final application of Slutsky's Theorem:
$$ \sqrt{n}(X_n-\theta)\,\frac{r(X_n-\theta)}{X_n-\theta}\overset{d}{\rightharpoonup} N(0,\sigma^2)\cdot 0 =0$$
$$\therefore\ \sqrt{n}\big(g(X_n)-g(\theta)-g'(\theta)(X_n-\theta)\big)= \sqrt{n}(X_n-\theta)\,\frac{r(X_n-\theta)}{X_n-\theta}\overset{p}{\rightarrow} 0$$
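A quick numerical sketch of this last convergence, with the arbitrary illustrative choices $g(x)=e^{x}$, $\theta=0$, and $X_n\sim N(0,1/n)$ (as for a sample mean of $n$ standard normals): the estimated probability $\mathbb{P}\big(|r(X_n-\theta)|/|X_n-\theta|>\varepsilon\big)$ falls toward $0$ as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary illustrative choices: g(x) = exp(x) and theta = 0, so g'(theta) = 1 and
# r(x - theta) = g(x) - g(theta) - (x - theta) g'(theta) = exp(x) - 1 - x.
# Take X_n ~ N(0, 1/n), as for the sample mean of n i.i.d. standard normals.
theta, dg = 0.0, 1.0
eps, reps = 0.05, 100_000

for n in (10, 100, 1_000, 10_000):
    X_n = rng.normal(loc=theta, scale=1.0 / np.sqrt(n), size=reps)
    h = X_n - theta
    ratio = np.abs(np.exp(theta + h) - np.exp(theta) - dg * h) / np.abs(h)
    print(n, "P(|r(X_n - theta)| / |X_n - theta| > eps) ~", (ratio > eps).mean())
```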