What is the intuition behind the chain rule in mathematics? In particular, why is there a multiplication in between?
- math.stackexchange.com/questions/62614/chain-rule-intuition – Mark McClure, Mar 25, 2014 at 9:51
- I think the answers here are better than those on the question this was closed as a duplicate of. Sigh. Can't we merge the questions? – ShreevatsaR, Mar 26, 2014 at 1:54
5 Answers
The best way to think about the derivative is: if $f$ is differentiable at $x$, then \begin{equation*} f(x + \Delta x) \approx f(x) + f'(x) \Delta x. \end{equation*} The approximation is good when $\Delta x$ is small. This is practically the definition of $f'(x)$.
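A quick numerical illustration of what "good when $\Delta x$ is small" means. This is a sketch with an illustrative choice that is not from the answer ($f=\sin$, so $f'=\cos$, at $x=1$): the error of the approximation, divided by $\Delta x$, itself shrinks toward zero, so the error vanishes faster than $\Delta x$.

```python
import math


def linear_approx_error(f, df, x, dx):
    """Error of the approximation f(x + dx) ~ f(x) + f'(x) * dx."""
    return f(x + dx) - (f(x) + df(x) * dx)


# Illustrative choice, not from the answer: f = sin, so f' = cos, at x = 1.
x = 1.0
steps = (1e-1, 1e-2, 1e-3, 1e-4)
for dx in steps:
    err = linear_approx_error(math.sin, math.cos, x, dx)
    # err/dx shrinks along with dx: the error is o(dx), not just "small".
    print(f"dx={dx:.0e}  error={err:.3e}  error/dx={err / dx:.3e}")
```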
Now suppose $f(x) = g(h(x))$, and $h$ is differentiable at $x$, and $g$ is differentiable at $h(x)$. Then \begin{align*} f(x + \Delta x) & = g(h(x+\Delta x)) \\ &\approx g(h(x) + h'(x) \Delta x) \\ &\approx g(h(x)) + g'(h(x)) h'(x) \Delta x. \end{align*} Comparing this with the equation above suggests that \begin{align*} f'(x) = g'(h(x)) h'(x). \end{align*}
Many other rules about derivatives can be derived easily in this way.
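The composition step in this answer can be checked numerically. This sketch uses illustrative choices that are not from the answer ($h(x)=x^2$, $g=\sin$, $x=1.3$), and a symmetric difference quotient with step $10^{-6}$ stands in for the exact derivative of $f=g\circ h$; it should agree with $g'(h(x))\,h'(x)$ to many decimal places.

```python
import math


def central_diff(func, x, dx=1e-6):
    """Symmetric difference quotient: a numerical stand-in for func'(x)."""
    return (func(x + dx) - func(x - dx)) / (2 * dx)


# Illustrative choices: inner h(x) = x**2, outer g = sin, f = g o h.
def h(x):
    return x * x


def dh(x):
    return 2 * x


def f(x):
    return math.sin(h(x))


x = 1.3
numeric = central_diff(f, x)
chain = math.cos(h(x)) * dh(x)  # g'(h(x)) * h'(x), with g' = cos
print(numeric, chain)
```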
- You can even make that rigorous by writing $f(x + \Delta x) = f(x) + f'(x) \Delta x + o(\Delta x)$. – Mar 25, 2014 at 10:26
- +1 to @nik's comment: we can define the derivative $f'(x)$ as the unique number $c$ satisfying $f(x + \delta) = f(x) + c\delta + o(\delta)$ (for $\delta \to 0$). Then for $f = g\circ h$ we have $$f(x+\delta) = g(h(x+\delta)) = g(h(x) + h'(x)\delta + o(\delta)) = g(h(x)) + g'(h(x)) (h'(x)\delta + o(\delta)) + o(\delta) = f(x) + g'(h(x))h'(x)\delta + o(\delta),$$ and thus this is a proof of the chain rule. As an aside, see Knuth's letter "Calculus via O notation" (PDF) for ideas on teaching calculus along these lines. :-) – Mar 25, 2014 at 11:55
- $df/dx = (df/dh)\cdot(dh/dx)$, the so-called physicist's expansion ;-) – Mar 25, 2014 at 15:25
- I found the top comment funny, given the poster's username. – user370967, Jan 5, 2018 at 23:17
For a function $g(x)$, imagine walking at constant (unit) speed along one number line while a red dot on another number line marks the function value of your current position. That is, your position is $x$ and the red dot appears at $g(x)$. Then $g'(x)$ is the speed of the red dot. Now assume we chain this red dot to trigger a blue dot on a third number line, representing $f$: if you yourself were to walk at unit speed along the $g$ line, standing at position $y$, the blue dot on the $f$ line would light up at $f(y)$ and move with speed $f'(y)$.
As you move along your original number line, the red dot appears at $g(x)$, so the blue dot appears at $f(g(x))$. This makes the blue dot move with speed $[f(g(x))]'$.
The red dot on the $g$ line, however, moves with speed $g'(x)$, not unit speed. Near $g(x)$ the blue dot moves $f'(g(x))$ units for every unit the red dot moves, so the two dots' speeds are proportional with proportionality factor $f'(g(x))$. Thus the resulting speed of the blue dot must be $f'(g(x))\cdot g'(x)$.
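The dots picture can be simulated directly. In this sketch the functions $g(x)=x^3$ and $f=\exp$, the starting point $x_0=0.5$, and the finite-difference `speed` helper are all illustrative assumptions: you walk at unit speed, the red dot's speed comes out as $g'(x_0)=0.75$, and the ratio of the blue dot's speed to the red dot's speed is the factor $f'(g(x_0))=e^{0.125}$.

```python
import math


# Illustrative choices, not from the answer: g(x) = x**3 and f = exp.
def g(x):
    return x ** 3


def f(y):
    return math.exp(y)


def speed(position, t, dt=1e-6):
    """Approximate speed at time t of a dot whose position is position(t)."""
    return (position(t + dt) - position(t - dt)) / (2 * dt)


x0 = 0.5


def walker(t):
    return x0 + t  # you walk at unit speed, passing x0 at time 0


def red(t):
    return g(walker(t))  # red dot on the g line


def blue(t):
    return f(red(t))  # blue dot on the f line


red_speed = speed(red, 0.0)      # approximately g'(x0) = 3 * x0**2
blue_speed = speed(blue, 0.0)    # approximately [f(g(x))]' at x0
factor = blue_speed / red_speed  # approximately f'(g(x0)) = exp(x0**3)
print(red_speed, blue_speed, factor)
```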
- Very nice visual intuition! – Oct 17, 2015 at 18:46
- It would be nicer if you could draw a graph to demonstrate the concept, but still, thank you very much for the explanation. It was helpful to my understanding. – Thor, Jul 15, 2018 at 7:56
- @Thor Since I wrote this answer, 3blue1brown has made a video (youtu.be/CfW845LNObM) showing this visualisation better than I ever could. It may not go into the chain rule, but it does cover derivatives, and with that as a help, I think visualising the chain rule the way I describe here isn't too difficult. – Arthur, Jul 15, 2018 at 8:33
- @Arthur That video is amazing! I would never have come across it without your recommendation. Thanks again for your help. – Thor, Jul 15, 2018 at 13:13
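This is true even in several variables. Differentiable means locally approximately linear. A composition of functions is locally $\approx$ the composition of their linear approximations, and the composition of linear maps is the matrix product (of the Jacobian matrices). In one variable these matrices are $1\times1$, so the product is ordinary multiplication.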
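A numerical sketch of the several-variable statement. The two maps $h,g:\mathbb{R}^2\to\mathbb{R}^2$, the point $p$, and the central-difference `jacobian` helper (step $10^{-6}$, so only an approximation) are illustrative assumptions: the Jacobian of $g\circ h$ at $p$ should match the matrix product $J_g(h(p))\,J_h(p)$.

```python
import math


def jacobian(F, p, eps=1e-6):
    """Numerical Jacobian of F: R^n -> R^m at p; entry [i][j] ~ dF_i/dp_j."""
    n = len(p)
    m = len(F(p))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus = list(p)
        plus[j] += eps
        minus = list(p)
        minus[j] -= eps
        Fp, Fm = F(plus), F(minus)
        for i in range(m):
            J[i][j] = (Fp[i] - Fm[i]) / (2 * eps)
    return J


def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


# Illustrative maps h, g: R^2 -> R^2.
def h(p):
    x, y = p
    return [x * y, x + y ** 2]


def g(q):
    u, v = q
    return [math.sin(u) + v, u * v]


def composite(p):
    return g(h(p))


p = [0.7, -1.2]
lhs = jacobian(composite, p)                     # derivative of g o h at p
rhs = matmul(jacobian(g, h(p)), jacobian(h, p))  # product of linear approximations
print(lhs)
print(rhs)
```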
In terms of differentials, we know that if the variables $x,y,z$ are related by $y = f(x)$ and $z = g(y)$, then
- $dy = f'(x) dx$
- $dz = g'(y) dy$
If differentials are even the slightest bit reasonable to think about, then we should be able to substitute, and get
- $dz = g'(y) f'(x) dx$
or, if you prefer,
- $dz = g'(f(x)) f'(x) dx$
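One way to take this substitution of differentials literally is with dual numbers (forward-mode automatic differentiation, a swapped-in technique that the answer does not mention). In this sketch each value carries the coefficient of $dx$ in its differential, and each function step multiplies that coefficient by its own derivative, exactly as in $dy = f'(x)\,dx$ and $dz = g'(y)\,dy$. The `Dual` class, the hypothetical `lift` helper, and the choices $f(x)=x^2$, $g=\sin$, $x=1.3$ are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with the coefficient of dx in its differential."""
    val: float
    d: float


def lift(fn, dfn):
    """Hypothetical helper: apply fn and rescale the differential by fn'(x)."""
    def lifted(x: Dual) -> Dual:
        return Dual(fn(x.val), dfn(x.val) * x.d)
    return lifted


# Illustrative choices: y = f(x) = x**2 and z = g(y) = sin(y).
f = lift(lambda x: x * x, lambda x: 2 * x)
g = lift(math.sin, math.cos)

x = Dual(1.3, 1.0)  # dx has coefficient 1
y = f(x)            # dy = f'(x) dx
z = g(y)            # dz = g'(y) dy = g'(f(x)) f'(x) dx
print(z.d, math.cos(1.3 * 1.3) * 2 * 1.3)
```

The multiplication in the chain rule shows up here as the single `dfn(x.val) * x.d` line: each stage only knows its own derivative and scales whatever differential it receives.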
Let $h(x)=f(g(x))$
$$\begin{align}h'(x) &= \lim_{t\to0}\frac{h(x+t)-h(x)}{t}\\&=\lim_{t\to0}\frac{f(g(x+t))-f(g(x))}{t}\end{align}$$
Now there are two cases.
$g(x+t)=g(x)$ for all sufficiently small $t$
In this case $h(x+t)=h(x)$ for all such $t$, so $h'(x)=0$. Also $g'(x)=0$, so $h'(x)=f'(g(x))\cdot g'(x)=0$ is satisfied.
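$g(x+t)\neq g(x)$ for all sufficiently small $t\neq0$
In this case we may divide by $g(x+t)-g(x)$ and write the limit as
$$\lim_{t\to0}\frac{f(g(x+t))-f(g(x))}{t} = \lim_{t\to0}\frac{f(g(x+t))-f(g(x))}{g(x+t)-g(x)}\cdot \lim_{t\to0}\frac{g(x+t)-g(x)}{t} = f'(g(x))\cdot g'(x).$$
The first limit equals $f'(g(x))$ because, as $t\to0$, $g(x+t)$ tends to $g(x)$ while staying different from it; the second is $g'(x)$ by definition.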
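We do not need to consider the case $g(x+t)\not\to g(x)$ as $t\to0$, since $g$ is differentiable at $x$ and therefore continuous there. (If $g(x+t)=g(x)$ for some but not all small $t$, neither case applies as stated; the standard fix is to replace the first quotient by $f'(g(x))$ whenever its denominator vanishes, which keeps the factorization valid for every $t$.)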
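The factorization in the second case can be watched numerically. In this sketch $g(x)=x^2$, $f=\sin$ and $x=0.8$ are illustrative assumptions. For these $t$ the denominator $g(x+t)-g(x)$ is nonzero, so the product of the two quotients equals the difference quotient of $h$ up to rounding, and each factor approaches its limit: $f'(g(x))=\cos(0.64)$ and $g'(x)=1.6$.

```python
import math


# Illustrative choices: g(x) = x**2, f = sin, h = f o g, at x = 0.8.
def g(x):
    return x * x


def f(u):
    return math.sin(u)


x = 0.8
for t in (1e-1, 1e-3, 1e-5):
    delta_g = g(x + t) - g(x)  # nonzero for these t, so dividing is allowed
    delta_h = f(g(x + t)) - f(g(x))
    q_outer = delta_h / delta_g  # tends to f'(g(x)) = cos(0.64)
    q_inner = delta_g / t        # tends to g'(x) = 1.6
    q_total = delta_h / t        # difference quotient of h
    print(f"t={t:.0e}  outer={q_outer:.6f}  inner={q_inner:.6f}  "
          f"product={q_outer * q_inner:.6f}  h-quotient={q_total:.6f}")

print("f'(g(x)) * g'(x) =", math.cos(g(x)) * 2 * x)
```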