
While following some code for a least squares problem using gradient descent, the claim was that the functional to be minimized is the "mean square error", $E=\frac{1}{n}\sum_{i=1}^n(y_i-\hat{y}_i)^2$, where the $y_i$ are the data points and the $\hat{y}_i$ are the outputs of the linear model at the points $x_i$.

My question is: why the factor of $\frac{1}{n}$? It makes no difference when setting the partial derivatives equal to $0$, so is there a statistical reason for using it, say if the error terms are distributed a certain way (for the least squares problem itself we don't care how they're distributed), or is there a coding reason?
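
To make concrete what I mean by "makes no difference", here is a small sketch (numpy, with made-up toy data; the helper `gradients` is just mine for illustration). The gradient of the sum of squares is exactly $n$ times the gradient of the mean square, so both vanish at the same parameter values.

```python
import numpy as np

# Made-up toy data: fit y ≈ w*x + b by least squares.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=50)

def gradients(w, b, use_mean):
    """Gradient of the squared-error objective with respect to (w, b)."""
    resid = (w * x + b) - y
    scale = 1.0 / len(x) if use_mean else 1.0
    return 2.0 * scale * np.array([np.dot(resid, x), resid.sum()])

g_sse = gradients(1.5, 0.5, use_mean=False)  # sum of squared errors
g_mse = gradients(1.5, 0.5, use_mean=True)   # mean squared error

# The MSE gradient is the SSE gradient divided by n, so both are zero
# at exactly the same (w, b).
print(g_sse / g_mse)  # -> [50. 50.], i.e. n for both components
```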

  • Hi: you could use the sum when minimizing it. The term "mean" square error is more common probably because, when it comes to statistical hypothesis testing, the mean squared error (rather than the sum) takes on a specific role because it is used (and the $n$ is really needed) in various test statistics.
    – mlofton
    Commented Mar 1 at 12:39
  • One of many possible answers: although multiplying by $1/n$ does not change any solution, it makes the values of $E$ more comparable for differing $n$. In many circumstances, the value of $E$ converges to a fixed value or a fixed distribution as $n$ grows large, whereas $nE$ would always diverge in any sense. There's no conceivable coding reason, because no reasonable $n$ will be so large as to compensate for overflow in the sum or create underflow in $E$, and the division by $n$ scarcely affects the floating point precision. (A small numerical sketch of this point follows the comments.)
    – whuber
    Commented Mar 1 at 15:26
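
A quick numerical illustration of the second comment (a toy simulation of my own, assuming i.i.d. standard-normal residuals): $E$ settles near the noise variance as $n$ grows, while $nE$ keeps growing.

```python
import numpy as np

# Toy simulation: residuals are i.i.d. N(0, 1), so E = mean of squares
# should stabilise near 1.0 as n grows, while n*E grows without bound.
rng = np.random.default_rng(1)
for n in [10, 100, 1_000, 10_000, 100_000]:
    resid = rng.normal(0.0, 1.0, size=n)   # stand-in for y_i - yhat_i
    E = np.mean(resid ** 2)                # "mean square error"
    print(f"n={n:>7d}   E={E:6.3f}   n*E={n * E:12.1f}")
```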

2 Answers


As you note, the factor $1/n$ makes no difference to the calculation. In fact, it is not uncommon in some texts for the factor to be dropped, with "least squares" defined as the solution that minimizes the sum of squared errors rather than the MSE.

The way that least squares solutions are actually computed in practice depends on the code, but a common approach is to gather the explanatory variables into a matrix $X$ and the dependent variable into a corresponding vector $y$, and then work with the first-order condition for minimizing the sum of squared errors. That first-order condition can be written $X'Xb = X'y$, where $b$ is the solution to the minimization problem. There are many ways to solve numerically for $b$. The best-known solution is to invert the $X'X$ matrix and multiply both sides by this inverse, which gives the familiar equation $b=(X'X)^{-1}X'y$. In practice this is not necessarily the most efficient or numerically stable way to find the answer; consult any advanced text on statistical computing for details.

But to get back to your question: notice that the factor $1/n$ does not appear in these equations, so, no, it is not needed for coding reasons.
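
A small numpy sketch of this (my own illustration with made-up data, not from any particular text): the normal equations give the same $b$ whether or not both sides are divided by $n$, and a library least-squares routine such as `np.linalg.lstsq` is generally preferred to forming the inverse explicitly.

```python
import numpy as np

# Made-up data: 100 observations, intercept plus two explanatory variables.
rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(0.0, 0.1, size=n)

# Normal equations X'X b = X'y; solve() avoids forming the inverse.
b_sum  = np.linalg.solve(X.T @ X, X.T @ y)          # from the sum of squares
b_mean = np.linalg.solve(X.T @ X / n, X.T @ y / n)  # both sides divided by n

# A more numerically stable route, closer to what library code does.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(b_sum, b_mean))   # True: the 1/n factor cancels
print(np.allclose(b_sum, b_lstsq))  # True, up to floating point
```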


You are averaging the squared errors: the summation $\sum_{i=1}^{n}(y_i-\hat{y}_i)^2$ adds up all the squared errors, and the factor $\frac{1}{n}$ divides by the number of terms, i.e. it takes their mean.

Hence the name mean square error.

