Is there any difference between minimizing the sum of squared errors when fitting a linear regression model and minimizing the mean of the squared errors, apart from the math being easier when taking the derivative of the error function?
The formula I am talking about is:
$$\text{Error}(\theta) = \sum_{i=1}^{N} \left(y_i - \theta^T x_i \right)^2$$
Usually the minimization problem is written as:
$$\underset{\theta}{\min} \; \frac{1}{N} \sum_{i=1}^{N} \left(y_i - \theta^T x_i \right)^2$$
or with a factor of $2$ in the denominator so that the derivative comes out cleaner:
$$\underset{\theta}{\min} \; \frac{1}{2N} \sum_{i=1}^{N} \left(y_i - \theta^T x_i \right)^2$$
Wouldn't
$$\underset{\theta}{\min} \; \sum_{i=1}^{N} \left(y_i - \theta^T x_i \right)^2$$
work just as well?
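
As a quick numerical sanity check of what I mean (this is just an illustrative sketch, not a derivation): minimizing both objectives on the same synthetic data with `scipy.optimize.minimize` seems to give the same $\theta$. The data, variable names, and tolerances below are my own choices for the example.

```python
# Compare the minimizer of the plain SSE with the minimizer of the MSE
# (SSE / N) on synthetic linear-regression data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, d = 100, 3
X = rng.normal(size=(N, d))                 # design matrix, rows are x_i
true_theta = np.array([1.5, -2.0, 0.5])
y = X @ true_theta + rng.normal(scale=0.1, size=N)

def sse(theta):
    # sum of squared errors: sum_i (y_i - theta^T x_i)^2
    r = y - X @ theta
    return r @ r

def mse(theta):
    # mean of squared errors: SSE divided by N
    return sse(theta) / N

theta_sse = minimize(sse, np.zeros(d)).x
theta_mse = minimize(mse, np.zeros(d)).x

print(theta_sse)
print(theta_mse)
print(np.allclose(theta_sse, theta_mse, atol=1e-5))  # True: same minimizer
```

The objective values themselves differ by the factor $N$, of course; my question is only about the resulting $\theta$.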