All Questions

3 votes
1 answer
81 views

Statistical Error in Simple Linear Regression

I want to start off this question by saying that I'm looking for more of a conceptual understanding of this term in a simple regression model, not a mathematical one. In econometrics, simple linear ...
r_squared's user avatar
0 votes
1 answer
51 views

Fair comparison method for a biased physics-based model and its ML-correction version

I'm working with two prediction models: A calibrated physics-based model that consistently overestimates and has a fixed bias. An XGBoost model that predicts the error of the physics model to create ...
menzerml's user avatar
0 votes
1 answer
22 views

How to determine the confidence intervals for the principal axes of a second-rank tensor?

The question in short: How does one estimate the confidence intervals for the principal axes of a second-rank symmetric tensor when the measurement errors are themselves a function of the values of ...
Armadillo's user avatar
  • 363
0 votes
0 answers
28 views

Time series : Is SARIMA(p, 0, q)(P, 0, Q) a non-stationary model?

If the data is well explained without any differencing or seasonal differencing but requires some seasonal AR and MA terms, can we say that the data is stationary? I thought SARIMA was designed to ...
kingjerry's user avatar
2 votes
2 answers
64 views

How to measure the error between modeled and observed data?

Consider a scenario where observed data is represented in grey and modelled data in red, as below Here, the x-axis is a position, and the y-axis is an expected time, so that the slope defines, in a ...
sam wolfe's user avatar
  • 150
1 vote
1 answer
76 views

The error-rate in "The elements of statistical learning"

This picture is from the book "The Elements of Statistical Learning": I am wondering how the test-error rate is calculated based on how they describe the simulation at the start? How do they ...
user394334's user avatar
1 vote
2 answers
87 views

In linear regression, does the formula for error contain the marginal expectation or conditional expectation?

In linear regression, let $\epsilon_i$ be the $i$th error term. Is the formula for $\epsilon_i$ $\epsilon_i = Y_i - E(Y_i)$ or $\epsilon_i = Y_i - E(Y_i | X_i = x_i)$? I have seen both definitions....
Iterator516's user avatar
3 votes
1 answer
162 views

How do non-collapsibility and the lack of an error term affect coefficients in regression?

I have read from here that in nonlinear models such as the logit and Cox, because of a lack of an error term, coefficients may be biased (typically towards zero) when covariates are omitted; I see how ...
Geoff's user avatar
  • 771
1 vote
1 answer
92 views

Source of error in Linear Regression?

Suppose we are given n data points (observations) for random variable Y and variable X. We are to find the regression equation of Y on X. As I’ve read, these given values of Y (observations) are ...
Quorthon's user avatar
  • 107
5 votes
1 answer
325 views

OLS: do we test the residuals for normality *because* then the error terms can be assumed normal, too? Is there proof for this?

There are lots of resources out there that mix up residuals with errors, using the terms interchangeably, or saying "residual errors", or not acknowledging the existence of errors at all. (...
Reader 123's user avatar
1 vote
1 answer
110 views

Error of prediction from linear regression in R [closed]

I have an equation: $$ \large y = 0.243x + 0.145 $$ In the form: $$ \Large y = ax + b $$ I use it to predict $y$ when $x = 2$. To estimate the distribution around $\hat{y} = 0.631$ I need an estimate ...
Aaron Simmons's user avatar
5 votes
2 answers
481 views

Modelling the residuals of a model as a function of an external variable in order to assess its effect on the errors of the model?

I am working with field variables and telemetric variables. The dataset is composed of geographical locations for which I have both types of data. Amongst these field data, some are of interest, to be ...
Renaud Bied-charreton's user avatar
1 vote
3 answers
263 views

Residuals in linear regression - variance and independence

Let's assume a simple linear regression $$ y = \beta_0 + \beta_1 x + \varepsilon $$ where $\varepsilon_i$ are independent and come from $\mathcal{N}(0, \sigma^2)$. We define residuals as $$ e = y - \...
thesecond's user avatar
  • 380
2 votes
2 answers
169 views

How should I interpret the assumption of the regression?

I read an econometrics book which states one of the basic assumptions of regression is that $$E(u|x) = 0$$ In another book however I see it written that $$E(u_i|x_i) = 0$$ Are these two saying the ...
Stephen Johson's user avatar
1 vote
0 answers
44 views

If the error term in a regression is squared can it still be a linear regression?

So basically I’ve been taught that in a linear regression model the parameters alpha and beta cannot be squared when defining the equation of our model, does this also apply for the error term (...
Paolo Totaro's user avatar
1 vote
1 answer
28 views

What error metrics to use for time series model with time-step ahead hyper-parameter?

Background Denote the forecasted value by a Time Series ML Model as \begin{align} \hat{y}_{t+\tau} \, (t, \tau) \end{align} where $t$ is the current time-step, $\tau$ is the time-step ahead. The ...
K_inverse's user avatar
  • 185
1 vote
0 answers
44 views

Using multiple individual regressions Y~X_i to find the best combination of two predictors

Problem Statement: Let's say I have multiple solved OLS regressions of the form $Y \sim X_1$, $Y \sim X_2$, ..., $Y \sim X_n$. I want to find which two predictors, combined, are the best, ...
Joe's user avatar
  • 271
1 vote
1 answer
205 views

Is there an error metric that decreases the weight when the target is near zero?

As precipitation prediction models can only predict positive values, they won't be able to undershoot small values by much. When it comes to overshooting, there is no boundary. High precipitation ...
schefflaa's user avatar
2 votes
0 answers
61 views

How to compute sample variance and/or mean square error as a percentage?

Say I have a set of measurement values $y_\text{m} = (y_{\text{m},1}, \dots y_{\text{m},N}) $, and compare these with some ground truth $y = (y_1, \dots y_N)$. Then, if I understood correctly, I can ...
Sita's user avatar
  • 21
0 votes
0 answers
23 views

$\text{var}(\hat{u})\text { and }\text{var}(\hat{Y})$ derivation

Could anyone provide me with either the steps for the derivation of these formulae or a textbook that covers these OLS derivations? The textbook I’m currently using does not go over this. $$\text{var}(\...
stats123's user avatar
2 votes
1 answer
34 views

How to determine the precision of a measurement using a "perfect" reference method?

I developed an experimental method that measures a certain material property. I want to see how precise my method is, so that I have a confidence interval or other metric. To do this, I'm comparing it with ...
user46147's user avatar
  • 135
1 vote
1 answer
37 views

To achieve consistency, why do we only make regularizer invariant?

The question comes from section 5.5.1 of "Pattern Recognition and Machine Learning" by Christopher M. Bishop. After giving linear transformation equations needed for network weights, the ...
zzzhhh's user avatar
  • 333
1 vote
1 answer
59 views

Calculating error metrics on log10(y) Bayesian ridge regression model. Why does model perform better when trained on log10(y)?

I am using scikit-learn's Bayesian ridge regression model and am training my model on log10(y), exponentiating (10 ** y_i) my predictions back to their original value, then calculating my error ...
lambdaChops's user avatar
0 votes
1 answer
194 views

Why is there an error term when OLS is divided into one equation per observation?

I have a hard time understanding why an error term is included when you split a regression equation into n equations (see below). Let's assume this simple series, which refers to salary and house price ...
Simon Rydstedt's user avatar
0 votes
0 answers
840 views

How to estimate uncertainty (% error) of prediction from existing linear regression model on new inputs (out of sample data)?

Say I have an independent variable (x) and a dependent variable (y) for which I modeled the relationship using a simple linear regression (ols). I then use this model to predict on new x-inputs (out ...
Clouseau's user avatar
1 vote
0 answers
69 views

Subtracting predictions from linear regression model & stochastic error [closed]

Understanding that a linear regression model includes stochastic error, can I eliminate error from my model's predictions by subtracting one prediction from the other? Let's say we have a model of the ...
intransigent_rocker's user avatar
3 votes
2 answers
459 views

What if error of linear regression is uncorrelated between different observations, but dependent?

One of the assumptions of classic linear regression is that the error is uncorrelated between different observations. But it is obvious that uncorrelated does not mean independent. I have already learned the ...
Jun's user avatar
  • 31
1 vote
1 answer
35 views

Best Error Function for Areas with Larger Slopes

I have some nonlinear data (let's say x's and y's) that I would like to perform regression on, and I would like to focus on having the error lower on regions where the graph is more sloped, rather ...
Tom Zhang's user avatar
3 votes
1 answer
1k views

What is the distribution of the error term in the Poisson Regression model? [duplicate]

Given a Poisson regression model as $y = E(y\mid x) + ε$ where $λ = E(y\mid x) = \exp(x'β)$ with $y$ from the Poisson distribution ($\operatorname{Poisson}(λ)$) I am trying to understand the ...
Dick's user avatar
  • 95
0 votes
0 answers
38 views

What is the 1 standard deviation error in papers?

I have recently come across some papers in my field. They had a large (X,Y) data set, which was binned into 4 bins. The least-squares method was used to find the slope (m) and y-intercept (b). In the table that ...
Axxxxx's user avatar
  • 31
1 vote
0 answers
60 views

MAPE does not take into account the range of the output?

I have a time-series regression model where the output is always in the range of 6000-6050. After training my model, I get a Mean Absolute Error of around 18 and hence, very low Mean Absolute ...
dayyda's user avatar
  • 11
1 vote
0 answers
33 views

If the error terms in a regression setting are not observed, how can we ensure they're normally distributed?

According to the G-M assumptions, we should assume spherical errors. But my understanding is the errors -- as measured by the vertical distance from the true line of best fit to the response ...
Estimate the estimators's user avatar
2 votes
2 answers
73 views

Regression (in a wide sense), homoskedasticity and independence of error term

I suspect this might have a simple answer, but I have been stuck on it for a while. It is a simple true or false question. I suspect it to be false, but I haven't come up with a counterexample. At ...
Thomas Fjærvik's user avatar
5 votes
1 answer
725 views

Regularization and Shrinkage : Theoretical Advantages vs. Empirical Advantages [duplicate]

I have the following question about the theoretical advantages vs. the empirical advantages of regularization (i.e. shrinkage). As far as I understand, this is the general idea behind regularization: ...
stats_noob's user avatar
1 vote
1 answer
153 views

Why is the lower bound of the confidence interval of a model's error relatively constant compared to the upper bound? [closed]

I am interested in studying the effect of increasing data samples for a regression model on train error and test error. For this I have used 95% confidence intervals for different values of a sample ...
user481031's user avatar
6 votes
2 answers
956 views

Is the Cross Validation Error more "Informative" compared to AIC, BIC and the Likelihood Test?

Is the Cross Validation Error more "Informative" compared to AIC, BIC and the Likelihood Test? As far as I understand: The Likelihood Test is used to determine : Given some data, is some ...
stats_noob's user avatar
1 vote
1 answer
154 views

Worst case error variance

I need to create an upper and lower bound for the error variance, in linear regression or otherwise (state space models etc.). One way is to bootstrap confidence intervals, but that can be very ...
Dhruv Mahajan's user avatar
2 votes
1 answer
3k views

How to calculate Mean Squared Error when there are multiple observed y values for a single x value?

Given a data set where there exists multiple different observed y-values for a given x-value, how do I calculate Mean Squared Error? The formula implies that I subtract the predicted from a singular ...
feonyte's user avatar
  • 199
0 votes
0 answers
39 views

Neural network prediction gets worse when ground truth is close to the edges of output range

I am trying to use a simple neural network to predict numerical values of certain properties of sensor data (somewhat of a regression problem). My network has only 1 output with Tanh activation, so output ...
StapleStable's user avatar
0 votes
0 answers
413 views

The difference between total error, prediction error and fitted error via residual

Consider a regression model $Y = E(Y|X) + \text{Prediction error}$, i.e. $\text{Prediction error} = Y - E(Y|X)$. Now, define an estimate of the regression function $E(Y|X) = \hat{Y} + \text{Fitted error}$, i.e. Fitted error = $...
Lakshman's user avatar
4 votes
3 answers
1k views

Notice that the $\hat{f}$ don't have an error term, $\epsilon$, as would be expected for regression models. Why don't these models have an error term?

I am currently reading some notes on linear and polynomial regression. The notes say the following: The linear model is $$\hat{f}_L (X) = \beta_0 + \beta_1 X$$ The quadratic model is $$\hat{f}_Q(X) = ...
The Pointer's user avatar
  • 2,204
0 votes
1 answer
93 views

How to use errors to improve a regression?

I want to use a model to predict the errors of my model in order to improve it, but I am confused about how to do that. I am seeking some help to understand how to proceed. I have 100 data points and I ...
Rods2292's user avatar
  • 371
1 vote
0 answers
105 views

When using longitudinal variables on different time scales in a regression, is it valid to backward-fill the dependent variable?

I'm working on a longitudinal project that is assessing an outcome variable through a monthly questionnaire and using daily activity as a predictor. The questionnaire asks about symptoms in the past ...
kentkr's user avatar
  • 11
0 votes
1 answer
141 views

How to extract error per data point using regression instead of just total sum of squares error

I am running a regression and found the sum of squares error for my data points. I was wondering if there was a way to calculate the error per data point instead of just the total error. I want to ...
Alex's user avatar
  • 1
3 votes
1 answer
2k views

Predicting a realistic distribution using regression

I'm trying to predict a distribution of a continuous variable, that looks like the real distributions I see in my training data. As an example, say I'm trying to predict people's wages, and I know ...
rw2's user avatar
  • 1,118
0 votes
0 answers
157 views

Methods to reduce regression underestimate and overestimate

I'm new to a project and need to reduce the underestimate & overestimate cases in a regression problem. So far I haven't gained enough domain knowledge. Underestimate is less tolerable than ...
Cherry Wu's user avatar
  • 331
1 vote
1 answer
1k views

Linear regression on data with associated errors in the x and y direction [closed]

I have experimental data that I need to linearise in the form $\ln(b/x)$ vs. $x$, where $x$ is an experimentally-derived quantity and the value of $x$ is dependent on $b$ (which varies). $x$ has some ...
Ashley's user avatar
  • 11
3 votes
1 answer
263 views

Estimating the errors in parameters in ordinary least squares

I am reading the book An Introduction to Error Analysis by John R. Taylor. In Ch8: Least-Squares Fitting, he has derived expressions for parameters $A$ and $B$ in fitting the line $A+Bx$ to the set of ...
Peaceful's user avatar
  • 623
0 votes
0 answers
19 views

Estimate the support (~ confidence interval of the output) for a linear regression

Consider a linear regression model like this: I want to draw two margins, say upper and lower bounds, which contain 95% of the data. Formally, given a regression model ($\hat{y} = Ax+B$), I want to ...
Ali's user avatar
  • 318
3 votes
1 answer
360 views

How to set up a DGP for Monte Carlo simulation with non-independent regressions (correlated errors)

I want to set up a data generating process for two different estimations. The idea is to show how bias is introduced when the models are not properly specified. The first model should be a logit/...
Thomas's user avatar
  • 153
