Big outlier in dependent variable

Question

I got my data from my country's official statistics office and have rechecked it multiple times. There is one big outlier that pushes all my GLM (Poisson) models to the extreme (about five times the deviance). When I delete that row, I get the expected results with a good fit. However, I'm not allowed to do this without a good reason (I will call them tomorrow to check for a possible error).

Is there a way to deal with this if the value is indeed correct?

The data look roughly like this (sorry, I'm bad with formatting):

2 87 85 84 79 93 83 88 91 76

with 2 being the obvious outlier.
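To see how a single point can dominate the fit, here is a minimal Python sketch that splits the Poisson deviance into per-observation terms. Because the question shows no covariates, it assumes an intercept-only model (fitted mean = sample mean); with the real model you would plug in the fitted values from your own GLM instead. The helper name `poisson_deviance_contributions` is hypothetical.

```python
import math

def poisson_deviance_contributions(y, mu):
    """Per-observation Poisson deviance terms: 2 * [y*log(y/mu) - (y - mu)]."""
    terms = []
    for yi, mi in zip(y, mu):
        # By convention y*log(y/mu) = 0 when y == 0
        ylog = yi * math.log(yi / mi) if yi > 0 else 0.0
        terms.append(2.0 * (ylog - (yi - mi)))
    return terms

# Counts as posted in the question. No covariates were shown, so this assumes
# an intercept-only model, whose fitted mean is simply the sample mean.
y = [2, 87, 85, 84, 79, 93, 83, 88, 91, 76]
mu_hat = sum(y) / len(y)
terms = poisson_deviance_contributions(y, [mu_hat] * len(y))
total = sum(terms)
for yi, t in zip(y, terms):
    print(f"y={yi:3d}  deviance term={t:7.2f}  share of total={t / total:6.1%}")
```

In this simplified setting the row with 2 accounts for the large majority of the total deviance.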

Appreciate it!

DanielTheRocketMan · Accepted Answer · 2020-02-14 02:57:51Z

There are many ways to detect outliers. It is not clear to me whether your outlier is in the dependent or an independent variable. However, your problem is not detecting the outlier, since you have already detected it.

If the outlier is in the dependent variable, one possible way to justify its removal is to use Cook's distance (see the Wikipedia article on Cook's distance) to show that this point is very influential and substantially changes the estimated coefficients.
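A minimal sketch of that check in Python with NumPy: it fits a Poisson GLM (log link) by iteratively reweighted least squares and computes Cook's distance from Pearson residuals r_i and leverages h_i with the dispersion fixed at 1, i.e. D_i = r_i^2 h_i / (p (1 - h_i)^2). The function names are hypothetical, and the design matrix is an assumption: the question does not show the covariates, so an intercept-only model is used; replace `X` with your own model matrix. In R, `cooks.distance()` on a fitted `glm` object should give the same quantity.

```python
import numpy as np

def fit_poisson_irls(X, y, max_iter=100, tol=1e-10):
    """Poisson GLM with log link, fitted by iteratively reweighted least squares."""
    mu = y + 0.5                     # starting values that avoid log(0)
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu      # working response
        XtW = X.T * mu               # IRLS weights w_i = mu_i for Poisson/log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, mu

def cooks_distance_poisson(X, y):
    """Cook's distance D_i = r_i^2 * h_i / (p * (1 - h_i)^2), dispersion fixed at 1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    _, mu = fit_poisson_irls(X, y)
    p = X.shape[1]
    pearson = (y - mu) / np.sqrt(mu)             # Pearson residuals
    xtwx_inv = np.linalg.inv((X.T * mu) @ X)
    # Leverage h_i = w_i * x_i^T (X^T W X)^{-1} x_i
    h = mu * np.einsum("ij,jk,ik->i", X, xtwx_inv, X)
    return pearson**2 * h / (p * (1.0 - h) ** 2)

y = np.array([2, 87, 85, 84, 79, 93, 83, 88, 91, 76], dtype=float)
X = np.ones((len(y), 1))             # assumption: intercept-only model
d = cooks_distance_poisson(X, y)
print(np.round(d, 3))
```

Here the first observation (the 2) stands out by more than an order of magnitude; that demonstrates influence, not that the value is wrong.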

However, I strongly believe that the justification for removal should come from information external to the dataset. Try to answer questions like the ones below to find your justification:

1) Is the value impossible? For instance, a distance or a population larger than that of the entire city your data come from.

2) Was the value collected under special circumstances? For instance, by checking the date you may find that some problem occurred on that day.

3) Imagine that you are studying the efficiency of the cities in a state. Perhaps one very large city has an industrial structure while all the others are small, so the cities are very different. If you have more than one outlier with similar behavior, you can add a dummy variable to your regression to handle them. For instance, in models of banking, large banks work very differently from small ones. Adding a dummy does not remove the outliers; it treats them differently.
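A minimal sketch of point 3 for a Poisson GLM in Python with NumPy: the flagged unit gets an indicator column in the design matrix. The flag, the intercept-only baseline, and the function names are assumptions for illustration, since the question shows neither covariates nor which units share the unusual behavior. Note that when the dummy flags a single observation, under the canonical log link that observation is fitted exactly and the remaining coefficients equal those from the fit without it, so the dummy is most informative when it groups several similar units.

```python
import numpy as np

def fit_poisson_irls(X, y, max_iter=100, tol=1e-10):
    """Poisson GLM with log link, fitted by iteratively reweighted least squares."""
    mu = y + 0.5                     # starting values that avoid log(0)
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu      # working response
        XtW = X.T * mu               # IRLS weights w_i = mu_i
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, mu

def poisson_deviance(y, mu):
    """Total Poisson deviance; y*log(y/mu) is taken as 0 when y == 0."""
    safe_y = np.where(y > 0, y, 1.0)
    ylog = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
    return 2.0 * np.sum(ylog - (y - mu))

y = np.array([2, 87, 85, 84, 79, 93, 83, 88, 91, 76], dtype=float)
intercept = np.ones_like(y)
dummy = np.zeros_like(y)
dummy[0] = 1.0                       # hypothetical flag: 1 for the unusual unit(s)

X_base = intercept[:, None]
X_dummy = np.column_stack([intercept, dummy])

beta_base, mu_base = fit_poisson_irls(X_base, y)
beta_dummy, mu_dummy = fit_poisson_irls(X_dummy, y)
print("deviance without dummy:", round(poisson_deviance(y, mu_base), 2))
print("deviance with dummy:   ", round(poisson_deviance(y, mu_dummy), 2))
print("coefficients with dummy:", np.round(beta_dummy, 4))
```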
