$\begingroup$

My data come from the official statistics office of my country, and I have already rechecked them multiple times. One big outlier is skewing all my Poisson GLMs to the extreme (roughly 5 times the deviance). When I delete that row as a test, I get the expected results with a good fit. However, I'm not allowed to drop it without a good reason (I will call the office tomorrow to check for a possible error).

Is there a way to deal with this if the value is indeed correct?

The data look roughly like this (sorry, I'm bad with formatting):


2 87 85 84 79 93 83 88 91 76


with 2 being the obvious outlier.

Appreciate it!

$\endgroup$

1 Answer

$\begingroup$

There are many ways to detect outliers. It is not clear to me whether your outlier is in the dependent or an independent variable. However, your problem is not about detecting the outlier, since you have already found it.

If the outlier is in the dependent variable, one possible way to justify removing it is to use Cook's distance (see the Wikipedia article on Cook's distance) to show that this point is very influential and changes the coefficients of the entire fit.
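As a rough illustration of that idea, here is a minimal sketch in plain Python. It assumes an intercept-only Poisson model fitted to the example counts from the question, which is almost certainly simpler than the real model. Under that assumption the fitted mean is the sample mean, every point has leverage 1/n, and Cook's distance reduces to the squared Pearson residual times h / (p (1 - h)^2). The function names are made up for this example; with covariates you would take residuals and leverages from your GLM software instead.

```python
import math

# Example counts from the question; treating them as the response of an
# intercept-only Poisson GLM is an assumption made for this sketch.
y = [2, 87, 85, 84, 79, 93, 83, 88, 91, 76]


def poisson_deviance(obs, mu):
    """Poisson deviance 2 * sum(y*log(y/mu) - (y - mu)), with 0*log(0) = 0."""
    total = 0.0
    for yi, mi in zip(obs, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


def cooks_distance_intercept_only(obs):
    """Cook's distance per observation for an intercept-only Poisson GLM."""
    n = len(obs)
    mu = sum(obs) / n   # MLE of the mean when the model has only an intercept
    h = 1.0 / n         # leverage is identical for every point in this model
    p = 1               # one estimated coefficient (the intercept)
    distances = []
    for yi in obs:
        pearson_sq = (yi - mu) ** 2 / mu   # squared Pearson residual
        distances.append(pearson_sq * h / (p * (1.0 - h) ** 2))
    return distances


cooks = cooks_distance_intercept_only(y)
most_influential = max(range(len(y)), key=lambda i: cooks[i])

# Compare the fit with and without the most influential point.
mean_all = sum(y) / len(y)
rest = [v for i, v in enumerate(y) if i != most_influential]
mean_rest = sum(rest) / len(rest)
dev_all = poisson_deviance(y, [mean_all] * len(y))
dev_rest = poisson_deviance(rest, [mean_rest] * len(rest))

print("most influential index:", most_influential, "value:", y[most_influential])
print("Cook's distances:", [round(d, 3) for d in cooks])
print("deviance with / without that point:", round(dev_all, 2), round(dev_rest, 2))
```

A common rule of thumb treats a Cook's distance above 1 as worth a closer look. That only shows the point is influential; it does not by itself justify deleting it.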

However, I strongly believe that the justification for removal should come from information that is external to the dataset. Questions like the ones below may help you find that justification:

1) Is the value impossible? For instance, a distance or a population that is larger than the entire city your data belong to.

2) Was the value collected in a special situation? For instance, by checking the date you may see whether some problem happened on that day.

3) Is the observation structurally different from the rest? Imagine that you are studying the efficiency of some cities in a state. Maybe there is one very large city with an industrial structure, while all the others are small, so these cities are very different. In fact, if you have more than one outlier with similar behavior, you may add a dummy variable to your regression to deal with them. For instance, in banking models, large banks work very differently from small ones. So if you add a dummy, you do not remove the outliers, but treat them differently.
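To make the dummy idea concrete, here is a small sketch in plain Python. It assumes a Poisson model with log link whose only terms are an intercept and one dummy, and it uses the example counts from the question with the suspicious value flagged. In that special case the maximum-likelihood fitted means are the group means, so no iterative fitting is needed. The function names are hypothetical, and a real model with further covariates would need a proper GLM routine.

```python
import math

# Example counts from the question; flagging the first value is an assumption.
y = [2, 87, 85, 84, 79, 93, 83, 88, 91, 76]
dummy = [1 if i == 0 else 0 for i in range(len(y))]


def poisson_deviance(obs, mu):
    """Poisson deviance 2 * sum(y*log(y/mu) - (y - mu)), with 0*log(0) = 0."""
    total = 0.0
    for yi, mi in zip(obs, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


def fit_intercept_plus_dummy(obs, flag):
    """Closed-form Poisson fit for log(mu) = b0 + b1 * flag.

    With only an intercept and one indicator, the fitted mean of each
    group equals that group's sample mean (assumes both means are > 0).
    """
    base = [v for v, f in zip(obs, flag) if f == 0]
    flagged = [v for v, f in zip(obs, flag) if f == 1]
    mean_base = sum(base) / len(base)
    mean_flagged = sum(flagged) / len(flagged)
    b0 = math.log(mean_base)
    b1 = math.log(mean_flagged) - b0
    fitted = [mean_flagged if f == 1 else mean_base for f in flag]
    return b0, b1, fitted


b0, b1, fitted = fit_intercept_plus_dummy(y, dummy)
dev_dummy = poisson_deviance(y, fitted)

# Reference fit: the same data with the flagged row dropped.
rest = [v for v, f in zip(y, dummy) if f == 0]
dev_dropped = poisson_deviance(rest, [sum(rest) / len(rest)] * len(rest))

print("intercept b0:", round(b0, 4), "dummy coefficient b1:", round(b1, 4))
print("deviance with dummy:", round(dev_dummy, 4))
print("deviance with row dropped:", round(dev_dropped, 4))
```

Note that with a single flagged observation the dummy absorbs that point completely, so the remaining fit matches the one obtained by dropping the row. The difference is that the observation stays in the data and its special treatment is explicit in the model, which still needs an external justification.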

$\endgroup$
