$\begingroup$

I have the following distribution of counts, where blue represents blue shoes, red represents red shoes and green represents green shoes. Though the total sales numbers differ, I want to check whether the slopes of the three categories are the same over the years:

[Figure: yearly sales counts for blue, red and green shoes, 2009–2017]

I thought of computing a chi-squared goodness-of-fit test for equal proportions, but this does not seem to work since the values for the blue shoes are much higher than the other values.

Is it possible to check the slopes using a chi-squared test (in Minitab or Python)? If yes, how can I achieve the desired result? If not, what are my options?

Appreciate any help!!!

$\endgroup$
  • $\begingroup$ I will post an answer later. But I have a question in the meantime, just because I am curious about how these questions are born. In what type of context did you get this question? Is this for instance a question arising in business or is it a hypothetical question as part of study and homework? And maybe you can tell a bit more about it. $\endgroup$ Commented Oct 26, 2017 at 16:12
  • $\begingroup$ The specific problem is from business, but the actual example is hypothetical. In general I simply want to check if one variable (blue, green, red) has a (statistically significant) steeper slope than the others. Ideally I want to achieve that with a chi-squared test, but I also appreciate any help on how I can achieve that using other methods. $\endgroup$
    – 2Obe
    Commented Oct 26, 2017 at 16:30
  • $\begingroup$ What does it mean to you, 'steeper slope', in this business example? Why do you look at a slope, and when is this slope steeper (what do you believe is a meaningful boundary, or concept of a boundary)? To explain where I am heading: there are many ways to look at this data statistically; for instance, is a faster increase (which is a different concept than the slope of a linear curve) also good for you? $\endgroup$ Commented Oct 26, 2017 at 17:01
  • $\begingroup$ Annualized sales data isn't terribly informative. Monthly or even quarterly data would be much more useful. Given that the data is integer, why not fit a Poisson or, if Poisson assumptions don't hold, negative binomial model that "stacks" up the data by time (year) and shoe type. With this functional form, the dependent variable would be the counts and the independent variables would be dummy-coded variables for year and shoe type. By taking the natural log of the counts, the resulting slopes would become directly comparable. $\endgroup$
    – user78229
    Commented Oct 27, 2017 at 10:21
  • $\begingroup$ If you natural-log transform the counts, then OLS regression would work and the functional form would be a panel data or pooled time series model. In other words, Poisson or negative binomial models would not be needed. Furthermore, by eliminating the intercept and treating the 3 shoe types as one factor with 3 levels, contrast tests on the resulting coefficients would give you the statistical results you seek. $\endgroup$
    – user78229
    Commented Oct 27, 2017 at 16:36

1 Answer

$\begingroup$

The chi-squared test can still be done even with the much higher values for the blue shoes.


Method 1

Your observations in a contingency table are:

$$\begin{array}{l|ccccccccc|c}
 & 2009 & 2010 & 2011 & 2012 & 2013 & 2014 & 2015 & 2016 & 2017 & \text{total}\\ \hline
\text{blue} & 410 & 500 & 432 & 518 & 363 & 428 & 409 & 696 & 500 & 4256\\
\text{red} & 22 & 77 & 104 & 102 & 122 & 112 & 135 & 104 & 96 & 874 \\
\text{green} & 23 & 64 & 74 & 79 & 63 & 76 & 109 & 111 & 90 & 689\\ \hline
\text{total} & 455 & 641 & 610 & 699 & 548 & 616 & 653 & 911 & 686 & 5819
\end{array}$$

Estimated values (rounded for simpler presentation), based on the row and column totals:

$$\begin{array}{l|ccccccccc|c}
 & 2009 & 2010 & 2011 & 2012 & 2013 & 2014 & 2015 & 2016 & 2017 & \text{total}\\ \hline
\text{blue} & 333 & 469 & 446 & 511 & 401 & 451 & 478 & 666 & 502 & 4256\\
\text{red} & 68 & 96 & 92 & 105 & 82 & 93 & 98 & 137 & 103 & 874 \\
\text{green} & 54 & 76 & 72 & 83 & 65 & 73 & 77 & 108 & 81 & 689\\ \hline
\text{total} & 455 & 641 & 610 & 699 & 548 & 616 & 653 & 911 & 686 & 5819
\end{array}$$

$$\chi^2 = \sum \frac{(\text{estimated}-\text{observed})^2}{\text{estimated}} = 152.92$$

You have 9×3 = 27 cells, and the estimates use the 9 column totals and 3 row totals, of which only 9+3−1 = 11 are independent. The difference gives you 16 degrees of freedom, and the p-value $P\left(\chi^2_{16}\geq 152.92\right)<10^{-16}$ is smaller than my computer can compute (using R).
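For reference, this first method is just Pearson's chi-square test of independence, so it can be reproduced directly in R with the built-in chisq.test (a minimal sketch using the counts from the table above; the function computes the expected counts from the margins itself):

> blue  <- c(410,500,432,518,363,428,409,696,500)
> red   <- c( 22, 77,104,102,122,112,135,104, 96)
> green <- c( 23, 64, 74, 79, 63, 76,109,111, 90)
> tab   <- rbind(blue, red, green)  # 3 x 9 contingency table
> chisq.test(tab)                   # X-squared = 152.92, df = 16, p-value < 2.2e-16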


Method 2

Now the estimates are based on a linear model with different offsets for the shoe types but the same slope
$$y \sim a_{\text{shoetype}} + b \cdot (\text{year}-2009)$$
with
$$\begin{array}{rcl}
a_{\text{blue}} &=& 434.29 \\
a_{\text{red}} &=& 58.51 \\
a_{\text{green}} &=& 37.96 \\
b &=& 9.65
\end{array}$$

which gives estimates:

$$\begin{array}{l|ccccccccc|c}
 & 2009 & 2010 & 2011 & 2012 & 2013 & 2014 & 2015 & 2016 & 2017 & \text{total}\\ \hline
\text{blue} & 434 & 444 & 454 & 463 & 473 & 483 & 492 & 502 & 511 & 4256\\
\text{red} & 59 & 68 & 78 & 87 & 97 & 107 & 116 & 126 & 136 & 874 \\
\text{green} & 38 & 48 & 57 & 67 & 77 & 86 & 96 & 106 & 115 & 689
\end{array}$$

(Unlike in Method 1, the fitted values in a column need not sum to the observed column total, so the column margin is omitted here.)

and the statistic is now

$$\chi^2 = \sum \frac{(\text{estimated}-\text{observed})^2}{\text{estimated}} = 227.14$$

which is a higher residual chi-square, but this time you used fewer parameters to estimate the values (4 instead of 11); yet you still have $P\left(\chi^2_{23}\geq 227.14\right)<10^{-16}$
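A sketch of how this second method can be reproduced in R (the common-slope model is an ordinary least-squares fit; the chi-square style statistic is then computed by hand from the fitted values):

> blue  <- c(410,500,432,518,363,428,409,696,500)
> red   <- c( 22, 77,104,102,122,112,135,104, 96)
> green <- c( 23, 64, 74, 79, 63, 76,109,111, 90)
> y <- c(blue, red, green)
> x <- rep(0:8, times = 3)                            # year - 2009
> s <- factor(rep(c("blue","red","green"), each = 9))
> fit <- lm(y ~ 0 + s + x)    # one intercept per shoe type, one common slope
> coef(fit)                   # intercepts 434.29 (blue), 58.51 (red), 37.96 (green); slope 9.65
> chi2 <- sum((fitted(fit) - y)^2 / fitted(fit))      # 227.14
> pchisq(chi2, df = 23, lower.tail = FALSE)           # effectively zero (< 1e-16)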


Interpretation

In both cases you "should" reject the model as a "good" fit. This means that if the model were true, then it would be unlikely to get these results. In other words, the model is likely not the 'right' or 'true' model.

I have been using double quotes in the previous paragraph because I don't believe this concept of a good fit is practical for you. You may wonder whether this chi-square test is a good reason to discard the slopes, given that your purpose is to use the linear model (which does not capture noise or deviations from the trend) to compare trends.

Note that the chi-square test tells you, based on the size of the residuals, whether your observations are likely given your model. If the results are unlikely, this can be due to different things. It does not need to mean that the slopes for the different shoes are wrong. It can be that the application of a linear model is wrong, or that the noise is too large (the chi-square test assumes a statistical effect of noise roughly equal to the square root of the counts, but that is different from the random behavior of customers).

Method 1 vs Method 2

I should also note a difference between the first and the second method. The second method explicitly uses a least-squares fit to estimate the shoe values. The first method does not use a slope and tests a different hypothesis: whether the different shoes always appear in the same ratio (not necessarily according to a linear model).

I have included this first test because it is what many people learn as the most typical example of a chi-square test, although you can use the test more generally, such as with the model in the second method.

In both tests it is important, for the use of the $(e-o)^2/e$ function, that the numbers are counts. This matters because the $(e-o)^2/e$ function stems from the assumption that the observed number comes from a binomial or Poisson distribution (counts), which has a specific variance.
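To make that explicit: if an observed count $o$ has a Poisson distribution with mean $e$, then its variance is also $e$, so each term in the sum has expectation one,

$$\operatorname{E}\left[\frac{(o-e)^2}{e}\right] = \frac{\operatorname{Var}(o)}{e} = \frac{e}{e} = 1,$$

and the sum of these (approximately squared standard normal) terms is what makes the statistic approximately chi-square distributed. For non-count data the variance is not tied to the mean in this way, and the reference distribution would be wrong.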


Method 3

More useful than goodness of fit (since your fit is not good), if you only want to compare the slopes, would be a linear model with confidence intervals or standard errors for the parameter estimates.

$$y \sim a_{\text{shoetype}} + b_{\text{shoetype}} \cdot (\text{year}-2009)$$

$$\begin{array}{rcl}
a_{\text{blue}} &=& 418.76 \\
a_{\text{red}} &=& 67.178 \\
a_{\text{green}} &=& 44.822 \\
b_{\text{blue}} &=& 13.53 \pm 12.54 \ (\text{se}) \\
b_{\text{red}} &=& 7.483 \pm 3.479 \ (\text{se}) \\
b_{\text{green}} &=& 7.933 \pm 2.108 \ (\text{se})
\end{array}$$

This shows that the estimated slope for the blue shoes is higher, although the standard errors of these estimates are large and the difference is not significant (it is mostly the 2016 data that makes the blue shoes "increase" as a function of the years). From that point of view the slopes are the same.
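If you want a single p-value for the specific question "does blue have a different slope than red and green?", one option (a sketch of one possible approach, not part of the original answer; the variable name xblue is mine) is to compare a common-slope model against a model that gives the blue shoes an extra slope term:

> blue  <- c(410,500,432,518,363,428,409,696,500)
> red   <- c( 22, 77,104,102,122,112,135,104, 96)
> green <- c( 23, 64, 74, 79, 63, 76,109,111, 90)
> y <- c(blue, red, green)
> x <- rep(0:8, times = 3)
> s <- factor(rep(c("blue","red","green"), each = 9))
> xblue <- (s == "blue") * x   # extra slope term that only the blue shoes get
> anova(lm(y ~ 0 + s + x), lm(y ~ 0 + s + x + xblue))  # F-test for the extra slope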


Quick overview of obtaining p-values

Below is a very quick flight through several options (using R) for how one might think this data could be fitted. The last one, a generalized linear model, is in my opinion the only correct one.

I do not include much explanation and hope you can follow it based on the comments. Note that these models are really all questions in themselves (and they have been asked before on this website, so you can look them up).

Data

> # data 
> y1 <- c(410,500,432,518,363,428,409,696,500); 
> y2 <- c( 22, 77,104,102,122,112,135,104, 96);
> y3 <- c( 23, 64, 74, 79, 63, 76,109,111, 90);
> x <- c(2009:2017)-2009; # I subtract 2009 from the years to bring the intercept with the y-axis closer to the data
> 

Three independent models

> # three independent models
> mod <- lm(cbind(y1,y2,y3)~1+x);
> summary(mod);
Response y1 :

  Call:
  lm(formula = y1 ~ 1 + x)

Residuals:
  Min      1Q  Median      3Q     Max 
-109.89  -58.42  -13.82   58.64  182.51 

Coefficients:
  Estimate Std. Error t value Pr(>|t|)    
(Intercept)   418.76      59.72   7.012 0.000209 ***
  x              13.53      12.54   1.079 0.316433    
---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 97.17 on 7 degrees of freedom
Multiple R-squared:  0.1426,    Adjusted R-squared:  0.02007 
F-statistic: 1.164 on 1 and 7 DF,  p-value: 0.3164


Response y2 :

  Call:
  lm(formula = y2 ~ 1 + x)

Residuals:
  Min      1Q  Median      3Q     Max 
-45.178 -15.561   7.406  21.856  24.889 

Coefficients:
  Estimate Std. Error t value Pr(>|t|)   
(Intercept)   67.178     16.562   4.056  0.00483 **
  x              7.483      3.479   2.151  0.06850 . 
---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 26.95 on 7 degrees of freedom
Multiple R-squared:  0.398, Adjusted R-squared:  0.312 
F-statistic: 4.627 on 1 and 7 DF,  p-value: 0.0685


Response y3 :

  Call:
  lm(formula = y3 ~ 1 + x)

Residuals:
  Min     1Q Median     3Q    Max 
-21.82 -13.56  10.38  11.24  16.58 

Coefficients:
  Estimate Std. Error t value Pr(>|t|)   
(Intercept)   44.822     10.035   4.467  0.00291 **
  x              7.933      2.108   3.764  0.00704 **
  ---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 16.33 on 7 degrees of freedom
Multiple R-squared:  0.6693,    Adjusted R-squared:  0.622 
F-statistic: 14.17 on 1 and 7 DF,  p-value: 0.00704


> anova(mod);
Analysis of Variance Table

Df  Pillai approx F num Df den Df    Pr(>F)    
(Intercept)  1 0.98794  136.550      3      5 3.239e-05 ***
  x            1 0.67745    3.501      3      5    0.1055    
Residuals    7                                             
---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> confint(mod);
2.5 % 97.5 %
> 

One single model

> # a bit more precision if we model the three together
> y <- c(y1,y2,y3)
> x <- c(x,x,x)
> s <- factor(gl(3,9))
> 
> 

Adding different slopes (the cross term) does not yield an exceptional change in the residuals; the p-value for a change this large or larger is 0.82.

> anova(lm(y~0+s+x),lm(y~0+s+x+s:x))
Analysis of Variance Table

Model 1: y ~ 0 + s + x
Model 2: y ~ 0 + s + x + s:x
Res.Df   RSS Df Sum of Sq     F Pr(>F)
1     23 74406                          
2     21 73043  2    1363.3 0.196 0.8235
> 

View estimated values

The awkward expression y~0+s+s:x, instead of y~s*x, is used so that we get three values for the intercept and three values for the slope (these are easier to interpret), instead of a base intercept and slope together with the differences of the other two shoes from that base.

> summary(lm(y~0+s+s:x)) 

Call:
  lm(formula = y ~ 0 + s + s:x)

Residuals:
  Min       1Q   Median       3Q      Max 
-109.889  -20.056    2.339   14.944  182.511 

Coefficients:
  Estimate Std. Error t value Pr(>|t|)    
s1    418.756     36.249  11.552 1.46e-10 ***
s2     67.178     36.249   1.853    0.078 .  
s3     44.822     36.249   1.237    0.230    
s1:x   13.533      7.614   1.777    0.090 .  
s2:x    7.483      7.614   0.983    0.337    
s3:x    7.933      7.614   1.042    0.309    
---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 58.98 on 21 degrees of freedom
Multiple R-squared:  0.9674,    Adjusted R-squared:  0.9581 
F-statistic: 103.9 on 6 and 21 DF,  p-value: 1.65e-14

> confint(lm(y~0+s+s:x))
2.5 %    97.5 %
s1   343.371497 494.13961
s2    -8.206281 142.56184
s3   -30.561837 120.20628
s1:x  -2.300486  29.36715
s2:x  -8.350486  23.31715
s3:x  -7.900486  23.76715
> 

Those negative values, for a variable which is in principle positive, are a bit strange. But we should have used a Poisson distribution (the distribution of counts in a time period) instead of a Gaussian distribution.

> # Those negative values are ridiculous
> # But we should have realized before that the distribution is Poisson
> summary(glm(y~0+s+s:x,family = poisson(link=identity)))

Call:
  glm(formula = y ~ 0 + s + s:x, family = poisson(link = identity))

Deviance Residuals: 
  Min      1Q  Median      3Q     Max  
-5.527  -2.049   0.514   1.769   7.715  

Coefficients:
  Estimate Std. Error z value Pr(>|z|)    
s1    420.994     12.908  32.614  < 2e-16 ***
s2     58.963      5.183  11.376  < 2e-16 ***
s3     41.688      4.441   9.387  < 2e-16 ***
s1:x   12.974      2.802   4.630 3.65e-06 ***
s2:x    9.537      1.239   7.699 1.37e-14 ***
s3:x    8.717      1.089   8.007 1.18e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance:    Inf  on 27  degrees of freedom
Residual deviance: 227.24  on 21  degrees of freedom
AIC: 422.87

Number of Fisher Scoring iterations: 8

> confint(glm(y~0+s+s:x,family = poisson(link=identity)))
Waiting for profiling to be done...
2.5 %    97.5 %
s1   396.428337 446.26233
s2    47.866267  70.89148
s3    32.862051  51.34399
s1:x   7.591633  18.35493
s2:x   6.748939  12.31191
s3:x   6.438882  10.98363
> 

For this GLM the effect of a single common slope is significant, but the effect of different slopes is not significant.

> # now the effect of the slopes is significant
> # but the effect of 'different' slopes is not significant
> anova(glm(y~0+s+x,family = poisson(link=identity)),
+       glm(y~0+s+x+s:x,family = poisson(link=identity)),
+       test="Chisq")
Analysis of Deviance Table

Model 1: y ~ 0 + s + x
Model 2: y ~ 0 + s + x + s:x
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1        23     229.29                     
2        21     227.24  2   2.0503   0.3587
> 
$\endgroup$
  • $\begingroup$ Thank you very much for your answer! I think it is obvious that the chi-squared test is highly significant, since the values of the blue shoes are much higher than the others. Thus I am simply interested in the slopes of the functions. Put simply, I want to know (assuming a linear model) whether the number of blue shoes increases more from 2009 to 2017 than the red and green ones. Is there a statistical test (besides the chi-squared) which returns a p-value? $\endgroup$
    – 2Obe
    Commented Oct 27, 2017 at 6:18
  • $\begingroup$ The chi-square test is highly significant not because of the high values of blue shoes. The higher number of blue shoes is entirely dealt with by using a different offset (second method) or a different row total (first method). This difference has no influence. ----- The reason why the test is highly significant is that you do not have a nice linear function. The residuals should be about the order of the square root of the counts, but you occasionally get large deviations from the linear model. For instance, in the first column the observed red and green are half the estimated value. $\endgroup$ Commented Oct 27, 2017 at 8:25
  • $\begingroup$ The blue shoes have not significantly increased more than the red and/or green shoes. Yes, the slope is bigger (the change 2017−2009 in this sample is bigger for blue than for red and green; this observation does not require a test), but in terms of an underlying model the estimated slope parameter is very uncertain. You can see this with the naked eye: the blue shoe sales go up and down like crazy, and this random up-and-down motion is much larger than the effect of the slope, which is only 13.5 shoes per year (and mostly attributable to the 2016 year, which has almost 200 shoes more than any other year). $\endgroup$ Commented Oct 27, 2017 at 8:30
  • $\begingroup$ You could do a t-test or ANOVA. Or just look at the confidence interval. Then you will see that the blue shoes do not even have significant growth. In R code: y1 <- c(410,500,432,518,363,428,409,696,500); y2 <- c( 22, 77,104,102,122,112,135,104, 96); y3 <- c( 23, 64, 74, 79, 63, 76,109,111, 90); x <- c(2009:2017)-2009; mod <- lm(cbind(y1,y2,y3)~1+x); summary(mod); anova(mod); confint(mod); Those are three separate models. You could also put y1, y2 and y3 in a single column and use a slightly different model. $\endgroup$ Commented Oct 27, 2017 at 8:52
  • $\begingroup$ I have added an overview of different ways to get p-values, starting with naive (wrong) models and finally ending with a Poisson model which I believe is better. No long explanations of the methods are added; you can find these elsewhere on this website. $\endgroup$ Commented Oct 27, 2017 at 9:44
