The chi-squared test can also be done with the higher values for the blue shoes.
Method 1
Your observations in a contingency table are:
$$\begin{array}{l|ccccccccc|c}
 & 2009 & 2010 & 2011 & 2012 & 2013 & 2014 & 2015 & 2016 & 2017 & \text{total}\\
\hline
\text{blue} & 410 & 500 & 432 & 518 & 363 & 428 & 409 & 696 & 500 & 4256\\
\text{red} & 22 & 77 & 104 & 102 & 122 & 112 & 135 & 104 & 96 & 874 \\
\text{green} & 23 & 64 & 74 & 79 & 63 & 76 & 109 & 111 & 90 & 689\\
\hline
\text{total} & 455 & 641 & 610 & 699 & 548 & 616 & 653 & 911 & 686 & 5819
\end{array}$$
The estimated values (rounded for simpler presentation), based on the row and column totals, are:
$$\begin{array}{l|ccccccccc|c}
 & 2009 & 2010 & 2011 & 2012 & 2013 & 2014 & 2015 & 2016 & 2017 & \text{total}\\
\hline
\text{blue} & 333 & 469 & 446 & 511 & 401 & 451 & 478 & 666 & 502 & 4256\\
\text{red} & 68 & 96 & 92 & 105 & 82 & 93 & 98 & 137 & 103 & 874 \\
\text{green} & 54 & 76 & 72 & 83 & 65 & 73 & 77 & 108 & 81 & 689\\
\hline
\text{total} & 455 & 641 & 610 & 699 & 548 & 616 & 653 & 911 & 686 & 5819
\end{array}$$
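Explicitly, each estimated count is the product of its row total and column total divided by the grand total, e.g. for the first cell:
$$E_{ij} = \frac{(\text{row}_i \text{ total}) \times (\text{column}_j \text{ total})}{\text{grand total}}, \qquad E_{11} = \frac{4256 \times 455}{5819} \approx 333$$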
The chi-squared statistic is then
$$\chi^2 = \sum \frac{(estimated-observed)^2}{estimated} = 152.92$$
You have $9 \times 3 = 27$ cells, and the estimated values satisfy $9+3-1=11$ independent constraints (the row and column totals, which share the grand total); the difference gives you $27-11=16$ degrees of freedom, and the p-value $P\left(\chi^2(16)\geq 152.92\right)<10^{-16}$ is smaller than my computer can compute (using R).
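The same test can be run directly in R. Below is a minimal sketch using chisq.test on the matrix of counts (O is just a name chosen here); it should reproduce the statistic above up to rounding:

> O <- rbind(c(410,500,432,518,363,428,409,696,500),
+            c( 22, 77,104,102,122,112,135,104, 96),
+            c( 23, 64, 74, 79, 63, 76,109,111, 90));
> chisq.test(O);   # expect X-squared near 152.92, df = 16, p-value < 2.2e-16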
Method 2
Now the estimated values are based on a linear model with a different offset for each shoe type but the same slope
$$y \sim a_{shoetype} + b*(year-2009)$$
with
$$\begin{array}{rcl}
a_{blue} &=& 434.29 \\
a_{red} &=& 58.51 \\
a_{green} &=& 37.96 \\
b &=& 9.65
\end{array}$$
which gives estimates:
$$\begin{array}{l|ccccccccc|c}
 & 2009 & 2010 & 2011 & 2012 & 2013 & 2014 & 2015 & 2016 & 2017 & \text{total}\\
\hline
\text{blue} & 434 & 444 & 454 & 463 & 473 & 483 & 492 & 502 & 511 & 4256\\
\text{red} & 59 & 68 & 78 & 87 & 97 & 107 & 116 & 126 & 136 & 874 \\
\text{green} & 38 & 48 & 57 & 67 & 77 & 86 & 96 & 106 & 115 & 689\\
\hline
\text{total} & 455 & 641 & 610 & 699 & 548 & 616 & 653 & 911 & 686 & 5819
\end{array}$$
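For example, the first two cells in the blue row are
$$a_{blue} + b \cdot 0 = 434.29 \approx 434, \qquad a_{blue} + b \cdot 1 = 443.94 \approx 444$$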
and the statistic is now
$$\chi^2 = \sum \frac{(estimated-observed)^2}{estimated} = 227.14$$
which is a higher residual/chi-squared value, but this time you used fewer parameters to estimate the values (4 instead of 11, leaving $27-4=23$ degrees of freedom), yet you still have $P\left(\chi^2(23)\geq 227.14\right)<10^{-16}$
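These numbers can be reproduced in R with a least-squares fit; a minimal sketch (reusing the vectors y1, y2, y3 and x from the Data section below; xx is just a name chosen here for the stacked copy of x):

> y <- c(y1,y2,y3); s <- factor(gl(3,9)); xx <- rep(x,3);
> mod2 <- lm(y ~ 0 + s + xx);                   # three intercepts, one common slope
> chi2 <- sum((y-fitted(mod2))^2/fitted(mod2));
> chi2;                                         # should come out close to 227.14
> pchisq(chi2, df = 27-4, lower.tail = FALSE);  # 27 cells minus 4 estimated parameters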
Interpretation
In both cases you "should" reject the model as a "good" fit. This means that if the model were true, then it would be unlikely to observe these results. In other words, the model is likely not the 'right' or 'true' model.
I have been using double quotes in the previous paragraph because I don't believe this concept of goodness of fit is what you actually need. You may wonder whether this chi-squared test is a good reason to discard the slopes for your purpose of using the linear model (which does not capture noise or deviations from the trend) to compare trends.
Note that the chi-squared test tells you, based on the size of the residuals, whether your observations are likely given your model. If the results are unlikely, this can be due to different things. It does not need to mean that the slopes of the different shoes are wrong. It can be that the application of the linear model is wrong, or that the noise is too large (the chi-squared test assumes a statistical effect of noise roughly equal to the square root of the counts, but that is different from the random behavior of customers).
Method 1 vs Method 2
I should also note a difference between the first and second method. The second method explicitly uses a least-squares fit to estimate the values for each shoe type. The first method does not use a slope and tests a different hypothesis: whether the different shoe types always appear in the same ratio (not necessarily following a linear model).
I have included this first test because it is what many people learn as the most typical example of a chi-squared test, although the test can be used more generally, as with the model in the second method.
In both tests it is important that the numbers entering the $(e-o)^2/e$ terms are counts. This matters because the $(e-o)^2/e$ function stems from the assumption that the observed number comes from a binomial or Poisson distribution (counts), which has a specific relation between mean and variance.
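To make this explicit: for a Poisson count $o$ with mean $e$, the variance equals the mean, so each standardized term is roughly standard normal for large $e$, and the sum of their squares behaves like a chi-squared variable:
$$o \sim \text{Poisson}(e) \;\Rightarrow\; \operatorname{Var}(o) = e \;\Rightarrow\; \frac{o-e}{\sqrt{e}} \approx N(0,1)$$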
Method 3
More useful than goodness of fit (since your fit is not good), if you only want to compare the slopes, would be a linear model with confidence intervals or estimated standard errors for the parameters.
$$y \sim a_{shoetype} + b_{shoetype}*(year-2009)$$
$$\begin{array}{rcl}
a_{blue} &=& 418.76 \\
a_{red} &=& 67.178 \\
a_{green} &=& 44.822 \\
b_{blue} &=& 13.53 \pm se \, 12.54 \\
b_{red} &=& 7.483 \pm se \, 3.479 \\
b_{green} &=& 7.933 \pm se \, 2.108
\end{array}$$
This shows that the slope of the blue shoes is higher, although the error of these estimates is large and the difference is not significant (it is mostly the 2016 data point which makes the blue shoes "increase" as a function of the years). From that point of view the slopes are the same.
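For instance, a rough check of blue versus green (treating the two estimates as independent, since they come from separate fits) gives an approximate z-statistic of
$$\frac{b_{blue}-b_{green}}{\sqrt{se_{blue}^2+se_{green}^2}} = \frac{13.53-7.93}{\sqrt{12.54^2+2.108^2}} \approx 0.44$$
which is far from any conventional significance threshold.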
Quick overview of obtaining p-values
Below is a very quick flight through several options (using R) for how one might fit this data. The last one, the generalized linear model, is actually the only correct one (in my opinion).
I do not include much explanation and hope you can follow it based on the comments. Note that questions about these models are all questions by themselves (and they have been asked before on this website, so you can look them up).
Data
> # data
> y1 <- c(410,500,432,518,363,428,409,696,500);
> y2 <- c( 22, 77,104,102,122,112,135,104, 96);
> y3 <- c( 23, 64, 74, 79, 63, 76,109,111, 90);
> x <- c(2009:2017)-2009; # I subtract 2009 from the years to bring the intercept with the y-axis closer to the data
>
Three independent models
> # three independent models
> mod <- lm(cbind(y1,y2,y3)~1+x);
> summary(mod);
Response y1 :
Call:
lm(formula = y1 ~ 1 + x)
Residuals:
Min 1Q Median 3Q Max
-109.89 -58.42 -13.82 58.64 182.51
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 418.76 59.72 7.012 0.000209 ***
x 13.53 12.54 1.079 0.316433
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 97.17 on 7 degrees of freedom
Multiple R-squared: 0.1426, Adjusted R-squared: 0.02007
F-statistic: 1.164 on 1 and 7 DF, p-value: 0.3164
Response y2 :
Call:
lm(formula = y2 ~ 1 + x)
Residuals:
Min 1Q Median 3Q Max
-45.178 -15.561 7.406 21.856 24.889
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 67.178 16.562 4.056 0.00483 **
x 7.483 3.479 2.151 0.06850 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 26.95 on 7 degrees of freedom
Multiple R-squared: 0.398, Adjusted R-squared: 0.312
F-statistic: 4.627 on 1 and 7 DF, p-value: 0.0685
Response y3 :
Call:
lm(formula = y3 ~ 1 + x)
Residuals:
Min 1Q Median 3Q Max
-21.82 -13.56 10.38 11.24 16.58
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 44.822 10.035 4.467 0.00291 **
x 7.933 2.108 3.764 0.00704 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 16.33 on 7 degrees of freedom
Multiple R-squared: 0.6693, Adjusted R-squared: 0.622
F-statistic: 14.17 on 1 and 7 DF, p-value: 0.00704
> anova(mod);
Analysis of Variance Table
Df Pillai approx F num Df den Df Pr(>F)
(Intercept) 1 0.98794 136.550 3 5 3.239e-05 ***
x 1 0.67745 3.501 3 5 0.1055
Residuals 7
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> confint(mod);
2.5 % 97.5 %
>
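Note that confint gives an empty table for this multivariate fit (lm with a matrix response returns an "mlm" object, for which no intervals are produced); the single combined model below does provide intervals.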
One single model
> # a bit more precision if we model the three together
> y <- c(y1,y2,y3)
> x <- c(x,x,x)
> s <- factor(gl(3,9)) # factor for shoe type: 1 = blue, 2 = red, 3 = green
>
>
Adding different slopes (the cross term) does not yield an exceptional change in the residuals; the p-value for a change this large or larger is 0.82.
> anova(lm(y~0+s+x),lm(y~0+s+x+s:x))
Analysis of Variance Table
Model 1: y ~ 0 + s + x
Model 2: y ~ 0 + s + x + s:x
Res.Df RSS Df Sum of Sq F Pr(>F)
1 23 74406
2 21 73043 2 1363.3 0.196 0.8235
>
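The F value in this table follows directly from the residual sums of squares:
$$F = \frac{(74406-73043)/2}{73043/21} \approx 0.196$$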
View estimated values
The awkward expression y~0+s+s:x, instead of y~s*x, is used so that we get three values for the intercepts and three values for the slopes (which are easier to interpret), instead of a base intercept and slope plus the differences of the other two shoe types from this base.
> summary(lm(y~0+s+s:x))
Call:
lm(formula = y ~ 0 + s + s:x)
Residuals:
Min 1Q Median 3Q Max
-109.889 -20.056 2.339 14.944 182.511
Coefficients:
Estimate Std. Error t value Pr(>|t|)
s1 418.756 36.249 11.552 1.46e-10 ***
s2 67.178 36.249 1.853 0.078 .
s3 44.822 36.249 1.237 0.230
s1:x 13.533 7.614 1.777 0.090 .
s2:x 7.483 7.614 0.983 0.337
s3:x 7.933 7.614 1.042 0.309
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 58.98 on 21 degrees of freedom
Multiple R-squared: 0.9674, Adjusted R-squared: 0.9581
F-statistic: 103.9 on 6 and 21 DF, p-value: 1.65e-14
> confint(lm(y~0+s+s:x))
2.5 % 97.5 %
s1 343.371497 494.13961
s2 -8.206281 142.56184
s3 -30.561837 120.20628
s1:x -2.300486 29.36715
s2:x -8.350486 23.31715
s3:x -7.900486 23.76715
>
Those negative values, for variables which are in principle positive, are a bit strange. But we should have used a Poisson distribution (the distribution of counts in a time period) instead of a Gaussian distribution.
> # Those negative values are ridiculous
> # But we should have realized before that the distribution is Poisson
> summary(glm(y~0+s+s:x,family = poisson(link=identity)))
Call:
glm(formula = y ~ 0 + s + s:x, family = poisson(link = identity))
Deviance Residuals:
Min 1Q Median 3Q Max
-5.527 -2.049 0.514 1.769 7.715
Coefficients:
Estimate Std. Error z value Pr(>|z|)
s1 420.994 12.908 32.614 < 2e-16 ***
s2 58.963 5.183 11.376 < 2e-16 ***
s3 41.688 4.441 9.387 < 2e-16 ***
s1:x 12.974 2.802 4.630 3.65e-06 ***
s2:x 9.537 1.239 7.699 1.37e-14 ***
s3:x 8.717 1.089 8.007 1.18e-15 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: Inf on 27 degrees of freedom
Residual deviance: 227.24 on 21 degrees of freedom
AIC: 422.87
Number of Fisher Scoring iterations: 8
> confint(glm(y~0+s+s:x,family = poisson(link=identity)))
Waiting for profiling to be done...
2.5 % 97.5 %
s1 396.428337 446.26233
s2 47.866267 70.89148
s3 32.862051 51.34399
s1:x 7.591633 18.35493
s2:x 6.748939 12.31191
s3:x 6.438882 10.98363
>
For this glm model the effect of a single slope is significant, but the effect of 'different' slopes is not significant:
> # now the effect of the slopes is significant
> # but the effect of 'different' slopes is not significant
> anova(glm(y~0+s+x,family = poisson(link=identity)),
+ glm(y~0+s+x+s:x,family = poisson(link=identity)),
+ test="Chisq")
Analysis of Deviance Table
Model 1: y ~ 0 + s + x
Model 2: y ~ 0 + s + x + s:x
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 23 229.29
2 21 227.24 2 2.0503 0.3587
>
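The p-value in this deviance table can also be computed by hand, by comparing the drop in deviance with a chi-squared distribution with $23-21=2$ degrees of freedom:

> pchisq(229.29 - 227.24, df = 2, lower.tail = FALSE);   # roughly 0.36, matching Pr(>Chi) above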