4
$\begingroup$

I am trying to perform a GLM analysis using R for an outcome that is:

  1. Bounded by 0 - 10
  2. In steps of 1

(Numerical Rating Scale for Pain: 0 - 10)

I have a set of demographic variables (age, sex, etc.) that I want to use as predictors in the GLM.

I understand that a Gaussian family might not be the best option (since the outcome is bounded at 0), but I am not sure whether I should choose Gamma (the outcome is not continuous) or Poisson (the outcome is not a count).

The data are very much skewed:

[image omitted]

thanks

s

$\endgroup$

2 Answers

10
$\begingroup$

This answer elaborates on some discussion in comments on the answer from Nick Cox.

Your situation might be handled by a multi-category extension of binomial regression: ordinal regression. You model the probability of moving from one category to the next in a way that takes advantage of the ordering among the outcome categories.

This UCLA web page illustrates ordinal logistic regression, based on a "proportional odds" (PO) assumption for moving up the scale. I don't know whether that assumption will hold for your data, but the page does show how to evaluate it.
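As a rough starting point, a proportional odds fit in R might look like the sketch below. It uses `polr()` from the MASS package on simulated data; the variable names (`pain`, `age10`, `sex`), the coefficients and the cut points are all made up for illustration and are not taken from the question's data.

```r
library(MASS)  # polr(); MASS ships with standard R installations

set.seed(1)
n <- 200

# Hypothetical predictors standing in for the demographic variables
dat <- data.frame(
  age10 = runif(n, 2, 8),  # age in decades (assumed scale)
  sex   = factor(sample(c("F", "M"), n, replace = TRUE))
)

# Simulate a skewed 0-10 score from a latent logistic variable (illustration only)
latent <- 0.3 * (dat$age10 - 5) + 0.5 * (dat$sex == "M") + rlogis(n)
cuts   <- c(-Inf, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, Inf)
dat$pain <- cut(latent, breaks = cuts, labels = 0:10, ordered_result = TRUE)

# Drop score levels that never occur, so every threshold is estimable
dat$pain <- droplevels(dat$pain)

# Proportional odds (ordinal logistic) model
fit <- polr(pain ~ age10 + sex, data = dat, Hess = TRUE)
summary(fit)

# Odds ratios for moving up the scale, per unit change in each predictor
exp(coef(fit))
```

The outcome must be an ordered factor for `polr()`. The exponentiated coefficients are read as odds ratios for being in a higher rather than a lower category, assumed constant across all cut points under the PO assumption.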

Also, as Frank Harrell points out in Section 13.3.3 of his Regression Modeling Strategies book, a PO model can sometimes work well even if the assumption isn't met. In this answer to a question on highly skewed data that take only a few values with clumping at one end, he says:

When the dependent variable Y has a beautiful distribution I still recommend it be modeled using a Y-transformation-invariant semiparametric ordinal regression model such as the proportional odds model. With your Y, the need for a semiparametric model is even greater. Semiparametric models handle arbitrary clumping of Y values, bimodality, floor effects, ceiling effects, and outliers. Such models are also very efficient.

The orm() function in Harrell's rms package allows for ordinal regression with link functions other than the logit, and Section 13.4 of his book shows how to implement a "continuation ratio" method that sometimes works better than a PO model. That provides you some flexibility in how to proceed.

With a PO model you can often model, without overfitting, almost as many parameters as you can with linear regression. Section 4.4 of Harrell's book and course notes provides an estimate of the effective sample size that takes the distribution of cases among categories into account. Your sample size of about 200 would be reduced to an effective sample size of about 180 on that basis, so you should be able to estimate about 12 regression coefficients.
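The arithmetic behind those numbers can be sketched as follows. As I recall it, Harrell's effective sample size for an ordinal outcome is $n - \frac{1}{n^2}\sum_i n_i^3$, where $n_i$ is the number of cases in category $i$, and a common rule of thumb allows roughly one coefficient per 15 effective observations. The category counts below are hypothetical, chosen only to resemble a distribution clumped at zero; they are not the question's data.

```r
# Hypothetical counts for scores 0..10, heavily clumped at 0 (they sum to 200)
counts <- c(92, 25, 20, 15, 12, 10, 8, 7, 5, 4, 2)
n <- sum(counts)

# Effective sample size for an ordinal outcome: n - (1/n^2) * sum(n_i^3)
n_eff <- n - sum(counts^3) / n^2
n_eff        # about 179.8

# Rule of thumb (assumed here): roughly 15 effective observations per coefficient
n_eff / 15   # about 12
```

The more the cases pile up in a single category, the larger the cubic term and the smaller the effective sample size; with cases spread evenly over the 11 scores the penalty would be small.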

$\endgroup$
3
  • $\begingroup$ Thanks. Indeed, I was also thinking of ordinal regression, and did come across the UCLA website. I will look into it again. $\endgroup$
    – ssciberras
    Commented Feb 5, 2022 at 21:54
  • $\begingroup$ I have run my analysis using polr and glm with quasibinomial, as suggested. Both give similar results, in terms of coefficients and p-values. I think I understand how to interpret the coefficients from polr, but what about the quasibinomial model? $\endgroup$
    – ssciberras
    Commented Feb 6, 2022 at 20:05
  • $\begingroup$ @ssciberras sorry, I have no experience with quasibinomial models. I understand it's binomial with non-binomial variance. Coefficients are interpreted the same, but standard errors are determined differently. See this page, this page and this page for a start. If you did a multinomial fit, coefficients represent log-odds of each category versus the reference category, without regard to the ordering among categories. Show the model output, perhaps. $\endgroup$
    – EdM
    Commented Feb 6, 2022 at 20:45
5
$\begingroup$

I would use logit link and binomial family, without too much queasiness.

There will be, or should be, switches in your software to indicate bounds of 0 and 10, or more generally to work with the kind of data you have. You are using R, but I won't attempt even broad coding advice.

I support what I take to be a widespread view that getting the link right (here to respect the bounded nature of the outcome) is more crucial than whichever family you specify. There is always small print about which choices give best guesses at standard errors and P-values.

In practice, you may get very similar predicted values even with gamma and Poisson families, but the inferential details could vary quite a lot. But watch out: with some choices you might get absurd negative predictions for your outcome, given a marginal distribution like that.
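For readers who want a starting point in R, one way to express a score bounded by 0 and 10 is as `cbind(score, 10 - score)` in `glm()`, that is, as 10 notional "trials" per person. The sketch below uses simulated data with made-up variable names and coefficients; treating the score as a binomial count out of 10 is a modelling assumption, not something established for the question's data.

```r
set.seed(2)
n <- 200

# Hypothetical predictors (names and scales are assumptions)
dat <- data.frame(
  age10 = runif(n, 2, 8),  # age in decades
  sex   = factor(sample(c("F", "M"), n, replace = TRUE))
)

# Simulated 0-10 score, for illustration only
p <- plogis(-2 + 0.15 * dat$age10 + 0.4 * (dat$sex == "M"))
dat$pain <- rbinom(n, size = 10, prob = p)

# Logit link with binomial family: "successes" = score, "failures" = 10 - score
fit_bin <- glm(cbind(pain, 10 - pain) ~ age10 + sex,
               family = binomial(link = "logit"), data = dat)

# Same mean model, but the dispersion is estimated rather than fixed at 1
fit_qb <- glm(cbind(pain, 10 - pain) ~ age10 + sex,
              family = quasibinomial(link = "logit"), data = dat)

# Point estimates agree; only standard errors and tests differ
all.equal(coef(fit_bin), coef(fit_qb))

# Predictions on the original 0-10 scale cannot leave the bounds
pred <- 10 * predict(fit_bin, type = "response")
range(pred)
```

Because the logit link maps the linear predictor into (0, 1), the fitted scores stay within 0 and 10 whatever the predictor values, which is the point about the link respecting the bounds.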

In sum: you're focusing on which error family to specify, but the link function is the first choice to make.

On a different level: this could be very awkward data for other reasons, depending on how far people reporting no pain could be qualitatively as well as quantitatively different.

$\endgroup$
5
  • $\begingroup$ I'm wondering whether it might potentially be better first to attempt a multinomial logistic regression. There are about 200 observations and 11 levels to model, placing us in a low-data situation. But even a preliminary effort would reveal the extent to which this scale might display important nonlinear behavior over its range. $\endgroup$
    – whuber
    Commented Feb 5, 2022 at 14:17
  • 2
    $\begingroup$ Ordinal logit would seem a closer fit. $\endgroup$
    – Nick Cox
    Commented Feb 5, 2022 at 14:50
  • $\begingroup$ Right, that's what I was trying to refer to. $\endgroup$
    – whuber
    Commented Feb 5, 2022 at 15:48
  • $\begingroup$ Several categorical predictors with several levels for some is natural for what looks like a medical statistical problem, but that could be an awful lot of parameters to estimate. $\endgroup$
    – Nick Cox
    Commented Feb 5, 2022 at 17:05
  • $\begingroup$ Thanks. I am used to binomial and Poisson models, so I will need to do some reading first. Again, thanks! $\endgroup$
    – ssciberras
    Commented Feb 6, 2022 at 19:52
