
I'd appreciate some help solving this sampling issue:

I have to select a random sample of people for a survey in a certain country.

The country is divided into 6 regions.

In total, there are 981 municipalities in those regions.

I have to select 5 municipalities from each region, for a total of 30 municipalities.

And then, I have to randomly select 30 people from each municipality.

Is there a way to do this in R or SPSS?

  • Certainly there are ways. It's even straightforward to do it with pencil, paper, and a table of random numbers. But could you be more specific about what you mean by "randomly"? What probabilities do you want to use? That is, how do you want to weight the selections and why?
    – whuber
    Commented Aug 15, 2017 at 22:16
  • By "randomly" I mean that there are 1,185,286 people in those 6 regions and 981 municipalities, but I only need to sample 900.
    – Chris
    Commented Aug 15, 2017 at 22:18
  • Thank you--but that doesn't clarify things. Do you need to select each of those people with equal probability or not?
    – whuber
    Commented Aug 16, 2017 at 14:11
  • @whuber, this is a very typical sampling design request. And getting equal probability of selection is approximately immaterial, as you won't be able to analyze the data as if they were i.i.d. anyway because of clustering.
    – StasK
    Commented Aug 17, 2017 at 14:34

1 Answer


This is a stratified cluster sample. (If you have never heard these terms before, you are probably not prepared to handle the project. Sorry, but that's how it is; you may want to seek out a consulting statistician, and Statistics Without Borders may be able to help.)

Your strata are regions: all six are in the sample, and samples are to be taken independently across them.

Your clusters, or primary sampling units (PSUs), are municipalities. I would select them with probability proportional to size (PPS). In R, this can be done with library(sampling); I would use sampling::UPmaxentropy(), as it is the method closest to SRS in its statistical properties (and hence harder to screw up at the analysis stage than, say, systematic sampling).
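For readers who want to see the mechanics of this first stage, here is a minimal Python sketch. It is not sampling::UPmaxentropy(): I swapped in Sampford's rejective method, which is short to write and also gives exact PPS inclusion probabilities, but it is a different design from the maximum-entropy (conditional Poisson) one. (In R the corresponding steps would be sampling::inclusionprobabilities() followed by UPmaxentropy().) The function names and the municipality sizes in the usage lines are hypothetical; real sizes would come from your sampling frame. Each region is sampled independently, which is what makes the regions strata.

```python
import random


def inclusion_probabilities(sizes, n):
    """PPS inclusion probabilities n * size / total. Any unit whose value would
    reach 1 becomes a certainty unit (pi = 1), and the remaining sample size is
    re-spread over the other units. Assumes all sizes are positive."""
    if not 0 < n <= len(sizes):
        raise ValueError("need 0 < n <= number of units")
    certain = [False] * len(sizes)
    while True:
        k = n - sum(certain)  # sample size left for the non-certainty units
        rest = sum(s for s, c in zip(sizes, certain) if not c)
        pi = [1.0 if c else k * s / rest for s, c in zip(sizes, certain)]
        over = [i for i, p in enumerate(pi) if p >= 1.0 and not certain[i]]
        if not over:
            return pi
        for i in over:
            certain[i] = True


def sampford(pi, n, rng):
    """Sampford's rejective method: a sample of exactly n distinct units whose
    inclusion probabilities equal pi. Requires 0 <= pi_i < 1 and sum(pi) == n."""
    units = list(range(len(pi)))
    first_w = pi                             # first draw: probability pi_i / n
    later_w = [p / (1.0 - p) for p in pi]    # later draws: proportional to pi_i / (1 - pi_i)
    while True:
        draw = rng.choices(units, weights=first_w, k=1)
        draw += rng.choices(units, weights=later_w, k=n - 1)
        if len(set(draw)) == n:              # accept only samples without repeats
            return sorted(draw)


def pps_sample(sizes, n, rng):
    """Select n units with probability proportional to size; returns (indices, pi)."""
    pi = inclusion_probabilities(sizes, n)
    chosen = [i for i, p in enumerate(pi) if p >= 1.0]  # certainty units
    others = [i for i, p in enumerate(pi) if p < 1.0]
    k = n - len(chosen)
    if k > 0:
        picked = sampford([pi[i] for i in others], k, rng)
        chosen += [others[j] for j in picked]
    return sorted(chosen), pi


def select_municipalities(regions, n_per_region, seed=None):
    """Independent PPS selection within each region (stratum)."""
    rng = random.Random(seed)
    selected = {}
    for region, sizes in regions.items():
        chosen, pi = pps_sample(sizes, n_per_region, rng)
        selected[region] = [(i, pi[i]) for i in chosen]  # (municipality index, pi)
    return selected


# Hypothetical frame: made-up population counts for the municipalities of 6 regions.
demo = random.Random(42)
regions = {f"region_{r}": [demo.randint(300, 50000) for _ in range(160)] for r in range(1, 7)}
sample = select_municipalities(regions, n_per_region=5, seed=2017)
```

Keep the returned pi values: they are the first-stage selection probabilities you will need at the analysis stage.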

Then, if you have a list of the people in each municipality (which would really impress me), you could take a simple random sample from that list. If you don't have that list, you need to add another sampling stage, which would depend on census or other population data on the sizes of cities, towns, and villages within the municipality (and probably on some administrative divisions within the larger cities). You would again take a PPS sample at that stage, aiming to get down to areas of 2,000-5,000 people that field enumerators can manage. (In the U.S., areas of this size are referred to as census tracts; they are artificial geographic entities that mainly exist for the purposes of other data collections.) Once you have enumerated the dwellings/households in those last-stage areas, you can draw your sample of 30 people (which would probably be 10 people in each of 3 "census tracts"). If you don't have detailed data, you would probably have to rely on maps and draw the units on the map pretty much by hand (well... "by hand" these days means in a GIS program).
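To connect this design to the equal-probability question from the comments, here is the overall selection probability of a person, worked out under stated assumptions: no municipality or area has to be taken with certainty, and the PPS size measures are the actual population counts. The notation is introduced here for illustration: $M_h$ is the population of region $h$, $M_{hi}$ that of municipality $i$, and $T_{hit}$ that of area $t$ within it.

$$
\begin{aligned}
\text{two stages:}\quad \pi_{hij} &= \frac{5\,M_{hi}}{M_h}\cdot\frac{30}{M_{hi}} = \frac{150}{M_h},\\
\text{three stages:}\quad \pi_{hitj} &= \frac{5\,M_{hi}}{M_h}\cdot\frac{3\,T_{hit}}{M_{hi}}\cdot\frac{10}{T_{hit}} = \frac{150}{M_h}.
\end{aligned}
$$

So under those assumptions the design is self-weighting within a region (every person gets weight $M_h/150$), but with 150 people taken in every region, people in more populous regions have lower selection probabilities. Equal probabilities across the whole country would instead require the number of sampled municipalities per region to be proportional to $M_h$.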

One of the problems I have encountered many times in international work is that once the sample has been drawn, you (or your local contractors) destroy all the records that were used to draw it. This makes the data nearly unusable. To analyze the data properly (library(survey) in R; I recently wrote a chapter on this as well), you need the selection probabilities (and to construct these, you need the population counts at every stage) and the identifiers of the strata and sampling units. Make absolutely sure that this information travels into your final data set.
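To make the record-keeping point concrete, here is a minimal Python sketch of the last stage in the simplest scenario above, where a population list exists for each selected municipality. It draws a simple random sample of 30 people without replacement and keeps, on every row, the stratum and PSU identifiers plus the probability at each stage, from which the design weight follows. The class, field names, and example numbers are hypothetical.

```python
import random
from dataclasses import dataclass, asdict


@dataclass
class SampledPerson:
    region: str             # stratum identifier
    municipality: int       # PSU identifier
    person_id: int          # row on the (hypothetical) population list
    pi_municipality: float  # first-stage inclusion probability
    p_within: float         # second-stage SRS probability, m / M_hi

    def record(self):
        """Row for the analysis file, including the design weight 1 / (pi * p)."""
        row = asdict(self)
        row["weight"] = 1.0 / (self.pi_municipality * self.p_within)
        return row


def draw_people(region, municipality, pi_municipality, person_ids, m, rng):
    """Simple random sample without replacement of m people from one municipality."""
    if len(person_ids) < m:
        raise ValueError("municipality has fewer people than the planned take")
    p = m / len(person_ids)
    return [SampledPerson(region, municipality, pid, pi_municipality, p)
            for pid in rng.sample(person_ids, m)]


# Hypothetical example: municipality 7 in region_1, selected with pi = 0.05,
# whose population list has 4,000 people.
rng = random.Random(2017)
rows = [s.record() for s in draw_people("region_1", 7, 0.05, list(range(4000)), 30, rng)]
```

Columns like these are what survey::svydesign() in R takes through its ids, strata, and weights (or probs) arguments.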

