

Faculty of Science

School of Mathematics and Statistics


MATH3831
Statistical Methods for Social and
Market Research
SEMESTER 1, 2014
CRICOS Provider No: 00098G © 2014, School of Mathematics and Statistics, UNSW
Chapter 1
Introduction to Survey Sampling
1.1 SAMPLING: WHAT & WHY?
Sampling is the statistical practice of making inferences about an entire population (human or otherwise) on the basis of only some of the units in that population. Sampling theory is concerned with:
• the manner in which samples of units may be selected from a population;
• the manner in which inferences may be drawn from observations made on the selected units;
• the precision of such statistical inferences.
In this course we will consider mainly sampling methods for finite populations. These are the methods used in the practical sample surveys conducted by your potential employers, e.g. the Australian Bureau of Statistics (ABS).
We may want to find out how many units have a particular characteristic, or to measure the total value of some variable over the population (e.g. total earnings), or some other statistic. A census (measuring the variable on every unit) would seem the obvious way to go. Why use sampling instead of a census?
1. Reduced cost: a single interview may cost $45-$85. Clearly we could save money by interviewing 1000 people rather than 1 million.
2. Greater speed: in the above example, at roughly one hour per interview, we would have 1000 hours of interviewing instead of 1 million. On top of this comes the time for printing questionnaires, training interviewers and coding answers, and the information is often needed urgently. In Australia a census is performed only once every 5 years (once every 10 years in the UK).
3. Greater scope: a census is not practical for inquiries requiring highly trained personnel or specialised equipment. Measuring a unit may also destroy or contaminate it. Example (destruction): testing battery life uses up the battery. Example (contamination): interviewing people may sensitise them to the topic of the interview. If we want to interview people again later on the same topic to see whether attitudes have changed, a sample leaves uncontaminated people available for the follow-up; a census does not.
4. Greater accuracy: a census needs many more interviewers and data coders. In a smaller study we can employ more highly skilled people at each stage of the survey process. As a result, a sample survey may sometimes yield more accurate estimates than a census.
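The savings in items 1 and 2 are simple arithmetic. Below is a minimal Python sketch, assuming the $45-$85 range per interview from item 1 and about one interviewer-hour per interview (the rate implied by item 2); the function name `survey_effort` is hypothetical, and fixed costs such as printing and training are ignored.

```python
def survey_effort(n_interviews, cost_low=45, cost_high=85, hours_each=1):
    """Return (lowest cost, highest cost, interviewer-hours) for n interviews.

    Defaults: $45-$85 per interview (item 1) and about one hour per
    interview (implied by item 2). Fixed costs such as printing
    questionnaires and training interviewers are ignored.
    """
    return (n_interviews * cost_low,
            n_interviews * cost_high,
            n_interviews * hours_each)


for n in (1_000, 1_000_000):
    low, high, hours = survey_effort(n)
    print(f"{n:>9,} interviews: ${low:,} to ${high:,}, {hours:,} interviewer-hours")
```

For the sample of 1000 this gives $45,000 to $85,000 and 1000 interviewer-hours; for 1 million interviews it gives $45 million to $85 million and a million hours.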
1.2 PHASES OF THE SURVEY PROCESS
1. Establish the goals of the project: what do you want to learn? The researcher must understand why the information is needed, and should have a specific list of information needs and the degree of precision required.
Typical goals in market research are:
• the potential market for a new product or service
• ratings of current products or services
• attitudes/satisfaction levels/opinions
• corporate images
2. Determine the resources available in terms of time, money, personnel and facilities, together with data resources such as previous studies and company records.
3. Determine your sample: who will you interview, and how will you select your sample? To do this you will need to determine the following:
• Observational unit (element): in human populations, this is usually the individual on whom the variable of interest is measured. However, it can also be a household, a school or a transaction.
• Target population: the complete collection of observations we want to study; the ideal population for meeting the survey objectives. If you conduct an employee attitude survey the target population is obvious. If you are trying to determine the likely success of a new product the target population may be less obvious.
• Sampling frame: a list, register or map from which sampled elements can be selected. It provides a means of identifying and locating population elements.
• Sampled population: the population you actually sample from, i.e. the collection of all possible observations that might have been chosen in a sample. Part of the population you really want often cannot be surveyed or reached, and this may introduce bias. We shall try to quantify this bias later in the course. As long as the sampled population is a very high proportion of the target population, the results obtained should also hold, approximately, for the target population.
• Sampling unit (SU): the unit actually sampled. It may differ from the observational unit; for example, we may sample households (SUs) and then measure the individuals within them (elements).
Example 1.1
Consider a survey of average household income.
Target population: all households in Australia
Sampling frame: all Australian households listed in the white pages (www.whitepages.com.au)
[Figure 1.1: Schematic diagram of the relationship between the target population, sampling frame and the sampled population. Labelled regions: target population; sampling frame population; sampled population; not included in sampling frame; not reachable; refuse to respond; not capable of responding; not eligible for survey.]
Sampled population: residents of Australian households who are home when phoned
and who agree to participate.
Example 1.2
Identify:
(a) the elements,
(b) the target population,
(c) a possible sampling frame,
if we want to find:
(a) Average student fees for undergraduate students at UNSW
(b) Number of 10-year-olds in NSW who have read Harry Potter
(c) The average number of cats per household in Sydney
4. Methods of sampling (details to follow)
5. Method of data collection: how will you interview? By mail, personal interview, telephone, or recording.
6. Questionnaire design. What will you ask?
7. Pilot study. If practical, pre-test the questions.
8. Conduct the survey and enter data. Ask the questions.
9. Analyse the data. Do the statistical analysis, interpret the results and produce
the reports.
Example 1.3
Rigid Plastic Containers Consumer Acceptance Study
The Society of the Plastics Industry (SPI) believed rigid plastic containers offered important advantages over containers made of other materials (paper, glass, metal), including light weight, resistance to breakage, low cost, and potential for re-use. A market research study was conducted to identify and evaluate market opportunities for rigid plastic containers.
Rationale: SPI believed that, in the absence of an unfavourable cost differential or excessive distribution problems, demonstrated consumer acceptance or preference would be a critical factor in convincing industries to switch to rigid plastic.
Research Objectives: To determine whether consumers in container markets show greater acceptance of plastic containers than of containers made from other materials.
Information Needs:
• Determine consumer preference for alternative packaging materials in container markets.
• Identify characteristics of containers that influence consumer preference.
• Determine consumers' likes and dislikes regarding current containers.
• Determine what suggestions consumers have for packaging improvement.
• Determine consumer attitudes towards ecological aspects of packaging.
Data Sources. To meet the information needs we must question consumers directly. The first phase consists of a series of focus group interviews to explore consumer attitudes and motives concerning the pros and cons of packaging and ecological issues. Based on these findings, specific questions can be developed for the second phase: a survey of consumers using a questionnaire administered by personal interview.
Questionnaire Design and Pre-testing.
The questionnaire was pre-tested on a convenience sample of about 75 consumers to make sure the questions flowed properly and were understandable to ordinary individuals, and to analyse items for redundancy. The redundancy analysis was carried out using factor analysis.
Example questions
1. Of the packages you currently purchase, which do you feel could be improved? Why?
2. What products do you currently purchase which come in a plastic container?
3. What are the advantages of a plastic container?
4. What are the disadvantages of a plastic container?
5. Please evaluate plastic/paper/metal in regard to the degree to which each possesses lightness/strength/recyclability (rating scale).
6. How important is lightness/strength/recyclability for a container? (Rating scale).
7. Interviewer checks male or female.
8. What is your marital status?
9. How many children do you have at home?
Data Collection Procedure. It was determined that the interviews could be successfully
conducted over the phone.
Sample Design. Telephone numbers were selected using random digit dialling. Under this procedure, three-digit exchange codes supplied by the phone company are combined with four-digit random numbers to give each telephone number in the region an equal probability of selection. A number of call-backs were made if there was no answer or the line was busy. In all, more than 500 interviews were completed over several weeks.
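The random digit dialling step can be sketched in Python as below. This is an illustration under stated assumptions, not the procedure the study itself used: the exchange codes are hypothetical, numbers are drawn with replacement, and call-backs and the screening out of unassigned or business numbers are not modelled.

```python
import random


def random_digit_dial(exchanges, n, rng):
    """Generate n telephone numbers by random digit dialling (RDD).

    Each number is a three-digit exchange code followed by a four-digit
    random suffix. Choosing the exchange uniformly and the suffix uniformly
    from 0000-9999 gives every possible number in those exchanges the same
    selection probability, 1 / (len(exchanges) * 10_000), on each draw.
    Draws are with replacement, so a number can appear twice.
    """
    numbers = []
    for _ in range(n):
        exchange = rng.choice(exchanges)
        suffix = rng.randrange(10_000)            # uniform on 0..9999
        numbers.append(f"{exchange}-{suffix:04d}")
    return numbers


# Hypothetical exchange codes for the survey region
print(random_digit_dial(["385", "662", "931"], 5, random.Random(1)))
```

Because the suffix is generated rather than looked up, unlisted numbers can be reached, unlike with a directory-based frame such as the White Pages in Example 1.1.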
Editing, Coding and Data Processing
Completed interviews were edited to make sure they were legible, complete, consistent
and accurate. In some cases where data was missing, estimates were made based on other
information in the questionnaire.
1.3 ERRORS
Sampling Errors. Sampling errors are associated with the process of selecting a sample. Because only a sample is used to estimate a population quantity, the sample value will generally differ from the true population value. This difference is called the sampling error.
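To see sampling error concretely, the sketch below draws repeated simple random samples from a simulated, hypothetical population (normally distributed incomes in $1000s; nothing here comes from real data) and reports the difference between each sample mean and the population mean.

```python
import random
from statistics import mean


def sampling_error(population, n, rng):
    """Draw one simple random sample of size n (without replacement) and return
    (sample mean, population mean, sampling error = sample mean - population mean)."""
    sample = rng.sample(population, n)
    ybar, mu = mean(sample), mean(population)
    return ybar, mu, ybar - mu


rng = random.Random(2014)
# Hypothetical population of 10,000 incomes in $1000s
population = [rng.gauss(60, 15) for _ in range(10_000)]
for _ in range(3):
    ybar, mu, err = sampling_error(population, 100, rng)
    print(f"sample mean {ybar:6.2f}  population mean {mu:6.2f}  error {err:+.2f}")
```

The population mean is the same on every line while the error changes from sample to sample. In a real survey the population mean is unknown, so the error itself cannot be computed; only its likely size can be estimated.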
Non-Sampling Errors. Non-sampling errors are all errors that arise in the research process other than sampling error. They include every part of the process where mistakes or deliberate deceptions can occur.
1.3.1 Types of Non-Sampling Errors
Frame problems
Frame problems concern how well the frame, and hence the sample, covers the population; for this reason they are also called coverage errors.
• Undercoverage: some members of the population are not linked to any entry on the frame. This mainly introduces bias.
• Overcoverage: some entries on the frame are linked to non-members of the population. This tends to reduce the effective sample size and hence increase variance.
• Multiplicity: a member of the population is linked to more than one entry on the frame, giving it multiple chances of being chosen.
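The three frame problems can be detected mechanically when both the population and the frame's links are known, which in practice they rarely are. A small Python sketch, assuming population members are identified by hypothetical IDs and each frame entry records the ID it links to:

```python
from collections import Counter


def frame_problems(population, frame):
    """Classify coverage problems of a sampling frame.

    population: set of IDs of target-population members.
    frame: list of frame entries, each giving the ID it links to;
           an ID may appear more than once.
    Returns (undercovered members, overcoverage entries, multiplicities).
    """
    links = Counter(frame)
    undercovered = population - set(links)                      # not on the frame at all
    overcoverage = [e for e in frame if e not in population]    # entries for non-members
    multiplicity = {m: k for m, k in links.items() if m in population and k > 1}
    return undercovered, overcoverage, multiplicity


population = {"h1", "h2", "h3", "h4"}
frame = ["h1", "h2", "h2", "h3", "x9"]   # h2 listed twice; x9 is not in the population
print(frame_problems(population, frame))
```

Here h4 is undercovered, x9 is overcoverage, and h2 has multiplicity 2, so under a simple random sample of frame entries h2 would be twice as likely as h1 to be selected.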
Response Errors
Reasons for response errors include:
• Poor questionnaire design. Survey questions must be worded carefully to avoid introducing bias.
• Interviewer bias. An interviewer can influence how a respondent answers the survey questions, for example by being too friendly or too aloof, or by prompting the respondent. Interviewers must be trained to be neutral.
• Respondent errors. Respondents can also give incorrect answers. Faulty recollections, tendencies to exaggerate or underplay events, and inclinations to give more socially desirable answers are several reasons why a respondent may give a false answer.
• Problems with the survey process, e.g. using proxy responses (taking answers from someone other than the person of interest).
Non response errors
These errors occur when the survey fails to measure some of the units in the selected sample. Sampled people may refuse or be unable to take part, or may not be at home during the survey period. The response rate is the number of completed, usable responses divided by the number of sampled units.
If this fraction is too low, there is a strong possibility of non-response error; that is, the estimates are biased because those who responded to the survey differ in characteristics or opinions from those who did not. There is no way of knowing for sure what non-respondents are like or what they are thinking.
Example 1.4
Suppose 500 surveys are sent to students asking whether they prefer Coffee on Campus (ConC) or JGs. 150 (30%) respond, of whom 98 choose ConC and 52 choose JGs. So a clear majority of 65% favour ConC.
Now consider the non-respondents. Suppose 55% of the non-respondents actually favour JGs and 45% favour ConC. The true percentage preferring ConC is then
0.3 × 65% + 0.7 × 45% = 19.5% + 31.5% = 51%,
indicating no clear preference.
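The calculation in Example 1.4 can be checked in Python. The sketch works from the counts rather than the rounded 65%, so it gives 51.1% rather than 51%. The 45% non-respondent share is the example's assumption; in a real survey it is unknown. The function name `nonresponse_check` is hypothetical.

```python
def nonresponse_check(n_sent, n_resp, resp_for_a, nonresp_share_a):
    """Compare the respondents-only share favouring option A with the share
    among everyone sampled, given an ASSUMED share for the non-respondents."""
    response_rate = n_resp / n_sent
    resp_share = resp_for_a / n_resp                  # what the respondents alone suggest
    n_nonresp = n_sent - n_resp
    # Respondents choosing A plus the assumed fraction of non-respondents.
    full_share = (resp_for_a + nonresp_share_a * n_nonresp) / n_sent
    return response_rate, resp_share, full_share


rate, naive, full = nonresponse_check(500, 150, 98, 0.45)
print(f"response rate {rate:.0%}, respondents {naive:.1%}, all sampled {full:.1%}")
# prints: response rate 30%, respondents 65.3%, all sampled 51.1%
```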
Processing errors
Errors can occur while data is being recorded, coded, or edited.
Improper analysis
When calculating statistics from the sample, the estimation technique used may be inappropriate for the way the sample was selected. We will spend a lot of time studying suitable estimation techniques.
1.4 DATA COLLECTION METHODS
There are two methods of acquiring data from sample units: communication and observation. Communication requires the respondent to actively provide data by responding, while observation involves recording the respondent's behaviour.
1.4.1 Observation methods
• The observation method cannot measure awareness, beliefs or preferences.
• Observed behaviour patterns must be of short duration and occur frequently.
• Observation is usually necessary when sampling units are not people.
Examples:
• Watching which brands people choose in a shop.
• The Audimeter, developed by the A.C. Nielsen Company, records when TV sets are turned on and to which station they are tuned.
• The pupillometer measures changes in the diameter of the eye pupil. An increase in diameter is assumed to reflect a person's favourable reaction.
1.4.2 Communication Methods
Some examples:
1. Personal Interviews
2. Telephone Interviews
3. Mail Interviews
4. Web Interviews
1.4.3 Selecting a data collection method
Criteria for selecting among these media include:
• versatility
• cost
• time
• sample control
• quantity of data
• response rate
Advantages of observation methods:
• they do not rely on respondents' willingness to provide data;
• the potential for interviewer bias is reduced;
• certain types of data can only be collected by this method.
Disadvantages of observation methods:
• some behaviour patterns cannot be observed;
• cost and time constraints.
1.5 SAMPLING PROCEDURES
There are two types of sampling procedure:
• Probability sampling: each element of the population has a known, non-zero chance of being selected. Sampling is done by mathematical decision rules that leave no discretion to the field interviewer.
• Non-probability sampling: the selection of a population element for the sample is based at least in part on the judgement of the researcher or interviewer. There is no known chance of any particular element in the population being selected, so we are unable to estimate the sampling error, and we have no idea whether the sample estimates are accurate or not.
Sampling procedures
Non-probability procedures:
1. Convenience sampling
2. Judgement sampling
3. Snowball sampling
4. Quota sampling
Probability procedures:
1. Simple random sampling
2. Systematic sampling
3. Stratified sampling
4. Cluster sampling
   a. simple
   b. multi-stage
5. Unequal probability sampling
1.5.1 Advantages of probability sampling
Probability sampling allows the researcher to estimate the amount of sampling error likely to occur. This provides a measure of the accuracy of the sample result. No such measure exists with non-probability sampling.
1.5.2 Advantages of non-probability sampling
• quick
• inexpensive
• sometimes probability sampling is infeasible (e.g. when no sampling frame is available)
1.6 NONPROBABILITY SAMPLING PROCEDURES
Below are a few common non-probability sampling procedures. These will only be briefly reviewed, because the main focus of this course is on probability sampling procedures and on how to use them to estimate sampling error.
1.6.1 Convenience Sampling
Convenience samples are collected on the basis of the convenience of the researcher. Examples include stopping people in a mall, or using students or church groups. The sample units are self-selected or selected because they are easily available, so it is unclear what population the sample is drawn from. The sample is chosen without use of a specific survey method. Convenience sampling may be useful for exploratory research and pilot studies, and it could deliver accurate results if the population is homogeneous.
A particular type of convenience sampling is volunteer sampling, where the sample units are self-selected. Examples: phone-in samples on current affairs programs, volunteers for drug-testing studies.
While all non-probability sampling methods have the potential to introduce sampling bias, volunteer sampling is particularly notorious in this regard. Specific problems:
• The proportion who volunteer may be small.
• There is usually no way of finding out whether, or how, those who volunteered differ from those who did not.
• Volunteers often have stronger opinions about a subject than the rest of the population.
Example 1.5
Literary Digest poll
The Literary Digest conducted a huge poll to predict the result of the 1936 US Presidential
election. This poll had correctly predicted the winner of every election since 1912. The
1936 poll was the largest survey ever undertaken: the Digest had mailed 10 million questionnaires to readers, and received 2.5 million in reply.
The poll confidently predicted that Alfred Landon would win the election, but instead Franklin Roosevelt won by the biggest landslide in history, getting 62% of the vote.
What went wrong?
1.6.2 Judgement Sampling
Samples are selected on the basis of whether some expert thinks those sample units will contribute to answering the research questions at hand, e.g. an instructor's choice of someone to answer a question, expert witnesses, or the selection of stores in which to try a new product.
Judgement sampling is subject to the researcher's biases. Statisticians often use this method in exploratory studies such as pre-testing of questionnaires and focus groups.
1.6.3 Snowball sampling
Another type of convenience sampling is snowball sampling. You begin by identifying someone who meets the criteria for inclusion in your study. You then ask them to recommend others they know who may also meet the criteria. Survey those recommended, then ask them to recommend others. Snowball sampling is especially useful when you are trying to reach populations that are hard to find. For a study of the homeless, for example, you are unlikely to find a good list of homeless people. However, if you find one or two, they may know where others are.
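Snowball sampling amounts to a breadth-first walk through a referral network. A minimal Python sketch, assuming the referrals are available as a dictionary (hypothetical data) and that only eligible respondents are surveyed and asked for further names:

```python
from collections import deque


def snowball_sample(referrals, seeds, eligible, max_size):
    """Snowball sampling sketch.

    referrals: dict mapping each person to the people they recommend (hypothetical).
    seeds: the initial respondents found by the researcher.
    eligible: function returning True if a person meets the inclusion criteria.
    Only eligible people are surveyed and asked for further recommendations;
    recommendations are followed wave by wave (breadth-first).
    """
    sample, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        if not eligible(person):
            continue
        sample.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:          # avoid surveying anyone twice
                seen.add(contact)
                queue.append(contact)
    return sample


referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"]}
print(snowball_sample(referrals, ["A"], lambda person: True, max_size=4))
# ['A', 'B', 'C', 'D']
```

Who ends up in the sample depends on who knows whom, so no selection probability can be attached to anyone; this is why snowball sampling is a non-probability method.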
1.6.4 Quota Sampling
Sampling is done until a specific number of units (quotas) for various sub-populations has been selected. The researcher may take steps to obtain a sample that is similar to the population on some pre-specified control characteristics. This is known as proportional quota sampling. If there are 100 men and 100 women in a population, and a sample of 20 is to be drawn for a cola taste test, you may want to divide the sample evenly between the two sexes: 10 men and 10 women. You continue sampling until you get the numbers you need in each category.
In non-proportional quota sampling you just specify the minimum number of sampled units you want in each category. You simply want enough to ensure that you will be able to talk about even small groups in the population.
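The cola taste-test example can be sketched as a loop that accepts people as they turn up until each quota is met. This is a minimal Python illustration with a hypothetical stream of passers-by; it treats each quota as a target count, as in proportional quota sampling.

```python
def fill_quotas(arrivals, quotas, category):
    """Quota sampling sketch: accept people in the order they turn up until
    every category's quota is met.

    arrivals: iterable of people, in the order the interviewer meets them.
    quotas: dict mapping category -> number of people wanted.
    category: function giving a person's category (e.g. sex).
    Nothing here is random: who gets in depends entirely on arrival order,
    i.e. on where and when the interviewer chooses to look.
    """
    taken = {c: [] for c in quotas}
    for person in arrivals:
        c = category(person)
        if c in taken and len(taken[c]) < quotas[c]:
            taken[c].append(person)
        if all(len(taken[c]) == q for c, q in quotas.items()):
            break                              # every quota filled
    return taken


# Hypothetical stream of passers-by: every third one is a woman.
arrivals = [(f"p{i}", "F" if i % 3 == 0 else "M") for i in range(100)]
sample = fill_quotas(arrivals, {"M": 10, "F": 10}, category=lambda p: p[1])
print({c: len(people) for c, people in sample.items()})
```

With this stream the men's quota is full after the first 15 passers-by, but the women's quota only after 28; if the stream ran out first, the partly filled quotas would be returned.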
Quota sampling is somewhat similar to the probability sampling method called stratified sampling (chapter 5). It differs in how the units are selected: in stratified sampling the units are selected randomly, while in quota sampling it is usually left up to the interviewer to decide who is sampled.
1.6.5 Problems with quota sampling
• The proportion of respondents assigned to each cell must be accurate and up to date. This is often difficult or impossible.
• The proper control characteristics must be selected. For example, to find voter preferences, the sample might be selected according to age, education and income. Are these three variables the most relevant for classifying the typical voter? What about religion or ethnicity?
• Finding the required number of respondents for some cells may not be easy.
• Bias may be introduced by the interviewer's selection of respondents.
1.7 OVERVIEW OF PROBABILITY SAMPLING PROCEDURES
• Simple random sampling (SRS, chapter 2): select a sample of n units such that every possible sample of size n has the same probability of being selected.
• Ratio estimation, regression estimation (chapter 3): measure a concomitant (auxiliary) variable about which much is known, then use the relationship between that variable and the variable of interest to improve estimation.
• Systematic sampling (chapter 4): randomly select the first unit for the sample, then take every k-th element after it along the frame, for a fixed interval k.
• Stratified sampling (chapter 5): divide the population into groups (strata) based on some characteristic associated with each element, then take a sample from each group.
• Unequal probability sampling (chapter 6): select a sample of n units with selection probabilities equal to some pre-specified values.
• Cluster sampling (chapter 7): divide the elements into groups (clusters) such that each group is representative of the population, take an SRS of the groups, and include all elements of the chosen groups in the sample.
• Multi-stage cluster sampling (chapter 8): as for cluster sampling, but within each chosen cluster take an SRS of elements.
[Figure 1.2: Schematic diagram of different probability sampling schemes. Panels: SRS; systematic; stratified (strata S1, S2); cluster (clusters C1 to C4); multi-stage cluster (clusters C1 to C4).]
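The selection step of each probability scheme above can be sketched with Python's standard `random` module. These are minimal illustrations under simplifying assumptions (for systematic sampling the frame size is taken to be a multiple of n; unequal probability sampling is shown in its simplest with-replacement form); the function names are hypothetical, and estimation under each design is the subject of the later chapters.

```python
import random


def srs(frame, n, rng):
    """Simple random sampling without replacement: every n-subset is equally likely."""
    return rng.sample(frame, n)


def systematic(frame, n, rng):
    """Systematic sampling: random start among the first k units, then every k-th unit.
    Simplifying assumption: len(frame) is an exact multiple of n."""
    k = len(frame) // n
    start = rng.randrange(k)
    return frame[start::k]


def stratified(strata, sizes, rng):
    """Stratified sampling: an independent SRS of sizes[h] units within each stratum h."""
    return {h: rng.sample(units, sizes[h]) for h, units in strata.items()}


def unequal_probability(frame, weights, n, rng):
    """Simplest unequal probability scheme: n draws with replacement, unit i
    drawn with probability weights[i] / sum(weights) on each draw."""
    return rng.choices(frame, weights=weights, k=n)


def cluster(clusters, m, rng):
    """One-stage cluster sampling: SRS of m clusters, keeping every element in them."""
    chosen = rng.sample(sorted(clusters), m)
    return [u for c in chosen for u in clusters[c]]


def two_stage(clusters, m, n_within, rng):
    """Two-stage cluster sampling: SRS of m clusters, then an SRS of n_within
    elements inside each chosen cluster."""
    chosen = rng.sample(sorted(clusters), m)
    return [u for c in chosen for u in rng.sample(clusters[c], n_within)]


rng = random.Random(2014)
frame = list(range(1, 101))                       # hypothetical frame of 100 units
clusters = {f"C{c}": frame[10 * c:10 * c + 10] for c in range(10)}
print(srs(frame, 5, rng))
print(systematic(frame, 10, rng))
print(stratified({"S1": frame[:60], "S2": frame[60:]}, {"S1": 6, "S2": 4}, rng))
print(unequal_probability(frame, frame, 5, rng))  # weight of unit i is i itself
print(cluster(clusters, 2, rng))
print(two_stage(clusters, 3, 4, rng))
```

The cluster functions take the clusters as a dictionary keyed by cluster label; that is only a representation choice for the sketch, not something the designs require.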
1.8 EFFECTS OF IGNORING SAMPLING PROCEDURE ON STATISTICAL INFERENCE
It is common to conduct a survey using a procedure other than SRS, then analyse it without taking the procedure into account (that is, by applying methods that are suitable for SRS only). We will investigate the effects of this practice in greater detail later in the course. For now, the effects can be summarised as follows:
• Ignoring cluster sampling tends to underestimate variance.
• Ignoring stratified sampling tends to overestimate variance.
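The first point can be demonstrated exactly by enumerating every possible cluster sample from a small hypothetical population (the values y = 5c + j below are invented so that units in the same cluster are similar). The sketch compares the true variance of the sample mean under cluster sampling with the average value reported by the SRS formula (1 - n/N) s^2 / n when the clustering is ignored; it illustrates only the cluster case.

```python
from itertools import combinations

N_CLUSTERS, M = 20, 10           # hypothetical population: 20 clusters of 10 units
m = 4                            # clusters selected by SRS; every unit in them is kept
N, n = N_CLUSTERS * M, m * M     # 200 units in the population, 40 in each sample

# Unit j of cluster c has value 5c + j, so units within a cluster are alike.
clusters = [[5 * c + j for j in range(M)] for c in range(N_CLUSTERS)]

sample_means, naive_vars = [], []
for chosen in combinations(range(N_CLUSTERS), m):    # all C(20, 4) = 4845 samples
    values = [y for c in chosen for y in clusters[c]]
    ybar = sum(values) / n
    s2 = sum((y - ybar) ** 2 for y in values) / (n - 1)
    sample_means.append(ybar)
    naive_vars.append((1 - n / N) * s2 / n)          # SRS formula, ignoring clusters

# Every cluster sample is equally likely, so these are exact design-based quantities.
grand = sum(sample_means) / len(sample_means)
true_var = sum((x - grand) ** 2 for x in sample_means) / len(sample_means)
avg_naive = sum(naive_vars) / len(naive_vars)
print(f"true variance of the mean {true_var:.1f}, average SRS-formula value {avg_naive:.1f}")
```

With these invented values the true variance of the sample mean is 175.0, while the SRS formula reports about 13.6 on average, an underestimate by a factor of roughly 13.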
