The document discusses the phases of conducting a survey including establishing goals, determining available resources, defining the target population and sampling frame, choosing a sampling method, designing a questionnaire, piloting the survey, conducting the survey, and analyzing results. It provides an example of a market research study conducted to evaluate consumer acceptance of rigid plastic containers.
MATH3831 Statistical Methods for Social and Market Research
SEMESTER 1, 2014
CRICOS Provider No: 00098G
© 2014, School of Mathematics and Statistics, UNSW

Chapter 1
Introduction to Survey Sampling

1.1 SAMPLING: WHAT & WHY?

Sampling is the statistical practice that involves making inferences about an entire population (human or otherwise) on the basis of only some units in that population. Sampling theory is concerned with:

- The manner in which samples of units may be selected from a population.
- The manner in which inferences may be drawn from observations made on selected units.
- The precision of such statistical inferences.

In this course we will consider mainly sampling methods applied to finite populations. These are the only methods used in the practical sample survey work conducted by your potential employers, e.g. the Australian Bureau of Statistics (ABS).

We may want to find out how many units have a particular characteristic, or measure the total value of some variable over the population (e.g. total earnings), or some other statistic. A census (measuring the variable on all units) would seem the obvious way to go. Why use sampling instead of a census?

1. Reduced cost: A single interview may cost $45-$85. Clearly we could save money by interviewing 1000 people rather than 1 million.

2. Greater speed: In the above example, at roughly one hour per interview, we would have 1000 hours of interviewing instead of 1 million. Add time for printing questionnaires, training interviewers and coding answers. Often information is needed urgently! A census is only performed once in 5 years in Australia (once in 10 years in the UK).

3. Greater scope: A census is not practical for inquiries requiring highly trained personnel or specialised equipment. Also, selection may result in destruction or contamination.
   Example: destruction - testing for battery life.
   Example: contamination - interviewing people may sensitise them to the topic of the interview. We may want to interview people later on the same topic to see if attitudes have changed. This is possible in sampling but not in a census.

4. Greater accuracy: In a census we need more interviewers and data coders. In a small study we can obtain more highly skilled people for each stage of the survey process. As a result, we may sometimes end up with more accurate estimates from a survey than from a census.

1.2 PHASES OF THE SURVEY PROCESS

1. Establish the goals of the project. Decide what you want to learn. The researcher must understand why the information is needed. One should have a specific list of information needs and the degree of precision required. Typical goals in market research are:
   - the potential market for a new product or service
   - ratings of current products or services
   - attitudes/satisfaction levels/opinions
   - corporate images

2. Determine the resources available in terms of time, money, personnel and facilities, as well as data resources such as previous studies and company records.

3. Determine your sample. Who will you interview? How will you select your sample? To do this you will need to determine the following:

   Observational unit (element): In human populations, this is usually the individual on whom the variable of interest is measured. However, it can also be a household, school or transaction.

   Target population: The complete collection of observations we want to study; the ideal population for meeting the survey objectives. If you conduct an employee attitude survey, the target population is obvious. If you are trying to determine the likely success of a new product, the target population may be less obvious.

   Sampling frame: A list, register or map from which sampled elements can be selected. It provides a means of identifying and locating population elements.

   Sampled population: The population you actually sample from; the collection of all possible observations that might have been chosen in a sample. Some part of the population you really wanted often cannot be surveyed or reached. This may introduce bias. We shall try to quantify this bias later in the course. As long as the sampled population is a very high proportion of the target population, the results obtained should also hold for the larger population.

   Sampling unit (SU): The unit actually sampled.

   Example 1.1 Consider a survey of average household income.
   Target population: all households in Australia.
   Sampling frame: all Australian households listed in the white pages (www.whitepages.com.au).
   Sampled population: residents of Australian households who are home when phoned and who agree to participate.

   [Figure 1.1: Schematic diagram of the relationship between the target population, the sampling frame population and the sampled population. Regions shown: target population; sampling frame population; sampled population; not included in sampling frame; not reachable; refuse to respond; not capable of responding; not eligible for survey.]

   Example 1.2 Identify (a) the elements, (b) the target population and (c) a possible sampling frame, if we want to find:
   (a) The average student fees for undergraduate students at UNSW
   (b) The number of 10 year olds in NSW who have read Harry Potter
   (c) The average number of cats per household in Sydney

4. Methods of sampling (details to follow).

5. Method of data collection. How will you interview? By mail, personal interview, telephone or recording.

6. Questionnaire design. What will you ask?

7. Pilot study. If practical, pre-test the questions.

8. Conduct the survey and enter the data. Ask the questions.

9. Analyse the data. Do the statistical analysis, interpret the results and produce the reports.
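The relationship between the target population, the sampling frame and the sampled population described in step 3 can be sketched with Python sets. This is only an illustration: the household IDs, the rule for who is never home and the rule for who refuses are all invented assumptions, not part of Example 1.1.

```python
# Toy illustration of target population vs sampling frame vs sampled population.
# All household IDs and the "not home" / "refuses" rules are hypothetical.

target_population = set(range(1, 21))        # the 20 households we want to study
sampling_frame = set(range(1, 16)) | {101}   # listed households; 101 is listed but outside the target

# Assumption: every 5th listed household is never home when phoned.
reachable = {h for h in sampling_frame if h % 5 != 0}
# Assumption: every 4th reachable household refuses to take part.
willing = {h for h in reachable if h % 4 != 0}

not_in_frame = target_population - sampling_frame   # undercoverage
not_eligible = sampling_frame - target_population   # overcoverage
sampled_population = willing & target_population    # who can actually end up in a sample

coverage = len(sampled_population) / len(target_population)
print(f"Not in frame: {sorted(not_in_frame)}")
print(f"Not eligible: {sorted(not_eligible)}")
print(f"Sampled population covers {coverage:.0%} of the target population")
```

If the households left out differ systematically from those included (for instance in income), estimates based on the sampled population will be biased for the target population.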
Example 1.3 Rigid Plastic Containers Consumer Acceptance Study

The Society of the Plastics Industry (SPI) believed rigid plastic containers offered important advantages over other container materials (paper, glass, metal), including light weight, resistance to breakage, cheapness and potential for re-use. A market research study was conducted to identify and evaluate market opportunities for rigid plastic containers.

Rationale: It was the opinion of SPI that demonstrated consumer acceptance or preference would be a critical factor, in the absence of an unfavourable cost differential or excessive distribution problems, in convincing industries to switch to rigid plastic.

Research Objectives: To determine which container markets have greater consumer acceptance of plastic containers.

Information Needs:
- Determine consumer preference for alternative packaging materials in container markets.
- Identify characteristics of containers that influence consumer preference.
- Determine likes and dislikes of consumers regarding current containers.
- Determine what suggestions consumers have for packaging improvement.
- Determine consumer attitudes towards ecological aspects of packaging.

Data Sources. To meet the information needs we must question consumers. The first phase includes a series of focus group interviews to explore consumer attitudes and motives concerning the pros and cons of packaging and ecological issues. Based on these findings, specific questions can be developed for the second phase: a survey of consumers using a questionnaire administered by personal interview.

Questionnaire Design and Pre-testing. The questionnaire was pre-tested on a convenience sample of about 75 consumers to make sure that it flowed properly and that the questions were understandable to ordinary individuals, and to analyse items for redundancy. The redundancy analysis was carried out by factor analysis.

Example of questions
1. Of the packages you currently purchase, which do you feel could be improved? Why?
2. What products do you currently purchase which come in a plastic container?
3. What are the advantages of a plastic container?
4. What are the disadvantages of a plastic container?
5. Please evaluate plastic/paper/metal in regard to the degree to which it possesses lightness/strength and recyclability (rating scale).
6. How important is lightness/strength/recyclability for a container? (rating scale).
7. Interviewer checks male or female.
8. What is your marital status?
9. How many children do you have at home?

Data Collection Procedure. It was determined that the interviews could be successfully conducted over the phone.

Sample Design. Telephone numbers were selected using the method of random digit dialling. Under this procedure, three-digit exchange codes supplied by the phone company are combined with four-digit random numbers to give each telephone in the region an equal probability of selection. A number of call-backs were made if there was no answer or the line was busy. In all, more than 500 interviews were completed over several weeks.

Editing, Coding and Data Processing. Completed interviews were edited to make sure they were legible, complete, consistent and accurate. In some cases where data was missing, estimates were made based on other information in the questionnaire.

1.3 ERRORS

Sampling Errors. Sampling errors are associated with the process of selecting a sample. Because only a sample is used to estimate a population quantity, the sample value generally differs from the true underlying population value. This difference is called sampling error.

Non Sampling Errors. Non sampling errors are all those errors that occur in the research process other than the sampling error. This includes all aspects of the process where mistakes or deliberate deceptions can occur.
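Sampling error as defined above can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population of 10,000 household incomes is invented (normally distributed with an arbitrary mean and spread), and `sampling_error` and `typical_error` are hypothetical helper names.

```python
import random
import statistics

rng = random.Random(2014)  # fixed seed so the run is repeatable

# Hypothetical finite population: 10,000 household incomes (invented numbers).
population = [rng.gauss(60_000, 15_000) for _ in range(10_000)]
true_mean = statistics.fmean(population)


def sampling_error(n):
    """Sample mean minus population mean for one simple random sample of size n."""
    sample = rng.sample(population, n)  # sampling without replacement
    return statistics.fmean(sample) - true_mean


def typical_error(n, repeats=200):
    """Root-mean-square sampling error over repeated samples of size n."""
    return (sum(sampling_error(n) ** 2 for _ in range(repeats)) / repeats) ** 0.5


small = typical_error(25)
large = typical_error(2_500)
census = sampling_error(len(population))  # a census has no sampling error

print(f"typical error, n=25:   {small:.0f}")
print(f"typical error, n=2500: {large:.0f}")
print(f"census error:          {census:.0f}")
```

The simulation contains no non sampling errors: every selected unit is measured exactly. In a real survey those errors are present whatever the sample size, and a census is not free of them either.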
1.3.1 Types of Non Sampling Errors

Frame problems

These are problems with the ability of the frame, and thus the sample, to cover the population, which is why frame problems are also referred to as coverage errors.

- Undercoverage: Some members of the population are not linked to any entry on the frame. Mainly increases bias.
- Overcoverage: Some entries on the frame are linked to non-members of the population. Tends to reduce the sample size and hence increase variance.
- Multiplicity: A member of the population is linked to more than one entry on the frame, giving it multiple chances of being chosen.

Response Errors

Reasons for response errors include:

- Poor questionnaire design. It is essential that survey questions are worded carefully in order to avoid introducing bias.
- Interviewer bias. An interviewer can influence how a respondent answers the survey questions. This may occur if the interviewer is too friendly or too aloof, or prompts the respondent. Interviewers must be trained to be neutral.
- Respondent errors. Respondents can also provide incorrect answers. Faulty recollections, tendencies to exaggerate or underplay events, and inclinations to give answers that are more socially desirable are several reasons why a respondent may provide a false answer.
- Problems with the survey process, for example using proxy responses (taking answers from someone other than the person of interest).

Non response errors

These errors occur when the survey fails to measure some of the units in the selected sample. People may refuse or be unable to be part of the sample, or may not be at home during the sampling period.

The response rate is the number of completed, usable responses divided by the number of sampled units. If this fraction is too low, there is a strong possibility of non-response error; that is, the estimates are biased because those who did respond to the survey have different characteristics or opinions from those who did not respond. There is no way of knowing for sure what non-respondents are like or what they are thinking.

Example 1.4 Suppose 500 surveys are sent to students asking them whether they prefer Coffee on Campus (ConC) or JGs. 150 (30%) respond, of which 98 choose ConC and 52 choose JGs. So a clear majority of 65% favour ConC. Consider now the non-respondents. Suppose 55% of non-respondents actually favour JGs and 45% ConC. Since 30% of students responded and 70% did not, the true percentage preferring ConC is 0.3 × 65% + 0.7 × 45% = 19.5% + 31.5% = 51%, indicating no clear preference.

Processing errors

Errors can occur while data is being recorded, coded or edited.

Improper analysis

When calculating statistics from the sample, the estimation technique may be inappropriate. We will spend a lot of time studying suitable estimation techniques.

1.4 DATA COLLECTION METHODS

There are two methods of acquiring data from sample units: communication and observation. Communication requires the respondent to actively provide data through a response, while observation involves recording the respondent's behaviour.

1.4.1 Observation methods

- The observation method cannot measure awareness, beliefs or preferences.
- Observed behaviour patterns must be of short duration and occur frequently.
- Usually necessary when sampling units are not people.

Examples:
- Watch what brands people choose in a shop.
- The Audiometer, developed by the A.C. Nielsen Company, records when TV sets are turned on and to what station they are tuned.
- The Pupillometer measures change in the diameter of the pupil of the eye. An increase in diameter is assumed to reflect a person's favourable reaction.

1.4.2 Communication Methods

Some examples:
1. Personal Interviews
2. Telephone Interviews
3. Mail Interviews
4. Web Interviews

1.4.3 Selecting a sampling method

Criteria for selecting among these media include:
- versatility
- cost
- time
- sample control
- quantity of data
- response rate

Advantages of observation methods:
- They do not rely on the respondent's willingness to provide data.
- The potential for bias from the interviewer is reduced.
- Certain types of data can only be collected by this method.

Disadvantages of observation methods:
- Some behaviour patterns cannot be observed.
- Cost and time constraints.

1.5 SAMPLING PROCEDURES

There are two types of procedures:

Probability sampling: Each element of the population has a known chance of being selected. Sampling is done by mathematical decision rules that leave no discretion to the field interviewer.

Non-probability sampling: The selection of a population element to be part of the sample is based in some part on the judgement of the researcher or interviewer. There is no known chance of any particular element in the population being selected, so we are unable to calculate the sampling error. We have no idea whether the sample estimates are accurate or not.

Sample Procedures

Non-Probability Procedures      Probability Procedures
1. Convenience sampling         1. Simple random sampling
2. Judgement sampling           2. Systematic sampling
3. Snowball sampling            3. Stratified sampling
4. Quota sampling               4. Cluster sampling
                                   a. simple
                                   b. multi-stage
                                5. Unequal probability sampling

1.5.1 Advantages of probability sampling

Probability sampling allows the researcher to measure the amount of sampling error likely to occur. This provides a measure of the accuracy of the sample result. No such measure exists with non-probability sampling.

1.5.2 Advantages of non-probability sampling

- quick
- inexpensive
- sometimes it is infeasible to conduct probability sampling (e.g. for lack of a sampling frame)

1.6 NONPROBABILITY SAMPLING PROCEDURES

Below are a few common non-probability sampling procedures. These will only be briefly reviewed, because the main focus in this course will be on probability sampling procedures, and how to use these methods to estimate sampling error.

1.6.1 Convenience Sampling

Convenience samples are collected on the basis of the convenience of the researcher. Examples include stopping people in a mall, or using students or church groups. The sample unit is self-selected or selected because it is easily available; it is unclear what population the sample is drawn from. The sample is chosen without use of a specific survey method. Convenience sampling may be useful for exploratory research and pilot studies. It could deliver accurate results if the population is homogeneous.

A particular type of convenience sampling is volunteer sampling, where the sample unit is self-selected. Examples: phone-in samples on current affairs programs, volunteers for drug-testing studies. While all non-probability sampling methods have the potential to introduce sampling bias, volunteer sampling is particularly notorious in this regard. Specific problems:

- The proportion who volunteer may be small.
- There is usually no way of finding out how or if those who volunteered are different from those who did not.
- Volunteers often have stronger opinions about a subject than the rest of the population.

Example 1.5 Literary Digest poll

The Literary Digest conducted a huge poll to predict the result of the 1936 US Presidential election. This poll had correctly predicted the winner of every election since 1912. The 1936 poll was the largest survey ever undertaken: the Digest had mailed 10 million questionnaires to readers, and received 2.5 million in reply. The poll confidently predicted Alfred Landon would win the election, but instead, Franklin Roosevelt won by the biggest landslide in history, getting 62% of the vote. What went wrong?
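The point that a very large self-selected sample can mislead more than a small random one can be illustrated with a toy simulation. The sketch below is not a reconstruction of the 1936 poll: the electorate size, the 62% true share and the response probabilities in `RESPONSE_PROB` are invented assumptions, chosen only to show one way volunteer bias can arise.

```python
import random

rng = random.Random(1936)  # fixed seed so the run is repeatable

# Hypothetical electorate: True = favours candidate A, False = favours candidate B.
N = 200_000
electorate = [rng.random() < 0.62 for _ in range(N)]
true_share = sum(electorate) / N

# Invented response behaviour: supporters of B are three times as likely to mail a reply.
RESPONSE_PROB = {True: 0.15, False: 0.45}
volunteers = [vote for vote in electorate if rng.random() < RESPONSE_PROB[vote]]
volunteer_share = sum(volunteers) / len(volunteers)

# A much smaller simple random sample in which every selected person answers.
srs = rng.sample(electorate, 1_000)
srs_share = sum(srs) / len(srs)

print(f"True share for A:        {true_share:.3f}")
print(f"Volunteer poll (n={len(volunteers)}): {volunteer_share:.3f}")
print(f"SRS poll (n={len(srs)}):        {srs_share:.3f}")
```

Under these assumptions the volunteer poll, despite being dozens of times larger, points to the wrong winner, while the small random sample lands close to the true share. Sample size does not repair a biased selection mechanism.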
1.6.2 Judgement Sampling

Samples are selected on the basis of whether some expert thinks those sample units will contribute to answering the research questions at hand, e.g. an instructor's choice of someone to answer a question, expert witnesses, selection of stores in which to try a new product. Judgement sampling is subject to the researcher's biases. Statisticians often use this method in exploratory studies like pre-testing of questionnaires and focus groups.

1.6.3 Snowball sampling

Another type of convenience sampling is snowball sampling. You begin by identifying someone who meets the criteria for inclusion in your study. You then ask them to recommend others they know who may also meet the criteria. Survey those recommended, then ask them to recommend others.

Snowball sampling is especially useful when you are trying to reach populations that are hard to find. For a study of the homeless, for example, you are unlikely to find a good list of homeless people. However, if you find one or two, they may know where others are.

1.6.4 Quota Sampling

Sampling is done until a specific number of units (quotas) for various sub-populations have been selected. The researcher may take steps to obtain a sample that is similar to the population on some pre-specified control characteristics. This is known as proportional quota sampling. If there are 100 men and 100 women in a population, and a sample of 20 is to be drawn in a cola taste test, you may want to divide the sample evenly between the two sexes: 10 men and 10 women. You will continue sampling until you get the numbers you need in each category.

In non-proportional quota sampling you just specify the minimum number of sampled units you want in each category. You simply want enough to ensure that you will be able to talk about even small groups in the population.

Quota sampling is somewhat similar to the probability sampling method called stratified sampling (chapter 5). It differs in how the units are selected. In probability sampling the units are selected randomly, while in quota sampling it is usually left up to the interviewer to decide who is sampled.

1.6.5 Problems with quota sampling

- The proportion of respondents assigned to each cell must be accurate and up-to-date. This is often difficult or impossible.
- The proper control characteristics must be selected. For example, to find voter preferences, the sample is selected according to age, education and income. Are these three variables the most relevant for classifying the typical voter? What about religion or ethnicity?
- Finding the required number of respondents for some cells may not be easy.
- Bias is introduced by the interviewer's selection.

1.7 OVERVIEW OF PROBABILITY SAMPLING PROCEDURES

Simple random sampling (SRS, chapter 2): Select a sample of n units such that each sample of size n has the same probability of being selected.

Ratio Estimation, Regression Estimation (chapter 3): Measure a concomitant variable about which much is known, then use the relationship between that variable and the variable of interest to improve estimation.

Systematic Sampling (chapter 4): Randomly select the first unit for the sample, then take all other elements separated by a constant amount along the frame.

Stratified sampling (chapter 5): Divide the population into groups based on some characteristic associated with each element, then take a sample from each group.

Unequal Probability Sampling (chapter 6): Select a sample of n units with probabilities equal to some pre-specified values.

Cluster sampling (chapter 7): Divide elements into groups such that each group is representative of the population and take an SRS of the groups, including all elements in the chosen groups in the sample.

Multi-stage cluster sampling (chapter 8): As for cluster sampling, but within each chosen cluster take an SRS of elements.
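The selection rules listed above can be sketched in a few lines of Python. This is a minimal illustration, not the estimation methods of the later chapters: the frame of 100 unit labels, the two strata S1 and S2, the four clusters C1 to C4 and all function names are hypothetical.

```python
import random


def simple_random_sample(frame, n, rng):
    """SRS without replacement: every subset of size n is equally likely."""
    return rng.sample(frame, n)


def systematic_sample(frame, k, rng):
    """Random start among the first k units, then every k-th unit along the frame."""
    start = rng.randrange(k)
    return frame[start::k]


def stratified_sample(strata, n_per_stratum, rng):
    """Independent SRS of n_per_stratum units within each stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}


def cluster_sample(clusters, m, rng):
    """SRS of m clusters, keeping every element of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), m)
    return [unit for name in chosen for unit in clusters[name]]


def two_stage_sample(clusters, m, n_within, rng):
    """SRS of m clusters, then an SRS of n_within elements inside each chosen cluster."""
    chosen = rng.sample(sorted(clusters), m)
    return [unit for name in chosen for unit in rng.sample(clusters[name], n_within)]


rng = random.Random(1)
frame = list(range(100))                                   # hypothetical unit labels 0..99
strata = {"S1": frame[:60], "S2": frame[60:]}              # hypothetical strata
clusters = {f"C{i + 1}": frame[25 * i:25 * (i + 1)] for i in range(4)}

srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)
strat = stratified_sample(strata, 5, rng)
clus = cluster_sample(clusters, 2, rng)
two_stage = two_stage_sample(clusters, 2, 5, rng)
```

Every function takes the random number generator as an argument, so the interviewer has no discretion over who is selected: this is what distinguishes these procedures from the quota and judgement methods of Section 1.6.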
[Figure 1.2: Schematic diagram of different probability sampling schemes. Panels: SRS; Systematic; Stratified (strata S1, S2); Cluster (clusters C1 to C4); Multi-stage cluster (clusters C1 to C4).]

1.8 EFFECTS OF IGNORING SAMPLING PROCEDURE ON STATISTICAL INFERENCE

It is common to conduct a survey using a procedure other than SRS, then analyse it without taking the procedure into account (that is, by applying methods that are suitable for SRS only). We will investigate the effects of this practice in greater detail in this course. For now, we can summarise the effects as follows:

- Ignoring cluster sampling tends to underestimate variance.
- Ignoring stratified sampling tends to overestimate variance.
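The first of these effects can be seen in a small simulation. The sketch below covers only the cluster sampling case and rests on invented assumptions: a population of 200 clusters of 20 units in which units in the same cluster share a common effect, and an analyst who applies the SRS formula s²/n to a sample of whole clusters. The function name `one_survey` is hypothetical.

```python
import random
import statistics

rng = random.Random(8)  # fixed seed so the run is repeatable

# Hypothetical population: 200 clusters of 20 units; units within a cluster are alike.
clusters = []
for _ in range(200):
    effect = rng.gauss(0, 10)  # shared cluster effect
    clusters.append([50 + effect + rng.gauss(0, 5) for _ in range(20)])


def one_survey(m=10):
    """Take an SRS of m clusters, keep every unit, then analyse as if it were an SRS of units."""
    sample = [y for cluster in rng.sample(clusters, m) for y in cluster]
    mean = statistics.fmean(sample)
    # SRS variance formula for the mean (finite population correction omitted).
    naive_var = statistics.variance(sample) / len(sample)
    return mean, naive_var


results = [one_survey() for _ in range(500)]
empirical_var = statistics.variance([mean for mean, _ in results])
average_naive_var = statistics.fmean([naive for _, naive in results])

print(f"Actual variance of the sample mean over 500 surveys: {empirical_var:.2f}")
print(f"Average variance reported by the SRS formula:        {average_naive_var:.2f}")
```

Because units in the same cluster carry overlapping information, 200 clustered observations are worth far fewer than 200 independent ones, and the SRS formula reports a variance well below the variability actually observed across repeated surveys.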