Essential Statistics: Using and Understanding Data
About this ebook
This book covers the basics of statistics and data, as well as more advanced topics, including:
- Descriptive statistics, data displays, central location, and deviations
- Discrete probability distributions
- Continuous probability distributions
- Confidence intervals
- Hypothesis testing
- Correlation and linear regression
- Analysis of variance (ANOVA)
- Nonparametric statistics
Written by an actual teacher, Essential Statistics recognizes the need for down-to-earth math instruction. It addresses this need by giving students accessible, linear, and relevant context for why statistics is what the title suggests: essential.
Sheeny Behmard
Sheeny Behmard has taught mathematics and statistics at Chemeketa Community College since 2000. She lives in Portland, Oregon, with her husband, a son, and a cat named Sina.
Essential Statistics - Sheeny Behmard
Chapter 1
Introduction to Statistics
You are probably thinking, “When and where will I use statistics?”
Well, you already use statistics in your everyday life. You see statistical information if you read any news, watch television, or use the internet. There are statistics about crime, sports, education, politics, real estate, the stock market, health care, banking, and more. When you read or hear a statistic, you are given sample information about a larger topic. This information lets you decide if the statement, claim, or so-called fact is correct. Statistics helps make complicated issues understandable by connecting them to your everyday life.
Since you already interact with statistics every day, you should be able to analyze the information thoughtfully. Beyond that, economics, business, psychology, education, biology, law, computer science, political science, and early childhood development require at least one course in statistics. Understanding the science of statistics helps deepen your knowledge of your chosen profession. It also helps you manage a monthly budget, make health care decisions, or even decide which smartphone to buy.
This chapter includes the basic ideas and key terms of statistics. You will soon understand that statistics and probability work together. You will also learn how data are gathered and how to determine whether data are reliable, accessible, accurate, consistent, and complete.
After reading this chapter, you will be able to do the following:
Describe the difference between parameters and statistics
Identify quantitative and qualitative data
Describe the four levels of measurement
Describe sampling methods and sampling bias
Understand the difference between sampling with replacement and sampling without replacement
Distinguish a sampling error from a nonsampling error
Construct a frequency table
Describe the difference between relative frequency and cumulative relative frequency
Understand the important components of experimental design
Understand the ethical responsibilities of statistics
1.1 Basics of Statistics
Overview
The science of statistics deals with collecting, analyzing, interpreting, and presenting information. Data are what we call the information collected for analysis and interpretation. We see and use data every day. You use statistics and data when you bring an umbrella because the forecast calls for rain. You use statistics and data when you make a monthly budget based on potential expenses. Even when considering which career to enter, you use statistics and data.
Like all sciences, statistics has a specialized vocabulary. In this section, we’ll learn the basic key terms and concepts. We’ll learn about the different types of data so that you can determine the best way to collect and interpret it down the road.
Definitions and Key Terms
Maria needs to buy a new phone. Before she goes to the store, she wants to get a sense of how much a standard smartphone costs so she doesn’t end up paying more than she should. There are a lot of phones available, though. How can she find their average value? She knows she can’t find all the prices of all the phones, then add them together and then divide by the number of phones on the market—that’s impossible, and she has better things to do with her time. The best she can do is estimate the average price.
One way for her to estimate the average price is to select a smaller number of phones. Let’s say she picks 20 phones. Now, she only has to find the prices of 20 phones, add those prices up, and divide the total by 20 to get an estimate of the average smartphone price today.
In statistics, a population is the entire group of objects you want to study. Maria wants to study the whole population of smartphones available. But because that’s too big of a task, the 20 phones she selects from the population are a sample, a specific group of objects taken from the larger population.
When Maria writes down each phone’s price, she records its measurement, which is the number given to each element of a population or sample. The average price of all smartphones on the market is a parameter, which is a number that describes some feature of the whole population. Maria can’t calculate this parameter directly, which is why she needs a sample.
Maria’s list of 20 phone prices is her sample data, which is the factual information collected about her sample. When she calculates the average price of those 20 phones, she calculates a statistic. A statistic is a number that describes the sample data. For Maria, the sample’s average price is a statistic that estimates the parameter and can help her make informed choices when shopping.
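Maria’s situation can be sketched in a few lines of Python. This is only an illustration: the prices are randomly generated rather than taken from a real market, and the names population_prices and sample_prices are hypothetical. The sketch treats a made-up list of 500 prices as the population, draws a sample of 20, and computes both the parameter and the statistic.

```python
import random
from statistics import mean

# Hypothetical population: 500 made-up smartphone prices in dollars.
rng = random.Random(1)  # fixed seed so the run is repeatable
population_prices = [round(rng.uniform(150, 1200), 2) for _ in range(500)]

# Maria's sample: 20 phones chosen at random, without replacement.
sample_prices = rng.sample(population_prices, 20)

parameter = mean(population_prices)  # describes the whole population
statistic = mean(sample_prices)      # describes only the sample

print(f"Population mean (parameter): ${parameter:.2f}")
print(f"Sample mean (statistic):     ${statistic:.2f}")
```

In real life Maria would never see the parameter. It can be printed here only because the population was invented for the example.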
Be Careful!
The term “statistics” can mean two things. It can mean numbers calculated from sample data, like Maria’s average smartphone price. But it can also mean the broader science of statistics, which includes planning studies and experiments, gathering data, analyzing data, and drawing conclusions.
A variable is a characteristic or measurement that can be determined for each member of a population. Variables are usually denoted by capital letters like X and Y, and they may be numerical or categorical. Numerical variables have values with equal units, such as weight in pounds and time in hours. Categorical variables place the person or thing into a category. If X equals the number of smartphones with cameras, then X is a numerical variable. If Y equals smartphone manufacturers, then some values of Y include Apple, Samsung, Google, LG, or Motorola, and Y is a categorical variable. We can do math with values of X (calculate the percentage of smartphones with cameras, for example). However, it makes no sense to do math with values of Y (calculating an average smartphone manufacturer is meaningless).
Example 1
Match these key terms to the elements in the scenario: data, parameter, population, sample, statistic, variable.
We want to know the average amount of money first-year students at Wells Community College spend on school supplies, excluding books. We randomly surveyed 100 first-year students at the college. Three students spent $150, $200, and $225, respectively.
Solution 1
The population is all first-year students attending Wells Community College this term.
The sample is the 100 first-year students at Wells Community College who were randomly surveyed.
The parameter is the average (mean) amount of money spent on school supplies by first-year college students at Wells Community College this term.
The statistic is the average (mean) amount of money spent on school supplies by first-year college students in the sample.
The variable could be the amount of money spent on school supplies by one first-year student. Let X = the amount of money spent on school supplies by one first-year student attending Wells Community College.
The data are the dollar amounts spent by the first-year students. Examples of the data are $150, $200, and $225.
Two final important key terms to understand now are mean and proportion. The mean is the average of the data points in a data set. In statistics, “mean” is often used instead of “average.”
If you took three exams in your math class and earned scores of 86, 75, and 92, you would calculate your mean score by adding the exam scores together and dividing the sum by 3. Your mean score on the exams is 253 ÷ 3, or about 84.3.
A proportion is the fraction of the entire group that has a particular characteristic. Suppose a math class has 40 students. Twenty identify as male, 17 as female, and 3 as nonbinary. The proportion of male students is 20/40, the proportion of female students is 17/40, and the proportion of nonbinary students is 3/40.
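The two calculations above can be checked with a short Python sketch. It uses only the numbers from the exam and class examples. The variable names are hypothetical, and note that Python’s Fraction type reduces fractions automatically, so 20/40 is displayed as 1/2.

```python
from fractions import Fraction
from statistics import mean

# Mean: add the exam scores and divide by how many there are.
exam_scores = [86, 75, 92]
mean_score = mean(exam_scores)   # (86 + 75 + 92) / 3
print(round(mean_score, 1))      # 84.3

# Proportion: the count in each group divided by the size of the whole class.
class_size = 40
counts = {"male": 20, "female": 17, "nonbinary": 3}
proportions = {group: Fraction(n, class_size) for group, n in counts.items()}

for group, p in proportions.items():
    print(group, p, float(p))    # fraction (reduced) and its decimal form
```

The proportions for all the groups in a class always add up to 1, which is a quick way to check the arithmetic.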
Example 2
A study was conducted at a local college to analyze the average cumulative Grade Point Averages (GPAs) of students who graduated last year. Match the key term to the phrase that describes it.
1. Population
2. Statistic
3. Parameter
4. Sample
5. Variable
6. Data
a. all students who attended the college last year
b. the cumulative GPA of one student who graduated from the college last year
c. 3.65, 2.80, 1.50, 3.90
d. a group of students who graduated from the college last year, randomly selected
e. the average cumulative GPA of students who graduated from the college last year
f. the average cumulative GPA of students in the study who graduated from the college last year
Solution 2
1. a
2. f
3. e
4. d
5. b
6. c
Try It 1.1
Match these key terms to the elements in the scenario: data, parameter, population, sample, statistic, variable.
We want to know the average (mean) amount of money spent on school uniforms annually by families with children at Knoll Academy. We randomly survey 100 families with children in the school. Three families spent $65, $75, and $95, respectively.
Population Parameters and Sample Statistics
The difference between a parameter and a statistic is a key concept in statistics, so let’s explore it more. Both parameters and statistics are numerical measurements, but they describe different things. A parameter describes a population, and a statistic describes a sample. A sample statistic is often used to predict or infer a population parameter. In other words, we might use a sample mean to guess an unknown population mean.
To show this, look at Figure 1.1. The icons in the large box represent parts of the smartphone population available to Maria. There is room for a small number of icons in the figure, but hundreds of smartphones are on the market. The blue icons represent the smartphones that Maria selected to be her sample. Each phone in her sample has a price, such as x1, x2, x3, and so on. All the phones’ prices make up the sample data set when collected. From there, we can calculate statistics that describe the sample—sample statistics.
Figure 1.1. The Relationship Between Populations, Samples, Data, Statistics, and Parameters
[Figure description: three columns labeled Population, Sample, and Data. The Population column shows nine rows of smartphones, most gray and 12 blue. The Sample column shows four rows of three blue smartphones. The Data column lists the values x1, x2, ..., xn.]
Population parameters are estimated by converting sample statistics into intelligent guesses about the population. One of the main concerns in statistics is how accurately a statistic estimates a parameter. The accuracy depends on how well the sample represents the population. To be a representative sample, the sample must contain the characteristics of the population.
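To get a feel for how accurately a statistic estimates a parameter, the following Python sketch draws many random samples of different sizes from a made-up population and measures how far the sample mean typically lands from the population mean. Everything here is an assumption for illustration: the population is randomly generated, and typical_error is a hypothetical helper, not a standard statistical function.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population of 1,000 made-up prices (dollars).
population = [rng.uniform(150, 1200) for _ in range(1000)]
mu = mean(population)  # the parameter, known only because we built the population

def typical_error(sample_size, trials=200):
    """Average absolute gap between a sample mean and the population mean."""
    gaps = []
    for _ in range(trials):
        sample = rng.sample(population, sample_size)
        gaps.append(abs(mean(sample) - mu))
    return mean(gaps)

for n in (5, 20, 100):
    print(n, round(typical_error(n), 2))
```

In runs like this, the typical gap tends to shrink as the random sample gets larger, which is one reason a larger random sample usually represents the population better.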
Example 3
Match these key terms to the elements in the scenario: data, parameter, population, sample, statistic, variable.
As part of a study designed to test the safety of automobiles, the National Transportation Safety Board collected and reviewed data about the effects of an automobile crash on test dummies. Here are the criteria they used:
Speed at which cars crashed: 35 miles/hour
Location of driver (i.e., dummies): front seat
Cars with dummies in the front seats crashed into a wall at a speed of 35 miles per hour. We want to know the proportion of dummies in the driver’s seat that would have had head injuries if they had been actual drivers. We start with a simple random sample of 75 cars.
Solution 3
The population is all cars containing dummies in the front seat.
The sample is the 75 cars, selected by a simple random sample.
The parameter is the proportion of driver dummies (if they had been real people) who would have suffered head injuries in the population.
The statistic is the proportion of driver dummies (if they had been real people) who would have suffered head injuries in the sample.
The variable X is the number of driver dummies (if they had been real people) who would have suffered head injuries.
The data are either: yes, had a head injury, or no, did not.
Example 4
Match these key terms to the elements in the scenario: data, parameter, population, sample, statistic, variable.
An insurance company would like to determine the proportion of all medical doctors involved in one or more malpractice lawsuits. The company selects 500 doctors at random from a professional directory and determines the number in the sample who have been involved in a malpractice lawsuit.
Solution 4
The population is all medical doctors listed in the professional directory.
The parameter is the proportion of medical doctors in the population who have been involved in one or more malpractice suits.
The sample is the 500 doctors selected at random from the professional directory.
The statistic is the proportion of medical doctors involved in one or more malpractice suits in the sample.
The variable X is the number of medical doctors involved in one or more malpractice suits.
The data are either: yes, was involved in one or more malpractice lawsuits, or no, was not.
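When the data are yes-or-no answers, as in Examples 3 and 4, the sample statistic is the proportion of “yes” answers. Here is a minimal Python sketch. The ten responses are invented to keep the example short (the company in Example 4 sampled 500 doctors), and the variable names are hypothetical.

```python
# Hypothetical responses for a small sample of 10 doctors (invented data).
responses = ["no", "yes", "no", "no", "yes", "no", "no", "no", "yes", "no"]

x = responses.count("yes")               # X: number involved in a lawsuit
sample_proportion = x / len(responses)   # the statistic

print(x, sample_proportion)  # 3 0.3
```

The sample proportion is then used as an estimate of the population proportion, which is the parameter the insurance company actually cares about.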
Types of Data
Qualitative data, often called categorical data, result from categorizing or describing attributes of a population. Hair color, blood type, ethnic group, the car a person drives, and the street a person lives on are examples of qualitative data. Qualitative data are generally described by words or letters. For instance, Omar needs to buy a new car, but he’s not sure what type he wants, so he starts to research the types of vehicles: compact, subcompact, sedan, SUV, or truck. He’s working with qualitative data, which are names and labels rather than numbers.
Quantitative data are numbers that represent measurements. Quantitative data are the result of counting or measuring attributes of a population. Amount of money, pulse rate, weight, number of people living in a town, and number of students who take statistics are examples of quantitative data. Researchers often prefer to use quantitative data because it’s easier to analyze mathematically. For instance, it doesn’t make sense to find an average car type, but it does make sense to find an average price.
Note
We may collect data as numbers and report it categorically. For example, an instructor records quiz scores for each student throughout the term. At the end of the term, the quiz scores are reported as A, B, C, D, or F.
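The note above can be illustrated with a short Python sketch that turns numeric quiz scores into letter grades. The scores are made up, and the 90/80/70/60 cutoffs are an assumption chosen for the example. An instructor could use any cutoffs.

```python
from collections import Counter

def letter_grade(score):
    """Map a numeric score to a letter using assumed 90/80/70/60 cutoffs."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"

quiz_scores = [95, 82, 67, 74, 58, 88]            # hypothetical quantitative data
grades = [letter_grade(s) for s in quiz_scores]   # reported as categories

print(grades)           # ['A', 'B', 'D', 'C', 'F', 'B']
print(Counter(grades))  # how many students earned each letter
```

Once the scores are reported as letters, we can count how many students fall into each category, but we can no longer compute an average from the letters alone.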
Example 5
What are the values of the data below? Are they quantitative or qualitative data?
The data are the colors of backpacks. We sample 5 students. One student has a red backpack, 2 have black backpacks, 1 has a green backpack, and 1 has a gray backpack.
Solution 5
The values are red, black, black, green, and gray. They are qualitative data.
Quantitative data may be either discrete or continuous. Discrete data are the result of counting. These data take on only specific, separate numerical values that can be listed one by one. For instance, we can count the number of phone calls received each day of the week, and the discrete data might have values such as 1, 0, 1, 2, and 3.
Continuous data can take any value within a range, including whole numbers, fractions, decimals, and irrational numbers. Continuous data are often the results of measurements such as lengths, weights, or times, and in theory, there are endless possible values. For example, a list of the lengths, in minutes, of all the phone calls made in a week would be continuous data, with numbers like 2.4 minutes or 11 minutes. Another set of continuous data would be the weights in pounds of backpacks with books: 2.3, 7, 3.8, 4.9, 2.9. They are continuous data because a backpack’s weight can be any value in a range, not just a whole number.
Example 6
What are the values of the data below? Are the data discrete or continuous?
The data are the number of books students carry in their backpacks. We sample 5 students. Two students carry 3 books, 1 student carries 4 books, 1 student carries 2 books, and 1 student carries 1 book.
Solution 6
The data values are 3, 3, 4, 2, and 1. They are discrete data.
Example 7
What are the values of the data below? Are the data discrete or continuous?
The data are the weights (in pounds) of backpacks with books. We sample the same 5 students. The weights (in pounds) of their backpacks are 6.2, 7, 6.8, 9.1, and 4.3. Notice that backpacks carrying 3 books can have different weights.
Solution 7
The values are 6.2, 7, 6.8, 9.1, 4.3. They are continuous data.
Example 8
We go to the supermarket and purchase 3 cans of soup (19-ounce tomato bisque, 14.1-ounce lentil, and 19-ounce Italian wedding), 2 packages of nuts (walnuts and peanuts), 4 kinds of vegetables (broccoli, cauliflower, spinach, and carrots), and 2 containers of ice cream (16-ounce pistachio ice cream and 32-ounce chocolate chip cookie dough).
Name data sets that are discrete, continuous, and qualitative.
Solution 8
The 3 cans of soup, 2 packages of nuts, 4 kinds of vegetables, and 2 ice creams are discrete data because we count them.
The weights of the soups (19 ounces, 14.1 ounces, 19 ounces) are continuous data because we measure weights as precisely as possible.
Types of soups, nuts, vegetables, and ice creams are qualitative data because they are categorical.
Bonus: Try to identify additional data sets in this example.
Example 9
First, determine whether the data are qualitative or quantitative. Then, indicate whether the quantitative data are continuous or discrete.
1. the number of pairs of shoes you own
2. the type of car you drive
3. the distance from your home to the nearest grocery store
4. the number of classes you take per school year
5. the type of calculator you use
6. weights of sumo wrestlers
7. number of correct answers on a quiz
8. IQ scores
Solution 9
Qualitative: 2 and 5
Quantitative discrete: 1, 4, and 7
Quantitative continuous: 3, 6, and 8
Hint
Data that are discrete often start with the words “the number of.”
Levels of Measurement
Another way to categorize data is by its level of measurement. The level of measurement describes the type of data in a similar way that data can be continuous or discrete. The level of measurement is important because only some statistical operations can be used with some kinds of data sets. To use the correct statistical operation on a data set, we must know its level of measurement.
Data can be classified into four levels of measurement, ordered from least to most specific, as seen in Figure 1.2. Each level adds detail to the one below it: the higher we are in the measurement pyramid, the more we know about the data.
Figure 1.2. The Levels of Measurement
[Figure description: a blue pyramid with four tiers. From top to bottom, they are labeled ratio level, interval level, ordinal level, and nominal level.]
The nominal level of measurement refers to data made up of categories, names, labels, and yes-or-no responses. Nominal-level data can’t be put in any meaningful order. For example, if we classify people according to their favorite food, ranking the categories does not make sense. Putting pizza first and sushi second is not meaningful. Nominal-level data can’t be used in calculations.
The ordinal level of measurement refers to data that can be ordered, but the differences between the data values are impossible to determine or are meaningless. An example of ordinal-level data is a list of the top 5 national parks in the United States. The parks can be ranked from 1 to 5, but we can’t measure the differences between the ranks. Like the nominal-level data, ordinal-level data can’t be used in calculations.
The interval level of measurement describes data that can be ordered and has measurable differences between values, but the data values don’t have a true starting point. An example of interval-level data is body temperatures. If we take the temperatures of two people and get readings of 98.6° F and 99.2° F, we can measure the difference of 0.6° F. But the data doesn’t have a real starting point to measure against. We could say 0° F, but 0° F is just another point on the scale, not the absence of temperature (and a person at 0° F would be an icicle). Any starting point would be arbitrary. For this reason, interval-level data can’t be used to calculate ratios.
The ratio level of measurement describes data that can be ordered and measured and has a zero point from which ratios can be calculated. The ratio level provides the most information about data and can lead to a more detailed data analysis. Test scores are a great example of ratio-level data because they can be ordered from lowest to highest. The differences between the data have meaning, and ratios can be calculated. The score of 92 is more than 68 by 24 points. The smallest possible score is 0, and a score of 80 is 4 times as much as a score of 20.
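The difference between the interval and ratio levels can be seen in a small Python sketch. It uses the body temperatures and test scores from the text, plus two hypothetical air temperatures (80° F and 40° F) that are not from the book. The idea is that a ratio on an interval scale changes when the unit changes, so it carries no real meaning, while differences and ratios of test scores stay meaningful.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9

# Interval level: differences are meaningful.
difference = round(99.2 - 98.6, 1)
print(difference)  # 0.6

# Interval level: ratios are not meaningful. The same two temperatures
# give different ratios in Fahrenheit and in Celsius.
warm_f, cool_f = 80.0, 40.0   # hypothetical air temperatures
ratio_f = warm_f / cool_f
ratio_c = fahrenheit_to_celsius(warm_f) / fahrenheit_to_celsius(cool_f)
print(ratio_f)             # 2.0
print(round(ratio_c, 2))   # about 6, not 2

# Ratio level: test scores have a true zero, so ratios keep their meaning.
print(92 - 68)   # 24
print(80 / 20)   # 4.0
```

Because 80° F is “twice” 40° F in one unit and “six times” in another, saying one temperature is a multiple of another tells us nothing. A score of 80 is four times a score of 20 no matter how the points are labeled.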
Exercises 1.1
For exercises 1–7, describe the following elements for each exercise and give examples:
the population
the sample
the parameter
the statistic
the variable
the data
1. A fitness center is interested in the mean amount of time a client exercises in the center each week.
2. Ski resorts are interested in the mean age at which children take their first ski and snowboard lessons. They need this information to plan their ski classes optimally.
3. A cardiologist is interested in the mean recovery period of her patients who have had heart attacks.
4. Insurance companies are interested in the mean health costs of their clients each year so that they can determine the costs of health insurance.
5. A marriage counselor is interested in the proportion of clients she counsels who stay married.
6. Political pollsters may be interested in the proportion of people who will vote for a particular cause.
7. A marketing company is interested in the proportion of people who will buy a particular product.
For exercises 8–10, use the following information to answer: A Chemeketa Community College instructor is interested in the mean number of days Chemeketa Community College math students are absent from class during a quarter.
8. What is the population she is interested in?
a. all Chemeketa Community College students
b. all Chemeketa Community College English students
c. all Chemeketa Community College students in her classes
d. all Chemeketa Community College math students
9. Consider the following: X = number of days a Chemeketa Community College math student is absent. In this case, what is X an example of?
a. variable
b. population
c. statistic
d. data
10. The instructor’s sample produces a mean number of days absent of 3.5 days. This value is an example of a:
a. parameter
b. data
c. statistic
d. variable
11. For the following items, identify the type of data that would be used to describe a response (quantitative discrete, quantitative continuous, or qualitative), and give an example of the data.
a. number of tickets sold to a concert
b. percent of body fat
c. favorite baseball team
d. time in line to buy groceries
e. number of students enrolled at Evergreen Valley College
f. most-watched television show
g. brand of toothpaste
h. distance to the closest movie theater
i. age of executives in Fortune 500 companies
j. number of competing computer spreadsheet software packages
For exercises 12 and 13, use the following information to answer: A study was done to measure different variables of resident use of a local park in San Jose, California. Researchers wanted to determine the age of park users, the number of times the park is used per week, and the duration (amount of time) of use. The first house in the neighborhood around the park was selected randomly, and then every 8th house in the neighborhood around the park was interviewed.
12. The phrase “number of times per week” is what type of data?
a. qualitative (categorical)
b. quantitative discrete
c. quantitative continuous
13. The phrase “duration (amount of time)” is what type of data?
a. qualitative (categorical)
b. quantitative discrete
c. quantitative continuous
1.2 Data Basics
Overview
As we’ve learned, data may come from a population or a sample. The type of data impacts how it’s gathered. An important part of the science of statistics is accurately sampling a population. When planning an experiment, sampling methods will determine how well the sample data describes the population. In this section, we’ll explore the different sampling methods and when to use them.
Understanding data is about more than just collecting and looking at numbers. A big part of statistics involves displaying data in a way an audience will understand. This section will introduce charts, graphs, and tables that can make data clear and easy to interpret.
Sampling Data
Gathering information about an entire population often costs too much or is virtually impossible. Instead, we use a sample of the population to learn about the whole. Most statisticians use various methods of random sampling to achieve this goal. This section describes some of the most common methods for sampling data.
Random sampling is a technique in which each member of the population has an equal chance of being selected. There are several methods of random sampling, each with pros and cons. In a simple random sample, any group of n individuals is equally likely to be chosen as any other group of n individuals. In other words, each sample of the same size has an equal chance of being selected. Simple random sampling is often used when the population is relatively small and homogeneous, and the variable of interest is unrelated to any subgroups. Since each member of the population has an equal chance of being selected for the sample, simple random sampling can provide a representative sample that can be generalized to the entire population.
There are more specific methods to select a random sample. Table 1.1 shows the methods defined here, their main properties, and situations where each could be used. A stratified sample is a random sample taken from subgroups of the population called strata. Divide the population into strata, then select a proportionate number from each stratum with the simple random sample technique. A cluster sample is formed when the population is divided into clusters (groups), then clusters are randomly selected. All the members from the selected clusters are in the cluster sample. A systematic sample is taken from a listing of the population. A starting point is randomly selected and then every nth piece of data is taken from that list.
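The four random sampling methods described above can be sketched in Python using the standard random module. This is a minimal illustration under stated assumptions, not a complete survey tool: the population of 40 students is invented, the class years serve as both the strata and the clusters, and the function names are hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 40 students, 10 in each class year.
years = ["freshman", "sophomore", "junior", "senior"]
population = [(f"student{i:02d}", years[i % 4]) for i in range(40)]

def simple_random(pop, n):
    """Every group of n members is equally likely to be chosen."""
    return rng.sample(pop, n)

def stratified(pop, per_stratum):
    """Take a simple random sample of per_stratum members from each year."""
    chosen = []
    for year in years:
        stratum = [p for p in pop if p[1] == year]
        chosen.extend(rng.sample(stratum, per_stratum))
    return chosen

def cluster(pop, n_clusters):
    """Randomly pick whole class years and keep every member of each."""
    picked = rng.sample(years, n_clusters)
    return [p for p in pop if p[1] in picked]

def systematic(pop, step):
    """Pick a random start in the first `step` members, then every step-th one."""
    start = rng.randrange(step)
    return pop[start::step]

print(len(simple_random(population, 8)))  # 8
print(len(stratified(population, 2)))     # 8 (2 from each of 4 years)
print(len(cluster(population, 2)))        # 20 (all 10 students in 2 years)
print(len(systematic(population, 5)))     # 8 (every 5th of 40 students)
```

Because each stratum here has the same size, taking the same number from each year is also a proportionate selection. With strata of different sizes, the number taken from each would need to be scaled to match.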
A type of sampling that is non-random is convenience sampling. Convenience sampling involves using readily available results. For example, a grocery store conducts a marketing study by interviewing customers who are shopping there. The marketing survey results could be totally different depending on the day of the week, time of day, weather, or any other uncontrollable factor. Convenience sampling can be accurate in some cases and not in others. It should be used with caution. In practice, it is best for the person conducting the survey or study to choose the sample with a planned method, such as one of the random methods above, rather than relying on whoever happens to be available.
Example 11
A study is done to determine the average tuition that San Jose State undergraduate students pay per semester. Students in the following samples are asked how much tuition they paid for the Fall semester.
Match the scenario to the type of sampling used.
1. A sample of 100 undergraduate San Jose State students is taken by organizing the students’ names by classification (freshman, sophomore, junior, or senior) and then selecting 25 students from each.
2. A random number generator is used to select a student from the alphabetical listing of all undergraduate students in the Fall semester. Starting with that student, every 50th student is chosen until 75 students are included in the sample.
3. A completely random method is used to select 75 students. Each undergraduate student in the Fall semester has the same probability of being chosen at any stage of the sampling process.
4. The freshman, sophomore, junior, and senior years are numbered one, two, three, and four, respectively. A random number generator is used to pick two of those years. All students in those two years are in the sample.
5. An administrative assistant is asked to stand in front of the library on Wednesday and to ask the first 100 undergraduate students he encounters what they paid for tuition in the Fall semester. Those 100 students are the sample.
a. systematic
b. cluster
c. simple random
d. stratified
e. convenience
Solution 11
1. d
2. a
3. c
4. b
5. e
Example 12
Determine which type of sampling is used in each scenario (simple random, stratified, cluster, systematic, or convenience).
a. A soccer coach selects six players from a group of boys aged eight to ten, seven from a group of boys aged 11 to 12, and three from a group of boys aged 13 to 14 to form a recreational soccer team.
b. A pollster interviews all human resource personnel in five different high-tech companies.
c. An educational researcher interviewed 50 female and 50 male high school teachers.
d. A medical researcher interviews every third cancer patient from a list of cancer patients at a local hospital.
e. A high school counselor uses a computer to generate 50 random numbers and then picks students whose names correspond to the numbers.
f. A student interviews classmates in his algebra class to determine how many pairs of jeans a student owns on average.
Solution 12
a. Stratified
b. Cluster
c. Stratified
d. Systematic
e. Simple random
f. Convenience
Try It 1.2
Determine the sampling used (simple random, stratified, systematic, cluster, or convenience).
A high school principal polls 50 first-year students, 50 sophomores, 50 juniors, and 50 seniors regarding policy changes for after-school activities.
Sampling with or without Replacement
Actual random sampling is done with replacement. Replacement means that once a member is picked, that member goes back into the population and may be chosen again. However, for practical reasons, simple random sampling is done without replacement in most populations. That is, a member of the population may be chosen only once. Most samples are taken from large populations and tend to be small compared to the population. Since this is the case, sampling without replacement is approximately the same as sampling with replacement because the chance of picking the same individual more than once is very low.
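A minimal sketch of the two approaches, using Python's standard random module. The population of 10,000 ID numbers and the sample size of 20 are hypothetical values chosen only for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = list(range(1, 10_001))  # hypothetical ID numbers 1 to 10,000

# With replacement: each pick goes back into the population,
# so the same ID may be drawn more than once.
with_replacement = rng.choices(population, k=20)

# Without replacement: a member can be chosen only once,
# so the IDs in this sample are always distinct.
without_replacement = rng.sample(population, 20)

print(sorted(with_replacement))
print(sorted(without_replacement))
```

Because the sample is tiny compared to the population, a repeated ID in the first list is possible but unlikely, which is why the two approaches behave almost identically here.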
Let’s look at a college population of 10,000 students to see the difference between sampling with or without replacement. A random sample of 1,000 students is selected. Table 1.2 shows the impact of sampling with or without replacement on the chances of affecting the survey outcome for any sample of 1,000 students.
Compare the chance of picking a different second person: 999/10,000 = 0.0999 and 999/9,999 = 0.0999. Even if the decimal is carried four places for accuracy, these numbers are equivalent.
Sampling without replacement becomes a mathematical issue only when the population is small. For example, if the population is 25 students and we want a sample of 10 students, the chances of picking a different second student change. Table 1.3 shows the impact of sampling with or without replacement on the chances of affecting the survey outcome for any sample of 10 students.
Compare the chances of picking a different second person: 9/25 = 0.3600 and 9/24 = 0.3750. To four decimal places, these numbers are not equivalent. When the sample is a large proportion of the population, the choice between sampling with and without replacement can influence the survey results.
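The two comparisons can be checked with a short calculation. The sketch below assumes, as in the figures above, that the first member of the sample has already been picked, so one fewer sample member remains for the second pick. The function name and its parameters are hypothetical.

```python
from fractions import Fraction


def second_pick_chances(population_size, sample_size):
    """Chance that the second pick is a different member of a given sample.

    Returns the pair (with replacement, without replacement) as exact fractions.
    """
    remaining = sample_size - 1  # sample members not yet picked
    with_repl = Fraction(remaining, population_size)         # first pick was returned
    without_repl = Fraction(remaining, population_size - 1)  # first pick was removed
    return with_repl, without_repl


for population_size, sample_size in [(10_000, 1_000), (25, 10)]:
    w, wo = second_pick_chances(population_size, sample_size)
    print(f"N={population_size}, n={sample_size}: "
          f"with={float(w):.4f}, without={float(wo):.4f}")
# prints:
# N=10000, n=1000: with=0.0999, without=0.0999
# N=25, n=10: with=0.3600, without=0.3750
```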
Example 13
A large community college has 10,000 part-time students (the population). We are interested in the average amount of money a part-time student spends on books in the fall term. Asking all 10,000 students is an almost impossible task. So, we take two different samples.
First, we use convenience sampling and survey 10 students from a first-term organic chemistry class. Many of these students are taking first-term calculus in addition to the organic chemistry class. The amount of money (in dollars) they spend on books is:
128, 87, 173, 116, 130, 204, 147, 189, 93, 153
The second sample is taken using a list of adults over 65 who take physical education classes. We take every fifth person on the list, for a total of 10 adults over 65. They spend the following amounts:
50, 40, 36, 15, 50, 100, 40, 53, 22, 22
It is