Statistics Notes Exam 1
Types of variables.
Bell-shaped (symmetric)
The highest frequency occurs in the middle and frequencies tail off to the
left and right of the middle.
Skewed right
The tail to the right of the peak is longer than the tail to the left of the
peak.
Skewed left
The tail to the left of the peak is longer than the tail to the right of the
peak.
Sample mean = x̅
Mean
Computed by adding all of the values of the variable in the data set and
dividing by the number of observations.
Median
The value that lies in the middle of the data when arranged in ascending
order.
Resistant
A numerical summary of data is said to be resistant if extreme values
relative to the data do not affect its value substantially.
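As a minimal sketch of resistance, the snippet below replaces one value in a small made-up data set with an extreme value and recomputes the mean and the median. The data values are hypothetical and chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical data set, chosen only for illustration.
values = [2, 3, 4, 5, 6]

# Same data, but the largest value is replaced by an extreme value.
with_outlier = [2, 3, 4, 5, 600]

mean_before, median_before = mean(values), median(values)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The mean is pulled far toward the extreme value; the median does not move.
print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

In this example the median stays at 4 while the mean changes substantially, which is what it means for the median to be resistant and the mean not to be.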
Mode
The most frequent observation of the variable that occurs in the data set.
Range
Denoted R. The difference between the largest and smallest data value.
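The mode and the range defined above can be sketched in a few lines. The data set is hypothetical; `multimode` from Python's standard library is used here because it returns every value tied for most frequent, which is one reasonable way to handle data with more than one mode.

```python
from statistics import multimode

# Hypothetical data set, chosen only for illustration.
data = [4, 7, 7, 2, 9, 7, 2]

# Mode: the most frequent observation(s) in the data set.
modes = multimode(data)

# Range R: largest data value minus smallest data value.
data_range = max(data) - min(data)

print("mode(s):", modes)
print("range R:", data_range)
```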
Bias
Whenever a statistic consistently underestimates or overestimates a
parameter, the statistic is said to be biased.
Empirical Rule
z-score
Represents the distance that a data value is from the mean in terms of the
number of standard deviations
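A short sketch of the z-score calculation, assuming the usual formula z = (x - mean) / standard deviation. The function name `z_score` and the numbers passed to it are hypothetical.

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / std_dev

# Hypothetical example: a value of 85 when the mean is 70 and the
# standard deviation is 10.
z = z_score(85, mean=70, std_dev=10)
print(z)  # positive: the value lies above the mean
```

A negative z-score would indicate a value below the mean, and a z-score of 0 a value equal to the mean.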
Quartiles
Interquartile range
The range of the middle 50% of the observations in a data set. That is, the
IQR is the difference between the third and first quartiles and is found
using the formula: IQR = Q3 - Q1.
Five-number summary
Consists of the smallest observation, the first quartile (Q1), the median,
the third quartile (Q3), and the largest observation, written in order from
smallest to largest
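The five-number summary and the IQR can be sketched as below. The notes do not say how the quartiles are located, so this sketch assumes one common textbook convention: Q1 is the median of the lower half of the sorted data and Q3 is the median of the upper half, with the overall median left out of both halves when the number of observations is odd. Other conventions give slightly different quartiles. The function name and data are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumes at least two observations.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # observations below the median position
    upper = ordered[half + n % 2:]   # observations above the median position
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Hypothetical data set, chosen only for illustration.
data = [7, 1, 3, 9, 5, 11, 13, 15]
summary = five_number_summary(data)
smallest, q1, med, q3, largest = summary
iqr = q3 - q1  # IQR = Q3 - Q1

print("five-number summary:", summary)
print("IQR:", iqr)
```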
Boxplot
Cumulative frequency
The cumulative frequency for the second class is the sum of the
frequencies of classes 1 and 2; the cumulative frequency for the third
class is the sum of the frequencies of classes 1, 2, and 3; and so on.
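The running sums described above can be sketched with `itertools.accumulate`. The class frequencies are hypothetical.

```python
from itertools import accumulate

# Hypothetical frequencies for classes 1 through 5.
frequencies = [3, 8, 12, 5, 2]

# Cumulative frequency of class k = sum of frequencies of classes 1..k.
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency = cumulative frequency / total observations.
total = sum(frequencies)
cumulative_relative = [c / total for c in cumulative]

print(cumulative)
print(cumulative_relative)
```

The last cumulative frequency always equals the total number of observations, so the last cumulative relative frequency is 1.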
An ogive (read as “oh jive”) is a graph that represents the cumulative
frequency or cumulative relative frequency for the class. It is constructed
by plotting points whose x-coordinates are the upper class limits and
whose y-coordinates are the cumulative frequencies or cumulative
relative frequencies of the class. Then line segments are drawn
connecting consecutive points. An additional line segment is drawn
connecting the first point to the horizontal axis at a location representing
the upper limit of the class that would precede the first class (if it
existed).
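As a rough sketch of the construction above, the function below builds the list of points an ogive connects, without drawing anything. It assumes equal-width classes so that the upper limit of the class that would precede the first class can be found by subtracting the class width. The function name, class limits, and frequencies are hypothetical.

```python
from itertools import accumulate

def ogive_points(upper_limits, frequencies, class_width):
    """Return (x, y) points for an ogive of cumulative frequencies.

    Assumes equal-width classes; class_width is the distance between
    consecutive upper class limits.
    """
    cumulative = list(accumulate(frequencies))
    # Extra starting point: upper limit of the class that would precede
    # the first class, with a cumulative frequency of 0.
    start = (upper_limits[0] - class_width, 0)
    return [start] + list(zip(upper_limits, cumulative))

# Hypothetical classes 10-19, 20-29, 30-39, 40-49.
upper_limits = [19, 29, 39, 49]
frequencies = [4, 6, 7, 3]
points = ogive_points(upper_limits, frequencies, class_width=10)
print(points)
```

Joining consecutive points with line segments in any plotting tool then gives the ogive.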
Key vocabulary
Vocabulary: The list of observed values for a variable is data.
Example: Gender is a variable; the observations male and female are data.
Vocabulary: Qualitative data are observations corresponding to a
qualitative variable.
Example: Gender is a qualitative variable; male and female are qualitative
data.
Vocabulary: Quantitative data are observations corresponding to a
quantitative variable.
Example: Income is a quantitative variable; $32,012 or $57,839 are
quantitative data.
Vocabulary: Discrete data are observations corresponding to a discrete
variable.
Example: Attendance at a play is a discrete variable; 8431 attendees or
2984 attendees are discrete data.
Vocabulary: Continuous data are observations corresponding to a
continuous variable.
Example: Time between calls to a call center is a continuous variable;
32 seconds or 21 seconds (between calls) are continuous data.
• Wording error: The way questions are asked can lead to bias in a
survey, so questions must be asked in a balanced form. Avoid being vague.
• Ordering of questions or words
• Data entry error
Examples:
Nonsampling error is the error that results from the process of obtaining
the data. Undercoverage, nonresponse bias, response bias, and data-entry
errors are all types of nonsampling error. Sampling error is the error
that results because a sample is being used to estimate information about
a population. This type of error occurs because a sample gives
incomplete information about a population.
N.B.: Since closed questions limit the possible responses, they are easier
to analyze. Open questions are harder to analyze due to the variety of
answers and the chance of misinterpreting an answer.
Qualitative: bar graphs, Pareto charts, horizontal bar graphs, side-by-side
bar graphs, pie charts
The sum of the deviations about the mean always equals zero
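The reason is that the sum of (x - mean) over n observations equals (sum of x) - n * mean, and n * mean is the sum of x by the definition of the mean. The sketch below checks this numerically on a hypothetical data set, using `Fraction` so the arithmetic is exact and not subject to floating-point rounding.

```python
from fractions import Fraction

# Hypothetical data set; Fractions keep the arithmetic exact.
data = [Fraction(v) for v in (3, 8, 10, 16)]

mean = sum(data) / len(data)            # 37/4
deviations = [x - mean for x in data]   # each observation minus the mean
total = sum(deviations)

print(total)
```

With ordinary floats the total can come out as a very small nonzero number because of rounding, which is why exact fractions are used here.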
True, because the standard deviation describes how far, on average, each
observation is from the typical value. A larger standard deviation means
that observations are more distant from the typical value, and therefore,
more dispersed.
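To illustrate, the sketch below compares the sample standard deviation of two hypothetical data sets that share the same mean but differ in how far the observations lie from it. It uses `statistics.stdev`, which divides by n - 1; whether the sample or population version is intended here is an assumption.

```python
from statistics import mean, stdev

# Two hypothetical data sets with the same mean (50).
tight = [48, 49, 50, 51, 52]    # observations close to the mean
spread = [30, 40, 50, 60, 70]   # observations far from the mean

s_tight = stdev(tight)
s_spread = stdev(spread)

print("means:", mean(tight), mean(spread))
print("standard deviations:", s_tight, s_spread)
```

The data set whose observations sit farther from the mean has the larger standard deviation, so it is the more dispersed of the two.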