Lectures_1_2_notes
The mean in math and statistics summarizes an entire dataset with a single number
representing the data’s center point or typical value. It is also known as the
arithmetic mean, and it is the most common measure of central tendency. It is
frequently called the “average.”
Learn how to find the mean and know when it is and is not a good statistic to use!
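Finding the mean is very simple. Just add all the values and divide by the number of
observations. The mean formula is below:
Mean Formula
$$\bar{x} = \frac{\sum x}{n}$$
Where: $\bar{x}$ is the sample mean, $\sum x$ is the sum of all observations, and $n$
is the number of observations.
Typically, the population mean formula notation uses Greek and uppercase
letters: $\mu = \frac{\sum X}{N}$, where $N$ is the number of values in the population.
For example, suppose the heights of five people are 48, 51, 52, 54, and 56 inches. Here’s
how to find the mean:
(48 + 51 + 52 + 54 + 56) / 5 = 261 / 5 = 52.2 inches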
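A minimal Python sketch of the mean formula applied to the five heights from the example. The variable names are illustrative, and the standard-library `statistics.mean` is used only as a cross-check.

```python
import statistics

# Heights (inches) of the five people from the example
heights = [48, 51, 52, 54, 56]

# Mean formula: sum of all observations divided by the number of observations
mean_height = sum(heights) / len(heights)

# Cross-check against the standard library implementation
assert abs(mean_height - statistics.mean(heights)) < 1e-9

print(round(mean_height, 1))  # 52.2
```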
Ideally, the mean in math (or the average) indicates the region where most values in
a distribution fall. Statisticians refer to it as the central location of a distribution. You
can think of it as the tendency of data to cluster around a middle value.
The histogram below illustrates the average accurately finding the center of the
data’s distribution.
In skewed distributions, however, the average can be misleading because it might not be near the most
common values. Consequently, it’s best to use the average to measure the central
tendency when you have a symmetric distribution.
For skewed distributions, it’s often better to use the median or trimmed mean,
which use different methods to find the central location. Note that the average
provides no information about the variability present in a distribution. To evaluate
that characteristic, assess the standard deviation.
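To see why the median or a trimmed mean can be preferable for skewed data, here is a small sketch on a hypothetical right-skewed dataset. The `trimmed_mean` helper is an illustrative assumption: it drops a fixed proportion of values from each end of the sorted data before averaging, which is one common way to define a trimmed mean.

```python
import statistics

def trimmed_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the values from EACH end of the sorted data."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values dropped from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical right-skewed data: most values are small, one is very large
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]

print(statistics.mean(data))    # 6.7  (pulled toward the long right tail)
print(statistics.median(data))  # 3.0
print(trimmed_mean(data, 0.1))  # 3.25 (drops 1 and 40 before averaging)
```

The single large value drags the mean well above where most observations sit, while the median and trimmed mean stay near the bulk of the data.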
The standard deviation uses the original data units, simplifying the interpretation.
For this reason, it is the most widely used measure of variability. Suppose a pizza
restaurant measures its delivery time in minutes and has an SD of 5. In that case, the
interpretation is that the typical delivery occurs 5 minutes before or after the mean
time. Statisticians often report the standard deviation with the mean: 20 minutes
(StDev 5). If another pizza restaurant has a standard deviation of 10 minutes, we know its delivery times are more variable.
In this note, you’ll learn why the standard deviation is essential, work through an
interpretation example, and learn how to calculate it by hand.
Understanding the standard deviation is crucial. While the mean identifies a central
value in the distribution, it does not indicate how far the data points fall from the
center. Higher SD values signify that more data points fall farther from the mean.
When variability is high, you can expect to experience extreme values more
frequently, which can cause problems! If the restaurant meal differs noticeably from
the usual, you might not like it at all. When your morning commute takes much
longer than the average travel time, you will be late. And if manufactured parts
deviate too far from their specifications, the system won’t perform correctly.
Frequently, the extremes distress us more than the mean does. The standard
deviation helps you understand variability and provides vital information about
the consistency of outcomes, or lack thereof.
Suppose two pizza restaurants advertise a 20-minute average delivery time. We’re
starving and both look equally good! However, we know the mean does not tell the
entire story!
Let’s assess their standard deviations to choose the restaurant. Imagine we obtain
their delivery time data. One restaurant has an SD of 10 minutes, while the other has
a value of 5. How does this affect deliveries?
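One way to answer this question is a quick calculation, assuming (this is an assumption, not something stated in the notes) that delivery times are roughly normally distributed with a 20-minute mean. The sketch uses the standard-library `statistics.NormalDist`; the helper name `delivery_probabilities` and the 30-minute "late" threshold are hypothetical choices for illustration.

```python
from statistics import NormalDist

def delivery_probabilities(mean, sd, late_after=30, window=5):
    """Hypothetical helper: (P(time > late_after), P(|time - mean| <= window)) under a normal model."""
    dist = NormalDist(mu=mean, sigma=sd)
    p_late = 1 - dist.cdf(late_after)                             # delivery slower than the threshold
    p_near = dist.cdf(mean + window) - dist.cdf(mean - window)    # delivery close to the mean
    return p_late, p_near

for sd in (5, 10):
    p_late, p_near = delivery_probabilities(mean=20, sd=sd)
    print(f"SD {sd}: P(>30 min) = {p_late:.3f}, P(15-25 min) = {p_near:.3f}")
```

Under these assumptions, about 2% of deliveries from the SD 5 restaurant take longer than 30 minutes, versus about 16% for the SD 10 restaurant, even though both have the same mean.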
NOTE: The total area under a probability distribution curve always equals one,
irrespective of its shape (for example, normal or square/uniform).
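A numerical sanity check of this note, as a sketch: the trapezoidal rule approximates the area under a normal curve and under a "square" curve, which is interpreted here (an assumption) as a uniform distribution with constant height. The distribution parameters are hypothetical.

```python
from statistics import NormalDist

def trapezoid_area(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / steps
    total = (f(a) + f(b)) / 2
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Normal curve (mean 20, SD 5): integrating over +/- 10 SD covers essentially all of the area
normal = NormalDist(mu=20, sigma=5)
normal_area = trapezoid_area(normal.pdf, 20 - 50, 20 + 50)

# "Square" (uniform) curve on [10, 30]: constant height 1 / (30 - 10)
low, high = 10, 30
uniform_area = trapezoid_area(lambda x: 1 / (high - low), low, high)

print(round(normal_area, 6), round(uniform_area, 6))  # 1.0 1.0
```

A wider curve is flatter and a narrower one is taller, so the total area stays at one in every case.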
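Calculating the standard deviation involves the following steps. The numbers in
parentheses correspond to the column numbers in the calculation table.
The calculations take each observation (1), subtract the sample mean (2) to calculate
the difference (3), and square that difference (4). Then, sum the squared differences
and divide by n - 1 (the number of observations minus one) to obtain the sample variance.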
Note: we don’t need to remember this name; it is just for information.
Calculate the square root of the variance to derive the SD. (Question: why not just
leave it as the variance? Why take the square root?)
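The steps above can be sketched in Python, one list per table column. Because the lecture’s own table is not reproduced, this reuses the five heights from the mean example as hypothetical data. It also hints at the answer to the question: the variance is in squared units (square inches), while its square root is back in the original units (inches).

```python
import math
import statistics

# Hypothetical sample: the five heights (inches) from the mean example
observations = [48, 51, 52, 54, 56]                      # column (1)
sample_mean = sum(observations) / len(observations)     # column (2)
differences = [x - sample_mean for x in observations]   # column (3)
squared = [d ** 2 for d in differences]                 # column (4)

# Sample variance: sum of squared differences divided by n - 1 (units: inches squared)
variance = sum(squared) / (len(observations) - 1)
# Standard deviation: square root of the variance, back in the original units (inches)
sd = math.sqrt(variance)

# Cross-check against the standard library's sample standard deviation
assert math.isclose(sd, statistics.stdev(observations))

for x, d, sq in zip(observations, differences, squared):
    print(f"{x:>4} {sample_mean:>6.1f} {d:>6.1f} {sq:>7.2f}")
print(f"variance = {variance:.2f}, SD = {sd:.3f}")
```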
Learn how you can use the range of a dataset to estimate the standard deviation
using the range rule of thumb.
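The range rule of thumb estimates the standard deviation as roughly the range divided by four. This sketch compares that rough estimate with the actual sample SD on a hypothetical sample of 30 simulated delivery times (normal, mean 20, SD 5, fixed seed); the helper name `range_rule_sd` is illustrative.

```python
import random
import statistics

def range_rule_sd(values):
    """Range rule of thumb: estimate the SD as (max - min) / 4."""
    return (max(values) - min(values)) / 4

# Hypothetical sample of 30 delivery times drawn from a normal distribution (mean 20, SD 5)
rng = random.Random(42)  # fixed seed so the run is repeatable
times = [rng.gauss(20, 5) for _ in range(30)]

estimate = range_rule_sd(times)
actual = statistics.stdev(times)
print(f"range-rule estimate = {estimate:.2f}, sample SD = {actual:.2f}")
```

The rule is only a rough approximation; it is handy when you know the minimum and maximum but not the full dataset.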
The standard deviation is similar to the mean absolute deviation. Both statistics use
the original data units and they compare the data points to the mean to assess
variability. However, there are differences. To learn more, read about the mean
absolute deviation (MAD).
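A short sketch comparing the two statistics on the five heights from the mean example. The `mean_absolute_deviation` helper is written out by hand here (an illustrative name, not a standard-library function): it averages the absolute distances from the mean, whereas the SD squares those distances before averaging and then takes the square root.

```python
import statistics

def mean_absolute_deviation(values):
    """Average absolute distance between each value and the mean."""
    m = statistics.mean(values)
    return sum(abs(x - m) for x in values) / len(values)

heights = [48, 51, 52, 54, 56]
mad = mean_absolute_deviation(heights)   # about 2.24 inches
pop_sd = statistics.pstdev(heights)      # population SD: sqrt(36.8 / 5)
sample_sd = statistics.stdev(heights)    # sample SD: sqrt(36.8 / 4)
print(f"MAD = {mad:.2f}, population SD = {pop_sd:.2f}, sample SD = {sample_sd:.2f}")
```

Because squaring gives extra weight to large deviations, the population SD is always at least as large as the MAD computed around the mean.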
Measures of central tendency are summary statistics that represent the center point
or typical value of a dataset. Examples of these measures include the mean, median,
and mode. These statistics indicate where most values in a distribution fall and are
also referred to as the central location of a distribution. You can think of central
tendency as the propensity for data points to cluster around a middle value.
In statistics, the mean, median, and mode are the three most common measures of
central tendency. Each one calculates the central point using a different method.
Choosing the best measure of central tendency depends on the type of data you
have. In this post, I explore the mean, median, and mode as measures of central
tendency, show you how to calculate them, and how to determine which one is best
for your data.
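As a quick preview, Python’s standard `statistics` module computes all three measures. The dataset below is hypothetical and slightly right-skewed, so the three measures disagree.

```python
import statistics

# Hypothetical, slightly right-skewed dataset
data = [1, 2, 2, 3, 4, 7, 9]

print(statistics.mean(data))    # 4
print(statistics.median(data))  # 3
print(statistics.mode(data))    # 2
```

The larger values (7 and 9) pull the mean above the median, and the median sits above the most frequent value, which is why the choice of measure depends on the shape of your data.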
Most articles about the mean, median, and mode focus on how you calculate these
measures of central tendency. Instead, I’m going to start by illustrating the central point of
several datasets graphically, so you understand the goal. Then, we’ll move on to
choosing the best measure of central tendency for your data and the calculations.
Median
The median is the middle value. It is the value that splits the dataset in half, making
it a natural measure of central tendency.
To find the median, order your data from smallest to largest, and then find the data
point that has an equal number of values above it and below it. The method for
locating the median varies slightly depending on whether your dataset has an even
or odd number of values. I’ll show you how to find the median for both cases. In
the examples below, I use whole numbers for simplicity, but you can have decimal
places.
In the dataset with the odd number of observations, notice how the number 12 has
six values above it and six below it. Therefore, 12 is the median of this dataset.
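The odd and even cases can be sketched with a small hand-written `median` function (an illustrative helper, checked against `statistics.median`). The datasets are hypothetical because the lecture’s own examples are not reproduced; the odd one is built so that 12 has six values on each side.

```python
import statistics

def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                # odd: exactly one middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: mean of the two middle values

# Hypothetical datasets
odd_data = [21, 3, 15, 8, 12, 18, 5, 10, 20, 6, 14, 17, 9]   # 13 values
even_data = [3, 5, 6, 8, 9, 10, 12, 14, 15, 17, 18, 20]      # 12 values

print(median(odd_data))   # 12   (six values below it, six above it)
print(median(even_data))  # 11.0 (average of 10 and 12)
```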
Outliers and skewed data affect the median less than they affect the mean. To
understand why, imagine we have the Median
dataset below and find that the median is 46. However, we discover data entry errors
and need to change four values, which are shaded in the Median Fixed dataset. We’ll
make them all significantly higher so that we now have a skewed distribution with
large outliers.
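The same experiment can be sketched in Python with a hypothetical dataset whose median is 46 (the lecture’s actual Median and Median Fixed tables are not reproduced). Here the four largest values are replaced with much higher ones to mimic the corrected entries.

```python
import statistics

# Hypothetical dataset with a median of 46 (not the lecture's actual table)
original = [23, 30, 35, 41, 44, 46, 48, 52, 55, 60, 63]

# "Fix" four data entry errors by making the four largest values much higher
fixed = original[:7] + [152, 255, 360, 463]

for label, data in [("original", original), ("fixed", fixed)]:
    print(f"{label}: median = {statistics.median(data)}, mean = {statistics.mean(data):.1f}")
```

The median stays at 46 because the middle value does not move, while the mean roughly triples as the large outliers pull it upward.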
These data are based on the U.S. household income for 2006. Income is the classic
example of when to use the median instead of the mean because its distribution
tends to be skewed. The median indicates that half of all incomes fall below 27581,
and half are above it. For these data, the mean overestimates where most household
incomes fall.
NOTE: The median is a robust statistic, while the mean is sensitive to outliers and
skewed distributions.