Descriptive Statistics
OBJECTIVES:
After successful completion of this module, you should be able to:
✦ Distinguish the three main forms of data presentation.
✦ Know the different parts of the table.
✦ Choose appropriate diagrams/graphs to present a given set of data.
✦ Organize qualitative and quantitative data in tables.
✦ Compute measures of central tendency, measures of variation and measures of relative position of grouped and ungrouped data.
✦ Describe the shape of a distribution.
✦ Identify regions under the normal curve corresponding to different standard normal values.
✦ Compute probabilities using the standard normal table and Excel.

Data Presentation
Data are usually collected in a raw format and thus the inherent information is difficult to understand. Therefore, raw data need to be summarized, processed, and analyzed to usefully derive information from them. However, no matter how well manipulated, the information derived from the raw data should be presented in an effective format, otherwise, it would be a great loss for both authors and readers. Planning how the data will be presented is essential before appropriately processing raw data.
Polytechnic University of the Philippines
College of Science
Department of Mathematics and Statistics
Tabular Presentation:
• It is a systematic and logical arrangement of data in the form of Rows and Columns with respect to the characteristics of data.
• A table is best suited for representing individual information and represents both quantitative and qualitative information.

Advantage of Tabular Presentation
✦ More information may be presented.
✦ Exact values can be read from a table to retain precision.
✦ Flexibility is maintained without distortion of data.
✦ Less work and less cost are required in the preparation.
Preparing Tables
The making of a compact table is itself an art. A table should contain all the information needed within the smallest possible space. The purpose of the tabulation and how the tabulated information is to be used are the main points to keep in mind while preparing a statistical table. An ideal table should consist of the following main parts:
A. Title: The title must tell as simply as possible what is in the table. It should answer the questions:
✦ Who? White females with breast cancer, black males with lung cancer.
✦ What are the data? Counts, percentage distributions, rates.
✦ Where are the data from? Example: One hospital, or the entire population covered by your registry.
✦ When? A particular year, time period.
B. Boxhead: The boxhead contains the captions or column headings. The heading of each column should contain as few words as possible, yet explain exactly what the data in the columns represent.
C. Stubs: The row captions are known as the stub. Items in the stub should be grouped to facilitate interpretation of the data. For example, rows may stand for score classes and columns for data related to the sex of students. In that case, there will be many rows for score classes but only two columns, for male and female students.
Distribution Table?
A frequency distribution lists each category of data and the number of occurrences for each category of data.
Solution:
To answer this question we need to construct a frequency
distribution to determine how many female and male
respondents participated in the study.
Procedure in Constructing
Frequency Table
✦ If the data is in the form of qualitative data
To construct the frequency distribution using Excel, use the command:
=FREQUENCY(data_array,bins_array)
Then press Ctrl + Shift + Enter to enter it as an array formula:
{=FREQUENCY(data_array,bins_array)}
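As a rough cross-check outside Excel, the same kind of tally can be sketched in Python. The respondent list here is hypothetical (the module's data file is not reproduced), and `collections.Counter` is only one possible way to count categories; it is not the Excel procedure described in this module.

```python
from collections import Counter

# Hypothetical responses for a qualitative variable (sex of respondent).
responses = ["Female", "Male", "Female", "Female", "Male", "Female"]

# Counter tallies how many times each category occurs.
frequency = Counter(responses)
total = sum(frequency.values())

# Print a small frequency table with percentages.
for category, count in sorted(frequency.items()):
    print(f"{category:<8}{count:>4}{100 * count / total:>8.1f}%")
print(f"{'Total':<8}{total:>4}")
```

Each row gives a category, its number of occurrences, and its share of the total, which is the layout a frequency distribution table normally follows.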
graphic. Let the data speak for themselves.

Ungrouped data without a frequency distribution
1, 5, 4, 7, 2, 4, 1, 3, 8, 2, 2, 9

x        f
0        7
1        15
2        12
3        4
4        5
5        2
Total    45
Measures of Central Tendency: MEAN
• It is the sum of the data values divided by the number of data values.
• It is also called the average.
• It is appropriate only for data under interval and ratio scale measurement.

Advantage of Mean
✦ Simple to understand and easy to calculate.
✦ It is rigidly defined.
✦ It is least affected by fluctuations of sampling.
✦ It takes into account all the values in the series.

Sample Mean
✦ For Ungrouped Data
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
where:
xi = data values
n = no. of sample observations
✦ For Grouped Data
$\bar{x} = \frac{\sum_{i=1}^{r} f x_i}{n}$
where:
xi = data values
f = frequency
n = no. of sample observations

Population Mean
✦ For Ungrouped Data
$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
where:
xi = data values
N = no. of observations
✦ For Grouped Data
$\mu = \frac{\sum_{i=1}^{r} f x_i}{N}$
where:
xi = data values
f = frequency
N = no. of observations
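The two mean formulas can be illustrated with a short Python sketch. It uses the twelve ungrouped values and the two-column table shown earlier in this module. Reading that table as "value, frequency" pairs is an assumption, and the variable names are illustrative only.

```python
# Ungrouped data: the twelve values listed earlier in this module.
data = [1, 5, 4, 7, 2, 4, 1, 3, 8, 2, 2, 9]
mean_ungrouped = sum(data) / len(data)   # sum of x_i divided by n

# Data with frequencies (assumed reading of the table: value x occurs f times).
freq_table = {0: 7, 1: 15, 2: 12, 3: 4, 4: 5, 5: 2}
n = sum(freq_table.values())             # total number of observations
mean_weighted = sum(f * x for x, f in freq_table.items()) / n  # sum of f*x over n

print("ungrouped mean:", mean_ungrouped)
print("n:", n, "mean from frequencies:", round(mean_weighted, 2))
```

The second computation multiplies each value by its frequency before summing, which is the same as listing every observation and averaging them one by one.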
Measures of Central Tendency: MEDIAN
• It is the “middle observation” when the data set is sorted (in either increasing or decreasing order).
• The median divides the distribution into two equal parts.

Advantage of Median
✦ The median is not affected by the size of extreme values but by the number of observations.
✦ The median can be calculated even when the frequency distribution contains “open-ended” intervals.
✦ It can also be used to define the middle of a number of objects, properties, or quantities which are not really quantitative in nature.
✦ It can be easily interpreted.

Formula for Median:
✦ For Ungrouped Data
1. Arrange the data from lowest to highest (or highest to lowest).
2. For an odd number of data, the median of a data set is the “middle observation”. When the number of data is even, the median is the “average of the two middle scores”.
✦ For Grouped Data
$\tilde{x} = LB + \left(\frac{\frac{n}{2} - {<}cf}{f}\right) i$
where:
LB = lower boundary of the median class
i = class width
n = no. of observations
< cf = less than the cumulative frequency of the class preceding the median class
f = frequency of the median class
In both data sets, the median is 116, as it is the number that divides the data set into two exact halves. However, you will notice that the mean is not identical in both data sets. For the first data set, the mean is equal to 116, whereas the mean of the second data set is equal to 132.5.

Notice how the mean of the second data set has been influenced by the presence of an unusual case/outlier in the data set. If we were to say the mean is equal to 132.5 for the second data set and it represents a typical case, this would not make much sense because the majority of data values are less than 120. Therefore, the mean should not be used when unusual, or outlying, data values are present in the data set, as the mean tends to be extremely sensitive to the unusual values. Rather, the median should be reported in this case.

• The mode is simply the most frequently occurring data value in the data set. Therefore, it is mainly useful for the nominal level of measurement. Both median and mean are useful when the variable being measured can be quantified. Also, neither data set has a mode, which is why the mode is not an appropriate measure to use for these data sets.
• It is better to use the median than the mean when the sample is small or asymmetrical (i.e., skewed) and unusual cases/outliers are present in the data set. This is why the average housing price is always reported with the median, since even one million-dollar house can distort the average housing price when most of the houses are in the Php500,000–Php650,000 range.
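The effect of an outlier on the mean can be checked with a small Python sketch. The two data sets are hypothetical: they were chosen only so that their medians and means match the figures quoted in the text (116, 116, and 132.5), and they are not the module's original data sets.

```python
from statistics import mean, median

# Hypothetical data sets chosen to match the summary values quoted in the text.
data_set_1 = [110, 114, 118, 122]
data_set_2 = [110, 114, 118, 188]   # same as above, but the last value is an outlier

for name, values in [("first", data_set_1), ("second", data_set_2)]:
    print(name, "data set: median =", median(values), "mean =", mean(values))
```

Replacing a single value with an outlier leaves the median unchanged but pulls the mean well above most of the observations.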
Class Interval    Frequency
55 - 59           55
50 - 54           23
45 - 49           37
40 - 44           37
35 - 39           48
30 - 34           42
25 - 29           27
Total             n =

x is the midpoint of every class interval.
To compute this:
$x = \frac{LC + UP}{2}$
Ex:
$x = \frac{55 + 59}{2} = 57$
$x = \frac{50 + 54}{2} = 52$

Class Interval    Frequency
55 - 59           3
50 - 54           6
45 - 49           7
40 - 44           9
35 - 39           6
30 - 34           4
25 - 29           5
$\sum_{i=1}^{7} f x_i =$
Solution:
Class Interval    Frequency (f)    x     fx
55 - 59           3                57    171
50 - 54           6                52    312
45 - 49           7                47    329
40 - 44           9                42    378
35 - 39           6                37    222
30 - 34           4                32    128
25 - 29           5                27    135
Total             n = 40                 $\sum_{i=1}^{7} f x_i = 1{,}675$

$\bar{x} = \frac{\sum_{i=1}^{7} f x_i}{n} = \frac{1{,}675}{40} = 41.88$

Solution:
To compute median and mode of grouped data, first you need to fill out this table.

Class Interval    f    LB    < cf
55 - 59           3
50 - 54           6
45 - 49           7
40 - 44           9
35 - 39           6
30 - 34           4
25 - 29           5
Total             n =

To compute the lower boundary, always subtract 0.5 from the lower class limit (LC).
Ex:
55 − 0.5 = 54.5
50 − 0.5 = 49.5

The cumulative frequencies (< cf) are accumulated from the lowest class upward:
5, 5 + 4 = 9, 9 + 6 = 15, 15 + 9 = 24, 24 + 7 = 31, 31 + 6 = 37, 37 + 3 = 40

Since n/2 = 20, the median class is 40 - 44, the first class whose cumulative frequency reaches 20.
$\tilde{x} = LB + \left(\frac{\frac{n}{2} - {<}cf}{f}\right) i$
$\tilde{x} = 39.5 + \frac{(20 - 15)5}{9} = 42.28$
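The grouped mean and grouped median computed above can be reproduced with a short Python sketch. The tuple layout and variable names are illustrative assumptions. The class limits and frequencies are the ones from the worked solution.

```python
# (lower limit, upper limit, frequency), listed from the lowest class upward.
classes = [(25, 29, 5), (30, 34, 4), (35, 39, 6), (40, 44, 9),
           (45, 49, 7), (50, 54, 6), (55, 59, 3)]

n = sum(f for _, _, f in classes)

# Grouped mean: sum of f times the class midpoint, divided by n.
grouped_mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

# Grouped median: find the first class whose cumulative frequency reaches n/2.
cum_before = 0  # "< cf": cumulative frequency of the classes below the current one
for lo, hi, f in classes:
    if cum_before + f >= n / 2:
        lower_boundary = lo - 0.5        # LB = lower class limit minus 0.5
        width = hi - lo + 1              # i = class width
        grouped_median = lower_boundary + (n / 2 - cum_before) / f * width
        break
    cum_before += f

print(round(grouped_mean, 2), round(grouped_median, 2))
```

The loop mirrors the manual steps: accumulate frequencies from the lowest class, stop at the class containing the (n/2)th item, then apply the interpolation formula.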
Solution:
Class Interval    f    LB      < cf
55 - 59           3    54.5    40
50 - 54           6    49.5    37
45 - 49           7    44.5    31
40 - 44           9    39.5    24
35 - 39           6    34.5    15
30 - 34           4    29.5    9
25 - 29           5    24.5    5

The modal class is the class interval with the highest frequency. The modal class is 40 - 44.
If two class intervals share the highest frequency, always choose the highest class interval.

$\hat{x} = LB + \left(\frac{d_1}{d_1 + d_2}\right) i$
d1 = 9 − 6 = 3
d2 = 9 − 7 = 2
$\hat{x} = 39.5 + \left(\frac{3}{3 + 2}\right) 5 = 42.5$

Measures of Relative Position
Quantiles are statistics that describe various subdivisions of a frequency distribution into equal proportions.
Three special Quantiles:
1. Quartiles
2. Deciles
3. Percentiles
Quartiles - split the ordered data into four quarters.
Deciles - split the ordered data into ten equal parts.
Percentiles - split the ordered data into 100 equal parts.

Formula for Quartile:
✦ For Ungrouped Data
1. Arrange the data from lowest to highest. Then use this formula.
$Q_{class} = \frac{nk}{4} + 0.5$
2. If the resulting positioning point is an integer, the particular numerical observation corresponding to that point is chosen for the quartile. If not, use interpolation.
✦ For Grouped Data
$Q_k = LB + \left(\frac{\frac{nk}{4} - {<}cf}{f}\right) i$
where:
LB = lower boundary of the quartile class
i = class width
n = no. of observations
k = quartile position
< cf = less than the cumulative frequency of the class preceding the quartile class
f = frequency of the quartile class
Formula for Decile:
✦ For Ungrouped Data
1. Arrange the data from lowest to highest. Then use this formula.
$D_{class} = \frac{nk}{10} + 0.5$
2. If the resulting positioning point is an integer, the particular numerical observation corresponding to that point is chosen for the decile. If not, use interpolation.
✦ For Grouped Data
$D_k = LB + \left(\frac{\frac{nk}{10} - {<}cf}{f}\right) i$
where:
LB = lower boundary of the decile class
i = class width
n = no. of observations
k = decile position
< cf = less than the cumulative frequency of the class preceding the decile class
f = frequency of the decile class

Formula for Percentile:
✦ For Ungrouped Data
1. Arrange the data from lowest to highest. Then use this formula.
$P_{class} = \frac{nk}{100} + 0.5$
2. If the resulting positioning point is an integer, the particular numerical observation corresponding to that point is chosen for the percentile. If not, use interpolation.
✦ For Grouped Data
$P_k = LB + \left(\frac{\frac{nk}{100} - {<}cf}{f}\right) i$
where:
LB = lower boundary of the percentile class
i = class width
n = no. of observations
k = percentile position
< cf = less than the cumulative frequency of the class preceding the percentile class
f = frequency of the percentile class
Example 1:
The data given below are the total number of hours lost due to tardiness and absences of employees in a company in a given year.

Month        Hours Lost (x)
January      55
February     23
March        37
April        37
May          48
June         42
July         27
August       20
September    30
October      32
November     24
December     40

Find Q3, D4 and P55.

Solution:
To compute Q3 of ungrouped data:
1. Arrange the data from lowest to highest.
Value:       20   23   24   27   30   32   37   37   40   42   48   55
Position:    1    2    3    4    5    6    7    8    9    10   11   12
$Q_{class} = \frac{(12)(3)}{4} + 0.5 = 9.5$
2. Use interpolation since the computed Qclass is not an integer. The value lies halfway between the 9th observation (40) and the 10th observation (42).
Q3 = 40 + 0.5(42 − 40) = 41
To compute D4 of ungrouped data:
1. Arrange the data from lowest to highest.
Value:       20   23   24   27   30   32   37   37   40   42   48   55
Position:    1    2    3    4    5    6    7    8    9    10   11   12
$D_{class} = \frac{(12)(4)}{10} + 0.5 = 5.3$
2. Use interpolation since the computed Dclass is not an integer. The value lies between the 5th observation (30) and the 6th observation (32).
D4 = 30 + 0.3(32 − 30) = 30.6

To compute P55 of ungrouped data:
1. Arrange the data from lowest to highest (same ordering as above).
$P_{class} = \frac{(12)(55)}{100} + 0.5 = 7.1$
2. Use interpolation since the computed Pclass is not an integer. The value lies between the 7th observation (37) and the 8th observation (37).
P55 = 37 + 0.1(37 − 37) = 37
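The positioning rule used in this example (position = nk/parts + 0.5, then interpolate between neighbouring observations) can be sketched in Python as follows. The function name `quantile_by_position` is hypothetical, and the sketch does no bounds checking, so it is meant only for interior quantiles like the ones in this example. Other textbooks and software may use different positioning rules and give slightly different answers.

```python
import math

def quantile_by_position(values, k, parts):
    """Value at 1-based position n*k/parts + 0.5, interpolating when needed."""
    ordered = sorted(values)
    position = len(ordered) * k / parts + 0.5
    lower_index = math.floor(position)     # 1-based position of the lower neighbour
    fraction = position - lower_index      # how far to move toward the next value
    lower = ordered[lower_index - 1]
    if fraction == 0:
        return lower                       # position is an integer: take that value
    upper = ordered[lower_index]
    return lower + fraction * (upper - lower)

# Hours lost per month, January to December (Example 1).
hours = [55, 23, 37, 37, 48, 42, 27, 20, 30, 32, 24, 40]

q3 = quantile_by_position(hours, 3, 4)       # third quartile
d4 = quantile_by_position(hours, 4, 10)      # fourth decile
p55 = quantile_by_position(hours, 55, 100)   # 55th percentile
print(round(q3, 2), round(d4, 2), round(p55, 2))
```

One function covers quartiles, deciles, and percentiles because the three formulas differ only in the divisor (4, 10, or 100).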
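For the grouped data with n = 40 used earlier, the cumulative frequencies (< cf) from the lowest class upward are:
5, 5 + 4 = 9, 9 + 6 = 15, 15 + 9 = 24, 24 + 7 = 31, 31 + 6 = 37, 37 + 3 = 40

$Q_k = LB + \left(\frac{\frac{nk}{4} - {<}cf}{f}\right) i$
$Q_1 = 34.5 + \frac{(10 - 9)5}{6} = 35.33$

$D_k = LB + \left(\frac{\frac{nk}{10} - {<}cf}{f}\right) i$
$D_7 = 44.5 + \frac{(28 - 24)5}{7} = 47.36$

$P_k = LB + \left(\frac{\frac{nk}{100} - {<}cf}{f}\right) i$
$P_{10} = 24.5 + \frac{(4 - 0)5}{5} = 28.5$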
Example 2:
The ages of the townspeople in a certain community are as follows:

Class Interval    Frequency
18 - 24           28
25 - 31           54
32 - 38           38
39 - 45           20
46 - 52           17
53 - 59           3
Total             n =

Find Q2, D5, and P50.

Solution:
To compute Q2, D5, and P50 of grouped data, first you need to fill out this table.

Class Interval    f     LB    < cf
18 - 24           28
25 - 31           54
32 - 38           38
39 - 45           20
46 - 52           17
53 - 59           3

To compute the lower boundary, always subtract 0.5 from the lower class limit (LC).
Ex:
18 − 0.5 = 17.5
25 − 0.5 = 24.5
32 − 0.5 = 31.5

The cumulative frequencies (< cf) are accumulated from the lowest class upward:
28, 28 + 54 = 82, 82 + 38 = 120, 120 + 20 = 140, 140 + 17 = 157, 157 + 3 = 160

$Q_k = LB + \left(\frac{\frac{nk}{4} - {<}cf}{f}\right) i$
$Q_2 = 24.5 + \frac{(80 - 28)7}{54} = 31.24$
Solution:
Class Interval    f     LB      < cf
18 - 24           28    17.5    28
25 - 31           54    24.5    82
32 - 38           38    31.5    120
39 - 45           20    38.5    140
46 - 52           17    45.5    157
53 - 59           3     52.5    160
Total             n = 160

For the decile: first, compute nk/10. It will help us to determine the decile class and the < cf.
$\frac{nk}{10} = \frac{(160)(5)}{10} = 80$
The decile class is the class containing the 80th item. Hence, the decile class is 25 - 31.
$D_k = LB + \left(\frac{\frac{nk}{10} - {<}cf}{f}\right) i$
$D_5 = 24.5 + \frac{(80 - 28)7}{54} = 31.24$

For the percentile: first, compute nk/100. It will help us to determine the percentile class and the < cf.
$\frac{nk}{100} = \frac{(160)(50)}{100} = 80$
The percentile class is the class containing the 80th item. Hence, the percentile class is 25 - 31.
$P_k = LB + \left(\frac{\frac{nk}{100} - {<}cf}{f}\right) i$
$P_{50} = 24.5 + \frac{(80 - 28)7}{54} = 31.24$
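The grouped-data formulas for Qk, Dk, and Pk share one pattern, which the following Python sketch tries to capture. The function name `grouped_quantile` and the tuple layout are illustrative assumptions. The class limits and frequencies are those of Example 2.

```python
def grouped_quantile(classes, k, parts):
    """classes: (lower limit, upper limit, frequency), from the lowest class upward."""
    n = sum(f for _, _, f in classes)
    target = n * k / parts            # nk/4, nk/10 or nk/100
    cum_before = 0                    # "< cf" of the classes below the current one
    for lo, hi, f in classes:
        if cum_before + f >= target:  # this class contains the target item
            lower_boundary = lo - 0.5
            width = hi - lo + 1
            return lower_boundary + (target - cum_before) / f * width
        cum_before += f
    raise ValueError("k must not exceed parts")

# Ages of the townspeople (Example 2).
ages = [(18, 24, 28), (25, 31, 54), (32, 38, 38),
        (39, 45, 20), (46, 52, 17), (53, 59, 3)]

q2 = grouped_quantile(ages, 2, 4)
d5 = grouped_quantile(ages, 5, 10)
p50 = grouped_quantile(ages, 50, 100)
print(round(q2, 2), round(d5, 2), round(p50, 2))
```

Q2, D5, and P50 all target the 80th item, so the three calls return the same value, which is also the grouped median.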
R = Xmax − Xmin

Range is simple to calculate. However, we should be cautious about using range as a measure of variability.
Range is a very crude measure of variability as it only uses the highest and lowest values in computation. Therefore, it does not accurately capture information about how data values in the set differ if the data set contains unusual cases/outliers.

• The larger the standard deviation, the more variation there is in the data set.
• The standard deviation can never be a negative number, due to the way it’s calculated and the fact that it measures a distance (distances are never negative numbers).
• The smallest possible value for the standard deviation is 0, and that happens only in contrived situations where every single number in the data set is exactly the same (no deviation).
Formula for Standard Deviation:
Sample Standard Deviation
✦ For Ungrouped Data
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
where:
xi = data values
x̄ = mean
n = no. of sample observations
✦ For Grouped Data
$s = \sqrt{\frac{\sum_{i=1}^{r} f (x_i - \bar{x})^2}{n - 1}}$
where:
xi = data values
x̄ = mean
f = frequency
n = no. of sample observations

Population Standard Deviation
✦ For Ungrouped Data
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
where:
xi = data values
μ = mean
N = no. of observations
✦ For Grouped Data
$\sigma = \sqrt{\frac{\sum_{i=1}^{r} f (x_i - \mu)^2}{N}}$
where:
xi = data values
μ = mean
f = frequency
N = no. of observations

Measures of Dispersion/Variability: VARIANCE
It takes all data points in a set into account and is calculated by averaging the squared deviation of each data point from the mean.
Variance is not easy to read because it is expressed in squared units and hence is not easily interpretable. The standard deviation, however, is in the same units as the mean, so it lets us easily understand the spread of the data.
Solution:
Class Interval    f    x     fx     (xi − x̄)²    f(xi − x̄)²
55 - 59           3    57    171    228.61       685.83
50 - 54           6    52    312    102.41       614.46
45 - 49           7    47    329    26.21        183.47
40 - 44           9    42    378    0.01         0.09
35 - 39           6    37    222    23.81        142.86
30 - 34           4    32    128    97.61        390.44
25 - 29           5    27    135    221.41       1107.05
Total             n = 40     $\sum_{i=1}^{7} f x_i = 1{,}675$     $\sum_{i=1}^{7} f (x_i - \bar{x})^2 = 3{,}124.20$

$f(x_1 - \bar{x})^2 = 3(228.61) = 685.83$
$f(x_2 - \bar{x})^2 = 6(102.41) = 614.46$
$f(x_3 - \bar{x})^2 = 7(26.21) = 183.47$

Standard deviation:
$s = \sqrt{\frac{\sum_{i=1}^{7} f (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{3{,}124.20}{40 - 1}} = 8.95$

Variance:
$s^2 = \frac{\sum_{i=1}^{7} f (x_i - \bar{x})^2}{n - 1} = \frac{3{,}124.20}{40 - 1} = 80.11$
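The grouped variance and standard deviation from the solution above can be checked with a short Python sketch. The tuple layout and names are illustrative assumptions. The sketch keeps the unrounded mean (41.875), so its sum of squares differs slightly from the hand-computed 3,124.20, which uses the mean rounded to 41.88.

```python
import math

# (lower limit, upper limit, frequency) for the grouped data in the solution.
classes = [(25, 29, 5), (30, 34, 4), (35, 39, 6), (40, 44, 9),
           (45, 49, 7), (50, 54, 6), (55, 59, 3)]

n = sum(f for _, _, f in classes)
mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

# Sum of f * (x - mean)^2, where x is the class midpoint.
sum_sq = sum(f * ((lo + hi) / 2 - mean) ** 2 for lo, hi, f in classes)

variance = sum_sq / (n - 1)      # sample variance: divide by n - 1
std_dev = math.sqrt(variance)    # sample standard deviation

print(round(sum_sq, 2), round(variance, 2), round(std_dev, 2))
```

Despite the small difference in the sum of squares, the variance and standard deviation agree with the worked solution to two decimal places.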
How to interpret variance and standard deviation?
Consider the following data set of toddler weights in an outpatient clinic:

Data Set    15    13    20    19    14

Computed variance for this data set is 9.7.
Computed standard deviation for this data set is 3.11.
What does this mean?

Variance is hard to interpret directly as a measure of variability. Let us assume that the values represent weights measured in pounds taken from five subjects. Because the deviation of each observation from the mean has been squared, the unit for the variance is now (pound)². What does (pound)² mean? If we were to say that data values differ from the mean on average by about 9.7 (pound)², would this claim make sense? Probably not, since there is no such unit as a (pound)².

Why do we then take the square of the deviation if the (unit)² will not make sense to interpret at the end? The answer is simple: if you do not square the deviations and simply sum them, they will always add up to zero no matter what data set you work with.
$\sum_{i=1}^{n} (x_i - \bar{x}) = 0 \rightarrow \sum_{i=1}^{n} (x_i - \bar{x})^2 \neq 0$
How can we then talk about variability if the measure of variability comes out to be equal to zero? This is why we take the square of the deviation to compute the variance first and then take the square root of it to compute the standard deviation, bringing us back to the original unit of measurement.

We get the standard deviation of 3.11 by taking the square root of 9.7; we can then say that the data values differ from the mean (16.2 lbs.) by about 3.11 pounds on average. We can interpret this finding to mean that, on average, the weights fall between 13.09 and 19.31 pounds. This makes more sense when you look at the data set, compared to the variance. Note that the mean and standard deviation should always be reported together!
16.2 − 3.11 = 13.09
16.2 + 3.11 = 19.31

Choosing a Measure of Dispersion/Variability:
We have discussed four measures of dispersion/variability (the range, the interquartile range, the variance, and the standard deviation) and examined how they differ. The next legitimate question to ask may be “When do we use which measure?”

You should use the range only as a crude measure, since it is extremely sensitive to unusual values in the data set. The interquartile range is not as sensitive to unusual data values, whereas the standard deviation is very sensitive to unusual values. Therefore, the interquartile range should be used with the median when the data contain unusual data values. However, the standard deviation should be used with the mean when the data are free of unusual data values.
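The figures quoted for the toddler weights can be verified with Python's standard `statistics` module. This is only a sketch for checking the arithmetic. The variable names are illustrative.

```python
from statistics import mean, stdev, variance

weights = [15, 13, 20, 19, 14]   # toddler weights in pounds

x_bar = mean(weights)
deviations = [x - x_bar for x in weights]

# Unsquared deviations cancel out (up to floating-point rounding).
print("sum of deviations:", round(sum(deviations), 10))

s2 = variance(weights)   # sample variance, divides by n - 1
s = stdev(weights)       # sample standard deviation
print("mean:", x_bar, "variance:", round(s2, 2), "std dev:", round(s, 2))
print("mean - s:", round(x_bar - s, 2), "mean + s:", round(x_bar + s, 2))
```

The sum of the raw deviations is zero, which is why the deviations are squared before averaging, and the square root then returns the result to pounds.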
Shape of Distribution
Skewness and kurtosis give you insights into the shape of the distribution.
✦ Skewness is the degree of distortion from the symmetrical bell curve or the normal distribution. It measures the lack of symmetry in data distribution.
✦ Kurtosis is a measure of the combined sizes of the two tails. It tells you how tall and sharp the central peak is, relative to a standard bell curve.

Skewness
A symmetrical distribution will have a skewness of 0. So, a normal distribution will have a skewness of 0.
In a symmetrical distribution, the Mean, Median and Mode are equal to each other and the ordinate at the mean divides the distribution into two equal parts.
and μ + σ.
Remember!
Positive values of z-score indicate how far above
the mean a score falls and negative values
indicate how far below the mean a score falls.
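A z-score and the corresponding areas under the standard normal curve can be sketched in Python with `statistics.NormalDist` (Python 3.8 or later). The values of `mu`, `sigma`, and `x` are hypothetical and chosen only for illustration. This stands in for reading the standard normal table and is not the Excel procedure the module refers to.

```python
from statistics import NormalDist

# Hypothetical values for illustration only.
mu, sigma = 100, 15
x = 130

z = (x - mu) / sigma             # z-score: distance from the mean in standard deviations
std_normal = NormalDist()        # standard normal: mean 0, standard deviation 1

area_left = std_normal.cdf(z)    # P(Z < z)
area_right = 1 - area_left       # P(Z > z)
area_0_to_z = area_left - 0.5    # area between 0 and z, as given in a standard normal table
area_between = std_normal.cdf(z) - std_normal.cdf(-1)   # P(-1 < Z < z)

print(z, round(area_left, 4), round(area_right, 4),
      round(area_0_to_z, 4), round(area_between, 4))
```

A positive z places the score above the mean. The areas to the left, to the right, and between two z-values are all obtained by adding or subtracting cumulative areas.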
Patterns for Finding Areas under a Standard Normal Curve
[Figures: shaded areas under the standard normal curve marked at 0, z1, and z2, obtained by adding or subtracting table areas, with the patterns Area = 0.50, 0.50 − Area, and 1 − Area.]
ACTIVITIES/ASSESSMENTS:
4. Consider the following Frequency Distribution of Salaries.

Salary               Frequency    Percentage
41,000 - 50,000      1            1%
51,000 - 60,000      20           13%
61,000 - 70,000      53           35%
71,000 - 80,000      43           29%
81,000 - 90,000      26           17%
91,000 - 100,000     6            4%
101,000 - 110,000    1            1%
Total                150          100%

1. What percentage of the employees earns less than or equal to 80,000?
2. What is the salary range of values?
3. What salary categories have a percentage less than 5?
4. What salary category includes the most employees?

5. The length of life of an instrument produced by a machine has a normal distribution with a mean of 12 months and standard deviation of 2 months. Find the probability that an instrument produced by this machine will last
A. less than 7 months.
B. between 7 and 12 months.
Be sure to draw a normal curve with the area corresponding to the probability shaded.

6. The lengths of human pregnancies are approximately normally distributed, with mean μ = 266 days and standard deviation σ = 16 days.
A. What proportion of pregnancies lasts more than 270 days?
B. What proportion of pregnancies lasts less than 250 days?
C. What proportion of pregnancies lasts between 240 and 280 days?
D. What is the probability that a randomly selected pregnancy lasts more than 280 days?
Be sure to draw a normal curve with the area corresponding to the probability shaded.
7. Construct a frequency distribution table based on the scores of 75 randomly selected students.
37 46 37 26 30 41 28 49 29 34 46 50 38 35 42
35 46 45 27 41 26 45 39 43 46 36 32 46 36 48
49 47 30 43 31 34 38 41 39 45 28 43 37 39 26
38 30 29 38 26 31 42 44 48 43 37 46 38 27 50
42 33 42 42 43 39 39 31 46 46 48 48 50 45 31

Scores      Frequency    Percentage (%)
26 to 30
31 to 35
36 to 40
41 to 45
46 to 50
Total

A. Based on the frequency distribution, compute measures of central tendency, measures of variation, Q1, D9, P10, skewness and kurtosis.
B. Based on the raw data, compute measures of central tendency, measures of variation, skewness and kurtosis using Excel.
C. Compute skewness and kurtosis of grouped and ungrouped data. Make sure to describe the shape of the distribution.
D. Do you think that the computed values for grouped and ungrouped data are the same?

8. Begin with the following set of data, call it Data Set I.
5, −2, 6, 14, −3, 0, 1, 4, 3, 2, 5
A. Compute the sample standard deviation and sample mean of Data Set I.
B. Form a new data set, Data Set II, by adding 3 to each number in Data Set I. Calculate the sample standard deviation and sample mean of Data Set II.
C. Form a new data set, Data Set III, by subtracting 6 from each number in Data Set I. Calculate the sample standard deviation and sample mean of Data Set III.
D. Comparing the answers to parts (a), (b), and (c), can you guess the pattern? State the general principle that you expect to be true.

9. Using “Encoded Data file”, construct a frequency distribution table for age, sex, marital status and educational attainment and interpret the table.

References
https://prezi.com/rirrca9ckuiz/textual-presentation-of-data/
https://www.toppr.com/guides/economics/presentation-of-data/textual-and-tabular-presentation-of-data/
Sullivan, Michael, III. Statistics: Informed Decisions Using Data. Fifth Edition.