Descriptive Statistics


TOPIC - Sub-topic

Weighted Average - Calculating aggregate marks
If Statement - Pass-fail verdicts
Central Tendency - Average & median
Dispersion - Variance, standard deviation, quartiles & interquartile range
Frequency Distributions - Histogram
Descriptive Statistics - Summary statistics, ranking data
Rank Correlation - Spearman's rank correlation coefficient
Correlation - Pearson's correlation coefficient


Link
Weighted averages

Handling data

Central tendency

Dispersion

Frequency distributions

The Excel data analysis Routine

Ranking data

Spearman’s rank correlation coefficient

Pearson’s correlation coefficient


CONTENTS Weighted averages

The data below are the percentage marks obtained in each of two tests
by a group of fifty students.
Test 1 is weighted at 40% and test 2 is weighted at 60%.
These weights have been entered in cells A12 and B12 and the cells then named
as W_1 and W_2 respectively.
The task is to calculate the aggregate mark for each student in column C.
To see how to do it, look at the comments in A12, B12 and C13.
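A minimal sketch of the aggregate formula (assuming, as on the later sheets, that the first student's marks sit in A14 and B14):
=A14*W_1+B14*W_2    entered in C14 and copied down column C
For the first student this gives 57*0.4 + 38*0.6 = 45.6.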

40% 60% PROCEED TO SHEET 2 (Handling data)


TEST 1 TEST 2 AGGREGATE
57 38
50 34
36 69
47 83
97 66
92 88
97 90
4 46
6 52
81 70
24 90
37 6
96 88
70 44
52 11
46 2
93 63
7 83
66 46
94 11
71 50
85 91
95 91
0 56
82 57
12 95
97 55
11 16
82 53
29 19
3 98
91 90
34 86
15 38
54 34
39 56
96 11
42 51
51 27
79 76
65 91
75 80
88 22
98 98
99 48
75 8
23 0
26 58
42 11
81 71
CONTENTS Handling data

Drawing conclusions from the data


With the aggregate marks computed in column C we now want to decide whether each
student has passed or failed. For this we need some rules.
Suppose that in order to pass, neither test 1 nor test 2 can have a mark of less than 35%
and that the aggregate mark must be at least 40%.
These minimum requirements have been entered in cells A12, B12 and C12 and the cells then
named M_1, M_2 and M_3 respectively.
We will use these to compute the verdict in column D.
Once again, study the comments at the top of each column.
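A minimal sketch of the verdict formula (assuming the first row of marks occupies A14:C14):
=IF(AND(A14>=M_1,B14>=M_2,C14>=M_3),"PASS","FAIL")    entered in D14 and copied down column D
For the second student (50, 34, 40.4) the test 2 mark of 34 falls below M_2, so the formula returns FAIL.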
35 35 40

40% 60% PROCEED TO SHEET 3 (Central tendency)


TEST 1 TEST 2 AGGREGATE VERDICT
57 38 45.6
50 34 40.4
36 69 55.8
47 83 68.6
97 66 78.4
92 88 89.6
97 90 92.8
4 46 29.2
6 52 33.6
81 70 74.4
24 90 63.6
37 6 18.4
96 88 91.2
70 44 54.4
52 11 27.4
46 2 19.6
93 63 75
7 83 52.6
66 46 54
94 11 44.2
71 50 58.4
85 91 88.6
95 91 92.6
0 56 33.6
82 57 67
12 95 61.8
97 55 71.8
11 16 14
82 53 64.6
29 19 23
3 98 60
91 90 90.4
34 86 65.2
15 38 28.8
54 34 42
39 56 49.2
96 11 45
42 51 47.4
51 27 36.6
79 76 77.2
65 91 80.6
75 80 78
88 22 48.4
98 98 98
99 48 68.4
75 8 34.8
23 0 9.2
26 58 45.2
42 11 23.4
81 71 75
PASS
FAIL
PASS
PASS
PASS
PASS
PASS
FAIL
FAIL
PASS
FAIL
FAIL
PASS
PASS
FAIL
FAIL
PASS
FAIL
PASS
FAIL
PASS
PASS
PASS
FAIL
PASS
FAIL
PASS
FAIL
PASS
FAIL
FAIL
PASS
FAIL
FAIL
FAIL
PASS
FAIL
PASS
FAIL
PASS
PASS
PASS
FAIL
PASS
PASS
FAIL
FAIL
FAIL
FAIL
PASS
CONTENTS Central tendency

Measures of Central Tendency


It would now seem appropriate to calculate some summary statistics for these data.
We will concentrate on central tendency first of all. The most widely known of these is the Arithmetic Mean.
The functions and formulae to do this are indicated in the comments.
ARITH. MEAN TEST 1
ARITH. MEAN TEST 2
ARITH. MEAN AGGREGATE
35 35 40
Another measure of central tendency is the Median.
40% 60% This gives the mark that is equalled or exceeded by 50% of the group.
TEST 1 TEST 2 AGGREGATE VERDICT
57 38 45.6 PASS MEDIAN TEST 1
50 34 40.4 FAIL MEDIAN TEST 2
36 69 55.8 PASS MEDIAN AGGREGATE
47 83 68.6 PASS
97 66 78.4 PASS
92 88 89.6 PASS Finally we might want to calculate the pass rate
97 90 92.8 PASS for this group of students.
4 46 29.2 FAIL The way to do this is to count all the students who passed
6 52 33.6 FAIL and then divide this by a count of all the students in the group.
81 70 74.4 PASS Study the comment in F22.
24 90 63.6 FAIL PASS RATE =
37 6 18.4 FAIL
96 88 91.2 PASS
70 44 54.4 PASS PROCEED TO SHEET 4 (Dispersion)
52 11 27.4 FAIL
46 2 19.6 FAIL
93 63 75 PASS
7 83 52.6 FAIL
66 46 54 PASS
94 11 44.2 FAIL
71 50 58.4 PASS
85 91 88.6 PASS
95 91 92.6 PASS
0 56 33.6 FAIL
82 57 67 PASS
12 95 61.8 FAIL
97 55 71.8 PASS
11 16 14 FAIL
82 53 64.6 PASS
29 19 23 FAIL
3 98 60 FAIL
91 90 90.4 PASS
34 86 65.2 FAIL
15 38 28.8 FAIL
54 34 42 FAIL
39 56 49.2 PASS
96 11 45 FAIL
42 51 47.4 PASS
51 27 36.6 FAIL
79 76 77.2 PASS
65 91 80.6 PASS
75 80 78 PASS
88 22 48.4 FAIL
98 98 98 PASS
99 48 68.4 PASS
75 8 34.8 FAIL
23 0 9.2 FAIL
26 58 45.2 FAIL
42 11 23.4 FAIL
81 71 75 PASS
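Pulling these together, a sketch of the functions involved (the cell ranges here are assumptions; the comments on the sheet give the definitive ones):
=AVERAGE(A15:A64)    arithmetic mean of the test 1 marks
=MEDIAN(A15:A64)     median of the test 1 marks
=COUNTIF(D15:D64,"PASS")/COUNT(C15:C64)    one way to form the pass rate: passes divided by the count of students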
NOTES 1
CONTENTS Dispersion

Having obtained measures of central tendency for the data we should now
compute a few measures of Dispersion (how the data are spread around their
central value). The commonest measure is the Variance and its square root, the Standard Deviation.
Once again the appropriate Excel functions and formulae are explained in the comments.
NOTES 2 VARIANCE TEST 1
VARIANCE TEST 2
VARIANCE AGGREGATE
35 35 40 ST DEVIATION TEST 1
ST DEVIATION TEST 2
40% 60% ST DEVIATION AGGREGATE
TEST 1 TEST 2 AGGREGATE VERDICT
57 38 45.6 PASS When the median has been used as the measure of central tendency then
50 34 40.4 FAIL the appropriate measure of dispersion is given by the Interquartile Range.
36 69 55.8 PASS We define:
47 83 68.6 PASS Q1 - the lower quartile is the mark below which 25% of the observations lie.
97 66 78.4 PASS Q3 - the upper quartile is the mark below which 75% of the observations lie.
92 88 89.6 PASS Q2 is the median - 50% lie above and 50% lie below.
97 90 92.8 PASS Consequently the interquartile range is defined as Q3 minus Q1 and gives
4 46 29.2 FAIL the range of marks obtained by the middle 50% of students.
6 52 33.6 FAIL
81 70 74.4 PASS FOR TEST 1
24 90 63.6 FAIL QUARTILE 1
37 6 18.4 FAIL QUARTILE 2
96 88 91.2 PASS QUARTILE 3
70 44 54.4 PASS INTERQUARTILE RANGE
52 11 27.4 FAIL FOR TEST 2
46 2 19.6 FAIL QUARTILE 1
93 63 75 PASS QUARTILE 2
7 83 52.6 FAIL QUARTILE 3
66 46 54 PASS INTERQUARTILE RANGE
94 11 44.2 FAIL FOR AGGREGATE
71 50 58.4 PASS QUARTILE 1
85 91 88.6 PASS QUARTILE 2
95 91 92.6 PASS QUARTILE 3
0 56 33.6 FAIL INTERQUARTILE RANGE
82 57 67 PASS
12 95 61.8 FAIL
97 55 71.8 PASS
11 16 14 FAIL
82 53 64.6 PASS
29 19 23 FAIL
3 98 60 FAIL
91 90 90.4 PASS
34 86 65.2 FAIL
15 38 28.8 FAIL
54 34 42 FAIL
39 56 49.2 PASS
96 11 45 FAIL
42 51 47.4 PASS
51 27 36.6 FAIL
79 76 77.2 PASS
65 91 80.6 PASS
75 80 78 PASS
88 22 48.4 FAIL
98 98 98 PASS
99 48 68.4 PASS
75 8 34.8 FAIL
23 0 9.2 FAIL
26 58 45.2 FAIL
42 11 23.4 FAIL
81 71 75 PASS
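A sketch of the corresponding worksheet functions (cell ranges assumed; the comments on the sheet give the exact ones):
=VARP(A15:A64)       population variance of the test 1 marks
=STDEVP(A15:A64)     population standard deviation
=QUARTILE(A15:A64,1)    Q1, the lower quartile
=QUARTILE(A15:A64,3)    Q3, the upper quartile
=QUARTILE(A15:A64,3)-QUARTILE(A15:A64,1)    the interquartile range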
PROCEED TO SHEET 5 (Frequency Distributions)


CONTENTS Frequency Distributions

As well as a wide range of statistical functions, Excel has a number of prepared Routines that can
replicate these functions and perform a wide variety of further statistical processes. The general routine is called
Data Analysis and is accessed from the Tools menu. (SEE NOTE ON LEFT) NOTES 3
We will now use this facility to construct a Frequency Distribution and a Histogram for the Test 1 data.
The first thing to do - before invoking the routine - is to provide Excel with a range containing the intervals
into which the test 1 marks are to be classified. This is known as the Bin Range and has been made up for you
in the D14:D24 range below. Clearly we have chosen intervals of 10 marks.
Now access Tools, Data Analysis, Histogram. The input range is the raw data for test 1, so enter this as A14:A65.
The bin range is D14:D24 and since both of these ranges contain text labels we must check the labels box.
Now check the chart output and cumulative percentages boxes. Finally check output range and enter it
as commencing in E14. Now click OK and the results should resemble the solution in the following link.
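(As an aside, not part of the workbook's routine: the same counts can also be produced with the FREQUENCY worksheet function, for example =FREQUENCY(A15:A65,D15:D24) selected over the output cells and entered as an array formula with Ctrl+Shift+Enter; the exact ranges here are an assumption.)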
TEST 1 TEST 2 AGGREGATE INTERVALS
57 38 45.6 10
50 34 40.4 20
36 69 55.8 30 PROCEED TO SHEET 6 (The Excel data analysis Routine)
47 83 68.6 40
97 66 78.4 50
92 88 89.6 60
97 90 92.8 70
4 46 29.2 80
6 52 33.6 90
81 70 74.4 100
24 90 63.6
37 6 18.4
96 88 91.2
70 44 54.4
52 11 27.4
46 2 19.6
93 63 75
7 83 52.6
66 46 54
94 11 44.2
71 50 58.4
85 91 88.6
95 91 92.6
0 56 33.6
82 57 67
12 95 61.8
97 55 71.8
11 16 14
82 53 64.6
29 19 23
3 98 60
91 90 90.4
34 86 65.2
15 38 28.8
54 34 42
39 56 49.2
96 11 45
42 51 47.4
51 27 36.6
79 76 77.2
65 91 80.6
75 80 78
88 22 48.4
98 98 98
99 48 68.4
75 8 34.8
23 0 9.2
26 58 45.2
42 11 23.4
81 71 75
NOTE
DATA ANALYSIS IS AN ADD-IN THAT HAS TO BE LOADED.
TO DO THIS CLICK ON THE OFFICE ICON AT THE TOP LEFT OF THE SCREEN,
NOW SELECT EXCEL OPTIONS AND THEN ADD-INS.
AT THE BOTTOM OF THE SCREEN THAT APPEARS (ALONGSIDE THE EXCEL ADD-INS) CLICK GO,
THEN CHECK THE ANALYSIS TOOLPAK BOX AND THEN OK.
DATA ANALYSIS IS NOW AVAILABLE AS THE LAST OPTION WHEN DATA IS SELECTED FROM THE MAIN MENU.

PROCEED TO SHEET 6 (The Excel data analysis Routine)


CONTENTS The Excel data analysis Routine

In this sheet we are going to use the Descriptive Statistics option from the data analysis routine.
Assuming it has been added in, we proceed as follows.
Select Tools then Data Analysis and then Descriptive Statistics.
At the dialogue box that appears, define the data range as A14:C64. Then check the "labels in 1st row" box.
Next check the Summary Statistics box and then define the output range as E14.
Click OK and the results shown in the following link will be reproduced.
A few terms may not be familiar. They are explained in the links below the output table.
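For reference, most rows of the routine's output can also be reproduced with individual worksheet functions; a sketch for the test 1 column (assuming the marks occupy A15:A64, i.e. the data range less its label row):
=AVERAGE(A15:A64)                       Mean
=STDEV(A15:A64)/SQRT(COUNT(A15:A64))    Standard Error
=MEDIAN(A15:A64)                        Median
=MODE(A15:A64)                          Mode
=STDEV(A15:A64)                         Standard Deviation
=VAR(A15:A64)                           Sample Variance
=KURT(A15:A64)                          Kurtosis
=SKEW(A15:A64)                          Skewness
=MAX(A15:A64)-MIN(A15:A64)              Range
=MIN(A15:A64)  =MAX(A15:A64)  =SUM(A15:A64)  =COUNT(A15:A64)    Minimum, Maximum, Sum, Count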
35 35 40

0.4 0.6
TEST 1 TEST 2 AGGREGATE
57 38 45.6
50 34 40.4
36 69 55.8
47 83 68.6
97 66 78.4
92 88 89.6
97 90 92.8
4 46 29.2
6 52 33.6
81 70 74.4
24 90 63.6
37 6 18.4
96 88 91.2
70 44 54.4
52 11 27.4
46 2 19.6
93 63 75 STANDARD_ERROR
7 83 52.6
66 46 54 SAMPLE_VARIANCE
94 11 44.2
71 50 58.4 SKEWNESS
85 91 88.6
95 91 92.6 KURTOSIS
0 56 33.6
82 57 67
12 95 61.8
97 55 71.8
11 16 14
82 53 64.6
29 19 23
3 98 60
91 90 90.4
34 86 65.2
15 38 28.8
54 34 42
39 56 49.2
96 11 45
42 51 47.4
51 27 36.6
79 76 77.2
65 91 80.6
75 80 78
88 22 48.4
98 98 98
99 48 68.4
75 8 34.8
23 0 9.2
26 58 45.2
42 11 23.4
81 71 75
PROCEED TO SHEET 7 (Ranking data)


CONTENTS Ranking data

Here we have the aggregate marks data for the student group.
It will often be useful to obtain each mark's rank in the group.
Sorting the data from high to low would clearly do this but in itself would not
supply useable ranks.
This is where the Rank function is useful.
First we have named the range containing the 50 aggregate marks as: AGG.
Next we note that the general syntax of the rank function is:
=RANK(Value to be ranked, Range of values from which to rank, 1 for 1 to be the rank of the lowest mark, 0 for 1 to be the rank of the highest mark)
With the first mark located in A15 the following will provide ranks with 1 as the rank of the highest mark:
=RANK(A15,AGG,0) when copied down column B.
Enter this now in the B15 cell and copy down to obtain each mark's rank.
AGGREGATE RANK (1 = HIGHEST)
45.6
40.4
55.8
68.6 PROCEED TO SHEET 8 (Spearman’s rank correlation coefficient)
78.4
89.6
92.8
29.2
33.6
74.4
63.6
18.4
91.2
54.4
27.4
19.6
75
52.6
54
44.2
58.4
88.6
92.6
33.6
67
61.8
71.8
14
64.6
23
60
90.4
65.2
28.8
42
49.2
45
47.4
36.6
77.2
80.6
78
48.4
98
68.4
34.8
9.2
45.2
23.4
75

CONTENTS Spearman’s rank correlation coefficient

Ranking data can be a very useful device from a number of perspectives.


To see one of these, look at the data below on the marks in the two tests.
In columns C and D we have used the Rank function to rank the scores in each

TEST 1 TEST 2 RANK 1 RANK 2 D D2 test after naming A7:A56 and B7:B56 as Test1 and Test2 respectively. The task is to see whether
57 38 26 35 the rankings in each of the tests agree or disagree with one another.
50 34 30 37 The method of doing this is called Rank Correlation and employs a device
36 69 37 19 known as Spearman's Rank Correlation Coefficient. This statistic lies between
47 83 31 13 minus one and plus one with the former indicating perfect disagreement and the
97 66 3 20 latter indicating perfect agreement.
92 88 11 10 The first step in calculating the statistic is to use column E to calculate the
97 90 3 7 difference in the ranks from: =C7-D7 copied down to E56. Do this now.
4 46 48 32 As you can see, some of these differences are positive and some are negative.
6 52 47 28 There will therefore be a tendency for them to cancel each other out.
81 70 17 18 We deal with this by squaring the differences in column F from =E7^2.
24 90 41 7 Do this now and then copy down to F56.
37 6 36 48 The next step is to sum the squared differences, so do this in the blue
96 88 6 10 cell below (I21) from =SUM(F7:F56).
70 44 23 34 Next we count the number of differences in I22 from: =COUNT(F7:F56) and call it N.

52 11 28 43 SUM D2
46 2 32 49 COUNT (N)
93 63 10 21
7 83 46 13 Now we define Spearman's Rank Correlation Coefficient as:
66 46 24 32 1 - 6*SUM D2/[N*(N2 - 1)]
94 11 9 43 So use I27 below to calculate 6*SUM D2 from =6*I21
71 50 22 30 6*SUM D2
85 91 14 4 and then use I29 below to calculate N*(N2-1) from =I22*(I22^2-1)
95 91 8 4 N*(N2-1)
0 56 50 24 Finally calculate Spearman's rank correlation coefficient in I31 below from =1-I27/I29
82 57 15 23 SPEARMAN
12 95 44 3
97 55 3 26
11 16 45 42
82 53 15 27 PROCEED TO SHEET 9 (Pearson’s correlation coefficient)
29 19 39 41
3 98 49 1
91 90 12 7
34 86 38 12
15 38 43 35
54 34 27 37
39 56 35 24
96 11 6 43
42 51 33 29
51 27 29 39
79 76 19 16
65 91 25 4
75 80 20 15
88 22 13 40
98 98 2 1
99 48 1 31
75 8 20 47
23 0 42 50
26 58 40 22
42 11 33 43
81 71 17 17
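Pulling the steps together, a compact sketch of the worksheet entries described above (cell references as given on this sheet):
E7:  =C7-D7            difference in ranks, copied down to E56
F7:  =E7^2             squared difference, copied down to F56
I21: =SUM(F7:F56)      SUM D2
I22: =COUNT(F7:F56)    N
I27: =6*I21
I29: =I22*(I22^2-1)    N*(N2-1)
I31: =1-I27/I29        Spearman's rank correlation coefficient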
CONTENTS Pearson’s correlation coefficient

Spearman's rank correlation coefficient uses ranked ordinal data as the basis of its test
of association between the rankings.
However for numerical data there is another correlation coefficient known as Pearson's Correlation
Coefficient that computes the degree of association between two numerical data sets.
Like Spearman's, it has a minimum value of minus one and a maximum value of plus one. The former
indicates a perfect negative or indirect association, while the latter indicates a perfect
positive or direct linear association.
Excel has a dedicated function to calculate Pearson's coefficient with the following syntax: =CORREL(First data range,Second Data range)
To see it in operation we have reproduced the marks data below.
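As a quick sketch of both functions on the first pair of columns (the ranges are the ones quoted on this sheet):
=CORREL(A14:A63,B14:B63)    Pearson's coefficient for test 1 against test 2
=RSQ(A14:A63,B14:B63)       the coefficient of determination, identical to squaring the CORREL result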

TEST 1 TEST 2 AGGREGATE To correlate the test 1 marks with the test 2 marks we should write:
57 38 45.6 =CORREL(A14:A63,B14:B63) Try it in the blue cell below.
50 34 40.4 Test 1/Test 2
36 69 55.8 Next we could correlate the test 1 marks with the aggregate from:
47 83 68.6 =CORREL(A14:A63,C14:C63) Try it in the blue cell below.
97 66 78.4 Test 1/Aggregate
92 88 89.6 Finally we can correlate the test 2 marks with the aggregate from:
97 90 92.8 =CORREL(B14:B63,C14:C63) Try it in the blue cell below.
4 46 29.2 test 2/aggregate
6 52 33.6 From this we conclude that the association between test 1 and test 2 marks is
81 70 74.4 not very high - performing well in one does not suggest that a student will perform
24 90 63.6 well in the other.
37 6 18.4 However, the high positive correlation between the test 2 marks and the aggregate
96 88 91.2 suggests that performance in test 2 is highly associated with overall performance.
70 44 54.4
52 11 27.4 Finally, associated with the correlation coefficient is a statistic known as the
46 2 19.6 coefficient of determination. This is simply the square of the correlation coefficient
93 63 75 and is referred to in Excel as R squared or R2.
7 83 52.6 Being the square of a number whose absolute value is always less than or equal to one, R squared will
66 46 54 always be less than or equal to the absolute value of the correlation coefficient.
94 11 44.2 Also, being a square, it will always be positive and so gives no indication of the
71 50 58.4 direction of the association (positive/direct or negative/indirect).
85 91 88.6 The Excel function to calculate R squared is:
95 91 92.6 =RSQ(First data range,Second Data range)
0 56 33.6 It is left as an exercise to calculate the R squared values for each of the marks
82 57 67 combinations.
12 95 61.8 TEST 1/TEST 2
97 55 71.8 TEST 1/AGGREGATE
11 16 14 TEST 2/AGGREGATE
82 53 64.6
29 19 23
3 98 60
91 90 90.4
34 86 65.2
15 38 28.8
54 34 42
39 56 49.2
96 11 45
42 51 47.4
51 27 36.6
79 76 77.2
65 91 80.6
75 80 78
88 22 48.4
98 98 98
99 48 68.4
75 8 34.8
23 0 9.2
26 58 45.2
42 11 23.4
81 71 75
Test 1/Test 2 correlation = 0.145387
Test 1/Aggregate correlation = 0.65352
Test 2/Aggregate correlation = 0.84388
R squared values: Test 1/Test 2 = 0.021137, Test 1/Aggregate = 0.427088, Test 2/Aggregate = 0.712134
CONTENTS Excel has added the category "more" to its bin range but its frequency is zero.
This is because there are no observations greater than 100.
The frequency column gives the absolute number of marks in the intervals:
0 - 10 (5), 11 - 20 (3), 21 - 30 (4) and so on.
These are shown as the columns of the histogram.
The cumulative % column gives the percentage of observations that are less than
or equal to the interval mark. Thus 5 out of 50 (10%) marks are less than or equal to 10,
while 5 + 3 + 4 = 12 out of 50 (24%) are less than or equal to 30.
This is shown as the line graph in the histogram.
35 35 40

40% 60%
TEST 1 TEST 2 AGGREGATE INTERVALS INTERVALS Frequency Cumulative %
57 38 45.6 10 10 5 10.00%
50 34 40.4 20 20 3 16.00%
36 69 55.8 30 30 4 24.00%
47 83 68.6 40 40 4 32.00%
97 66 78.4 50 50 5 42.00%
92 88 89.6 60 60 4 50.00%
97 90 92.8 70 70 3 56.00%
4 46 29.2 80 80 4 64.00%
6 52 33.6 90 90 6 76.00%
81 70 74.4 100 100 12 100.00%
24 90 63.6 More 0 100.00%
37 6 18.4
96 88 91.2
70 44 54.4
52 11 27.4 Frequency Distributions
46 2 19.6
93 63 75
7 83 52.6
66 46 54
94 11 44.2
71 50 58.4
85 91 88.6
95 91 92.6
0 56 33.6
82 57 67
12 95 61.8
97 55 71.8
11 16 14
82 53 64.6
29 19 23
3 98 60
91 90 90.4
34 86 65.2
15 38 28.8
54 34 42
39 56 49.2
96 11 45
42 51 47.4
51 27 36.6
79 76 77.2
65 91 80.6
75 80 78
88 22 48.4
98 98 98
99 48 68.4
75 8 34.8
23 0 9.2
26 58 45.2
42 11 23.4
81 71 75

[Histogram chart: frequency shown as columns and Cumulative % as a line, plotted against the mark INTERVALS (10 to 100 and More); left axis Frequency 0 to 15, right axis 0.00% to 120.00%]

CONTENTS

35 35 40

40% 60%
TEST 1 TEST 2 AGGREGATE TEST 1
57 38 45.6
50 34 40.4 Mean 57.84
36 69 55.8 Standard Error 4.516849182
47 83 68.6 Median 61
97 66 78.4 Mode 97
92 88 89.6 Standard Deviation 31.93894686
97 90 92.8 Sample Variance 1020.096327
4 46 29.2 Kurtosis -1.23718592
6 52 33.6 Skewness -0.31490202
81 70 74.4 Range 99
24 90 63.6 Minimum 0
37 6 18.4 Maximum 99
96 88 91.2 Sum 2892
70 44 54.4 Count 50
52 11 27.4
46 2 19.6
93 63 75 STANDARD_ERROR
7 83 52.6
66 46 54 SAMPLE_VARIANCE
94 11 44.2
71 50 58.4 SKEWNESS
85 91 88.6
95 91 92.6 KURTOSIS
0 56 33.6
82 57 67
12 95 61.8
97 55 71.8
11 16 14
82 53 64.6
29 19 23
3 98 60
91 90 90.4
34 86 65.2
15 38 28.8
54 34 42
39 56 49.2
96 11 45
42 51 47.4
51 27 36.6
79 76 77.2
65 91 80.6
75 80 78
88 22 48.4
98 98 98
99 48 68.4
75 8 34.8
23 0 9.2
26 58 45.2
42 11 23.4
81 71 75
TEST 2 AGGREGATE

Mean 54.34 Mean 55.74


Standard Error 4.248083 Standard Error 3.331665
Median 55.5 Median 55.1
Mode 11 Mode 33.6
Standard Deviation 30.03849 Standard Deviation 23.55843
Sample Variance 902.3106 Sample Variance 555
Kurtosis -1.14313 Kurtosis -0.92139
Skewness -0.24417 Skewness -0.05748
Range 98 Range 88.8
Minimum 0 Minimum 9.2
Maximum 98 Maximum 98
Sum 2717 Sum 2787
Count 50 Count 50

Return to The Excel data analysis Routine


CONTENTS

The =COUNT function counts all entries that are numbers.


Since the data in the verdict field are text, =COUNT would return a value
of zero. That is why we counted the aggregate field which is numeric.
But if you need to count entries that are text (or text and numbers) then
the =COUNTA function will do this.

Excel possesses two variance and two standard deviation functions.


VARP and STDEVP are population statistics and divide by the total number of observations.
VAR and STDEV are sample statistics and divide by the total number of observations minus one.
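For example (assuming the 50 test 1 marks occupy A15:A64): =VARP(A15:A64) divides the sum of squared deviations by 50, while =VAR(A15:A64) divides it by 49.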

Data analysis is an Add-In and may not automatically be installed on your version of Excel.
To install it select Tools and then Add-Ins and then check the first two boxes, viz:
Analysis Toolpak and Analysis Toolpak VBA.

The standard error is a sampling concept and is defined as the sample standard deviation
divided by the square root of the sample size. Thus for test 1 we have =F19/SQRT(F27), which gives 4.516849.

Excel has calculated the sample as opposed to the population variance and so produces a slightly different
answer from our earlier calculations.
However, the answer returned by the descriptive statistics routine could be reproduced exactly if we use
VAR rather than VARP. For example: 1020.096 QED. The same is true for the standard deviation,
that is, use STDEV rather than STDEVP: 31.93895 QED.

Skewness is the extent to which the histogram of the data is "off centre" (asymmetrical).
Skewness can be either negative, zero, or positive.
With negative skewness the histogram has significantly more observations to the right than the left (tail to the
left, peak to the right).
With zero skewness the histogram is symmetrical (peak in the middle, equivalent tails on either side).
With positive skewness the histogram has significantly more observations to the left than the right (tail to the
right, peak to the left).
The negative value calculated by Excel for the test 1 data (-0.315) confirms what the histogram shows -
namely that the marks are peaked to the right (a relatively large number of students with high marks).

Kurtosis is a measure of the flatness or peakedness of the histogram.

A bell shaped symmetrical histogram (such as the normal distribution) has a kurtosis value of 3, but Excel's
KURT function reports kurtosis relative to this normal benchmark (excess kurtosis), so a normal histogram returns 0.
Any positive value means that the histogram is more peaked than the normal with wide tails.
Conversely any negative value indicates that the histogram is flatter than the normal with short tails.
The value of -1.23 calculated for the test 1 data, being negative, indicates that the
histogram is generally not very peaked.
Return to Central tendency

Return to Dispersion

Return to Frequency distributions

Return to The Excel data analysis Routine


Return to The Excel data analysis Routine

Return to The Excel data analysis Routine

Return to The Excel data analysis Routine
