Graphical Presentation - 2017


Graphical presentation
OBJECTIVES
1. Distributions
2. Graphical Presentation of data
2.1 Histogram
2.2 Dotplot
2.3 Stem-and-Leaf Display (Stemplot)
3. Quartiles and Interquartile Range
4. Five-number Summary (Boxplot)
6. Describing main features of a distribution
Looking at Data
[Flow diagram, from Raw Data to Graph: Question to be addressed, Collect data, Organise data, Present data, Draw conclusion]

1. Distributions

The collection of measurements on a variable (measured characteristic) is called a distribution.
It records the different values that the variable can take and how often each value occurs.

2. Frequency Distribution

Useful for categorical variables. The frequency is the number of occurrences of each level of the variable. A table listing each level together with its count is known as a frequency table.

Table 1 The distribution of marital status among the 60 participants.
Marital status   Frequency   Relative frequency
Single           14          0.2333
Married          36          0.6000
Other            10          0.1667
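
The relative frequencies in Table 1 can be checked with a short sketch. It assumes only the three counts shown in the table; the function name `relative_frequencies` is illustrative and not part of the original slides.

```python
# Counts taken from Table 1 (marital status of the 60 participants).
counts = {"Single": 14, "Married": 36, "Other": 10}

def relative_frequencies(counts):
    """Return each level's share of the total count."""
    total = sum(counts.values())
    return {level: n / total for level, n in counts.items()}

rel = relative_frequencies(counts)
for level, n in counts.items():
    # One row per level: name, frequency, relative frequency.
    print(f"{level:<8} {n:>3} {rel[level]:.4f}")
```

The relative frequencies sum to 1, which is a quick check that no level has been left out.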

2.1 Pie-chart
A graph of the relative frequencies of a categorical variable.
• It is a circle whose sectors represent the different categories of the variable.
• The relative frequency determines the size of each sector.

[Pie chart: single 23%, married 60%, others 17%]

2.2 Grouped Frequency Distribution
Useful for measurement data (mainly continuous).
Example: Systolic blood pressure of smokers (mmHg) for 20 persons from the Honolulu Heart Study
102 190 122 116 116 136 118 134 178 162
120 138 126 176 104 140 102 142 146 112

Table 2 Systolic Blood Pressure for Smokers
Class Interval   Frequency   Relative frequency
100 - 119        7           0.35
120 - 139        6           0.30
140 - 159        3           0.15
160 - 179        3           0.15
180 - 200        1           0.05

Source: Honolulu Heart Study
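
The class frequencies in the table can be reproduced from the 20 readings listed above it. This is a minimal sketch that assumes both ends of each class interval are inclusive, as the table's labels suggest; `grouped_frequencies` is an illustrative name.

```python
# Systolic blood pressure readings (mmHg) listed above the table.
smokers = [102, 190, 122, 116, 116, 136, 118, 134, 178, 162,
           120, 138, 126, 176, 104, 140, 102, 142, 146, 112]

# Class intervals as (lower, upper), both ends inclusive, as in the table.
classes = [(100, 119), (120, 139), (140, 159), (160, 179), (180, 200)]

def grouped_frequencies(values, classes):
    """Count how many values fall in each inclusive class interval."""
    return [sum(lo <= v <= hi for v in values) for lo, hi in classes]

freqs = grouped_frequencies(smokers, classes)
for (lo, hi), f in zip(classes, freqs):
    # Class interval, frequency, relative frequency.
    print(f"{lo} - {hi}  {f:>2}  {f / len(smokers):.2f}")
```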


3 Histogram
A histogram displays a frequency distribution as a series of bars, each representing the frequency of one class.
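
As a rough illustration, the bars can be drawn as text from the class frequencies in the grouped frequency table for the 20 blood pressure readings. This sketch assumes classes of roughly equal width, so that bar length alone can stand for frequency; `text_histogram` is an illustrative name.

```python
# Class frequencies from the grouped frequency table for the 20 readings.
table = [("100 - 119", 7), ("120 - 139", 6), ("140 - 159", 3),
         ("160 - 179", 3), ("180 - 200", 1)]

def text_histogram(table, symbol="#"):
    """Return one line per class; bar length equals the class frequency."""
    return [f"{label} | {symbol * freq}" for label, freq in table]

bars = text_histogram(table)
print("\n".join(bars))
```

A plotting library would normally be used for a real histogram; the text version only shows how each bar is tied to one class frequency.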

4 Graphical Presentation of Data - Dotplot

The simplest form of graphic.
• Draw an axis over the range of the data
• Draw a dot for each value
• Look for concentrations of points and outliers
• Useful summary when the data set has fewer than 30 observations


Systolic blood pressure of nonsmokers (in mm of mercury) for 20 persons
from the Honolulu Heart Study

Non-smokers
138 128 112 128 134 104 152 134 132 130
118 108 108 128 134 162 98 144 118 118
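
The dotplot steps can be sketched in text form for the non-smokers data. This is a simplified version that prints one row per distinct value rather than a continuous axis; `dotplot_rows` is an illustrative name.

```python
from collections import Counter

# Non-smokers' systolic blood pressure (mmHg) from the slide above.
non_smokers = [138, 128, 112, 128, 134, 104, 152, 134, 132, 130,
               118, 108, 108, 128, 134, 162, 98, 144, 118, 118]

def dotplot_rows(values):
    """Return (value, dots) pairs in ascending order, one dot per observation."""
    counts = Counter(values)
    return [(v, "." * counts[v]) for v in sorted(counts)]

rows = dotplot_rows(non_smokers)
for value, dots in rows:
    print(f"{value:>4} {dots}")
```

Rows with several dots show where the readings concentrate, and isolated rows at either end are candidates for outliers.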

Comparing two samples for differences

Systolic blood pressure of nonsmokers and smokers (in mm of mercury) for 40 persons from the Honolulu Heart Study
Non-smokers
138 128 112 128 134 104 152 134 132 130
118 108 108 128 134 162 98 144 118 118
Smokers
102 190 122 116 116 136 118 134 178 162
120 138 126 176 104 140 102 142 146 112
5 Stem & Leaf plots
Construction of Stem & Leaf plots

• Like a histogram, except that the individual values are shown
• Put the observations in ascending order
• List the stems vertically (a stem consists of all but the final digit)
• Attach the leaves to the stems (a leaf is the final digit)
• Rewrite the stems and put the leaves in ascending order from left to right

Advantages of Stem & Leaf plots
• Used for small data sets
• Presents more detailed information
5.1 Stem & Leaf plots
sorted smokers
102 102 104 112 116 116 118 120 122 126
134 136 138 140 142 146 162 176 178 190

MTB > Stem-and-Leaf 'Smokers'.

Character Stem-and-Leaf Display

Stem-and-leaf of Smokers N = 20
Leaf Unit = 1.0

STEM LEAF
10 224
11 2668
12 026
13 468
14 026
15
16 2
17 68
18
19 0
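
The construction steps can be sketched in a few lines. The sketch assumes non-negative whole-number data with a leaf unit of 1, as in the smokers example; `stem_and_leaf` is an illustrative name and is not how Minitab implements its display.

```python
from collections import defaultdict

smokers = [102, 190, 122, 116, 116, 136, 118, 134, 178, 162,
           120, 138, 126, 176, 104, 140, 102, 142, 146, 112]

def stem_and_leaf(values):
    """Map each stem (all but the final digit) to its sorted leaves."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # the final digit is the leaf
        leaves[stem].append(leaf)
    # Include empty stems so gaps in the data stay visible.
    return {s: leaves.get(s, []) for s in range(min(leaves), max(leaves) + 1)}

plot = stem_and_leaf(smokers)
for stem, leaf_digits in plot.items():
    print(f"{stem:>3} {''.join(map(str, leaf_digits))}")
```

Keeping the empty stems (15 and 18 here) matters, because the gaps are part of the shape of the distribution.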
5.2 Back-to-back Stem & Leaf plots
sorted smokers
102 102 104 112 116 116 118 120 122 126
134 136 138 140 142 146 162 176 178 190

sorted non-smokers
98 104 108 108 112 118 118 118 128 128
128 130 132 134 134 134 138 144 152 162

Non-smokers  Stem  Smokers
          8     9
        884    10  224
       8882    11  2668
        888    12  026
     844420    13  468
          4    14  026
          2    15
          2    16  2
               17  68
               18
               19  0
6 Boxplot
Based on the Five Number Summary (What are these numbers?)

• Central box includes the middle 50% of the data
• Whiskers show the range of the data

[Figure: boxplot labelled with Minimum Value, Q1, Median, Q3 and Maximum Value]
6.1. Boxplot

• Strengths of Box-plots
• The distribution of the data is easily seen
• Outliers can easily be picked out
• Powerful in comparing distribution of a
variable across different groups (e.g. the
gene expressions)

6.2. Constructing Box-plots
• A box-plot is a visual description of the
distribution based on the five-number
summary, which consists of the
– Minimum
– Q1
– Median
– Q3
– Maximum
6.3. Example 1

The pulse rates of 12 individuals arranged in increasing order are:
62, 64, 68, 70, 70, 74, 74, 76, 76, 78, 78, 80

Q1 = 68.5, Q3 = 77.5 (the values at positions (n+1)/4 = 3.25 and 3(n+1)/4 = 9.75, found by interpolation)

IQR = (77.5 – 68.5) = 9
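
The five-number summary for the pulse rates can be computed with Python's standard library. This sketch assumes the quartile convention of `statistics.quantiles` with `method="exclusive"`, which places quartiles at positions p(n + 1) and interpolates linearly; other software may use a different convention and give slightly different quartiles. For these 12 values it gives Q1 = 68.5 and Q3 = 77.5, so the IQR is 9.

```python
import statistics

pulse = [62, 64, 68, 70, 70, 74, 74, 76, 76, 78, 78, 80]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the (n + 1)p position rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), q1, median, q3, max(values)

low, q1, median, q3, high = five_number_summary(pulse)
iqr = q3 - q1  # interquartile range: width of the central box
print(low, q1, median, q3, high, iqr)
```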


6.3. Example 1
[Figure: boxplot of pulse rate, with a whisker labelled]

The long tails (i.e. the lines out to the minimum and maximum) are known as whiskers.
6.5. Outliers and Boxplot
• Re-define the upper and lower limits of the boxplot (the whisker lines) as:
Lower limit = Q1 - 1.5 × IQR, and
Upper limit = Q3 + 1.5 × IQR
• Note that the lines may not go as far as these limits, because each whisker stops at the most extreme data point inside the limits
• If a data point is < lower limit or > upper limit, the data point is considered to be an outlier.
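
The outlier rule can be sketched as follows, using the pulse rates and the quartiles Q1 = 68.5 and Q3 = 77.5 quoted in the pulse-rate example. The function names and the extra reading of 95 are hypothetical and are not part of the source data.

```python
def fences(q1, q3, k=1.5):
    """Return (lower limit, upper limit) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def outliers(values, q1, q3):
    """Return the values lying strictly outside the limits."""
    lower, upper = fences(q1, q3)
    return [v for v in values if v < lower or v > upper]

pulse = [62, 64, 68, 70, 70, 74, 74, 76, 76, 78, 78, 80]
lower, upper = fences(68.5, 77.5)
print(lower, upper)
print(outliers(pulse, 68.5, 77.5))
# Hypothetical extra reading of 95 (not in the source data),
# checked against the same limits for illustration only.
print(outliers(pulse + [95], 68.5, 77.5))
```

In a real analysis the quartiles would be recomputed after adding a reading; they are held fixed here only to show how a point beyond the upper limit is flagged.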
6.6. Example – CK data
[Figure: boxplot of CK data, with outliers marked]
7. Two continuous variables - scatter plot
• Displays the relationship between two continuous variables
• Useful in the early stage of analysis when exploring data and determining whether a linear regression analysis is appropriate
• May show outliers in your data

[Figure: scatter plot of Age versus Systolic Blood Pressure in a Clinical Trial]

9. Main features of a distribution

1. The overall shape of the distribution
2. The central value for the variable
3. The amount of variation that the variable exhibits
4. The presence of unusual points

[Figure: Distribution shapes]

EXERCISE 1
What can you say about the distribution?

A random sample of 51 specimens of milk was purchased from grocery stores in a state. The number of coliform organisms per millilitre was counted and the results were as follows:
Stem-and-leaf of coliform N = 51
Leaf Unit = 0.10
1 0 0
1 1
1 2
1 3
5 4 0000
10 5 00000
16 6 000000
23 7 0000000
(7) 8 0000000
21 9 00000000
13 10 0000000000000
6.7. Graphic vs variable type
• Categorical variables
– Frequency distributions, pie-chart, bar graph
• Discrete measurements (few outcomes)
– dot plot, stem & leaf
• Continuous/Discrete (many outcomes) measurements
– Histogram, box-plot, scatter plots
