330 Lecture4 2015


STATS 330: Lecture 4

Trellis Graphics

26.07.2015
Housekeeping

- Contact details

  Name            Office    Email (@auckland.ac.nz)  Office hours
  Steffen Klaere  303.219   s.klaere                 9:30-10:30, Thu+Fri
  Arden Miller    303.229C  a.miller                 Wed 9-10, Thu 12-1

- Class representatives

  Name              Course  Email (@aucklanduni.ac.nz)
  Jessica Courtney  330     jcou608
  Monica Hill       330     mhil084
  ???               762     ???

- Assignment 1 is due August 10

- Lecturer evaluation takes place in the tutorial before the mid-semester break.


Today's Lecture: More on Trellis graphics

Aim of the lecture

- To give you an idea of the scope of Trellis graphics
- To discuss several examples and show how Trellis graphics reveal important insights into the data.
Recall last time

- Last lecture we discussed coplots
- Coplots show how the relationship between two variables, x and y, changes as the value of a third variable, z, changes
- We can generalise this to more than one variable z, i.e. conditioning on more than one variable
Coplots: syntax

plot(y ~ x | z * w)

- plot is the plot type: xyplot, dotplot or bwplot
- x and y are the relationship variables
- z and w are the conditioning variables (optional)
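
As a minimal sketch of this syntax, the following uses the lattice package (which ships with R) on simulated data. The data frame `d` and all of its values are made up purely for illustration.

```r
library(lattice)

# Simulated data for illustration only: x and y are the relationship
# variables, z and w are categorical conditioning variables.
set.seed(1)
d <- data.frame(
  x = rnorm(120),
  z = factor(rep(c("z1", "z2"), each = 60)),
  w = factor(rep(c("w1", "w2", "w3"), times = 40))
)
d$y <- 2 * d$x + rnorm(120)

# One scatterplot panel per combination of z and w (2 x 3 = 6 panels).
p <- xyplot(y ~ x | z * w, data = d)
print(p)
```

Swapping `xyplot` for `dotplot` or `bwplot` keeps the same formula structure; only the panel type changes.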
Conditioning on two variables

- Suppose we have two conditioning variables, Z and W.
- No problem if both are categorical
- If one or both are continuous variables, we turn them into categorical variables by using subranges, e.g.,
  - Turn ages into 10-year age groups
  - Turn marks into grades
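
One way to do this in R is with `cut()`. The sketch below uses made-up ages and marks, and the grade cut-offs are hypothetical (they are not the course's grade boundaries).

```r
# Made-up ages and marks, for illustration only.
age  <- c(3, 19, 20, 45, 61, 78)
mark <- c(35, 52, 68, 74, 81, 95)

# 10-year age groups: [0,10), [10,20), ..., [70,80).
age_group <- cut(age, breaks = seq(0, 80, by = 10), right = FALSE)

# Grades from marks; these cut-offs are hypothetical.
grade <- cut(mark, breaks = c(0, 50, 65, 80, 100), right = FALSE,
             labels = c("D", "C", "B", "A"), include.lowest = TRUE)

data.frame(age, age_group, mark, grade)
```

Using `right = FALSE` makes each interval closed on the left, so an age of exactly 20 falls into the 20 to 29 group rather than the 10 to 19 group.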
Conditioning on two variables: Example

Gender is already categorical. Age is not, so divide the age range according to the TV rating ranges. This gives a 4 × 2 table with 8 cells.

         Female  Male
0-17
18-34
35-49
50+
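
The 4 × 2 layout can be checked with a cross-tabulation. This is a sketch on made-up ages and genders; only the four age ranges come from the slide.

```r
# Made-up ages and genders, for illustration only.
age    <- c(5, 17, 18, 34, 35, 49, 50, 72)
gender <- factor(c("Female", "Male", "Female", "Male",
                   "Male", "Female", "Male", "Female"))

# Left-closed intervals [0,18), [18,35), [35,50), [50,Inf).
age_group <- cut(age, breaks = c(0, 18, 35, 50, Inf), right = FALSE,
                 labels = c("0-17", "18-34", "35-49", "50+"))

# Cross-tabulating gives the 4 x 2 layout of cells.
cells <- table(age_group, gender)
cells
dim(cells)
```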
Conditioning on two variables: Example

In each of the 8 cells of the table, we can draw a graph that illustrates the relationship between x and y for individuals having that age and gender.

The type of graph will depend on the variable types of x and y:
- Both continuous: scatterplot
- One continuous: boxplots, dotplots, etc.
- Both categorical: mosaic plots
x and y continuous: xyplot

xyplot(y~x|gender*age)

[Figure: grid of scatterplots of y against x, one panel per gender (M, F) and age group (0-17, 18-34, 35-49, 50+).]
x categorical, y continuous: dotplot

dotplot(y~x|gender*age)

[Figure: grid of dot plots of y for x = A and x = B, one panel per gender (M, F) and age group (0-17, 18-34, 35-49, 50+).]
x categorical, y continuous: bwplot

bwplot(y~x|gender*age)

[Figure: grid of box-and-whisker plots of y for x = A and x = B, one panel per gender (M, F) and age group (0-17, 18-34, 35-49, 50+).]
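
The dotplot and bwplot calls above can be tried on simulated data. In this sketch the data frame `d`, its group labels and the shift in mean between A and B are all made up for illustration.

```r
library(lattice)

# Simulated data for illustration only.
set.seed(2)
n <- 160
d <- data.frame(
  x      = factor(sample(c("A", "B"), n, replace = TRUE)),
  gender = factor(sample(c("F", "M"), n, replace = TRUE)),
  age    = factor(sample(c("0-17", "18-34", "35-49", "50+"), n, replace = TRUE))
)
d$y <- rnorm(n, mean = ifelse(d$x == "A", 6, 8))

# Same formula, different panel type: individual points versus
# box-and-whisker summaries.
p_dot <- dotplot(y ~ x | gender * age, data = d)
p_bw  <- bwplot(y ~ x | gender * age, data = d)
print(p_dot)
print(p_bw)
```

Dot plots show every observation, which suits small cells; box plots summarise each cell and are easier to compare when cells hold many points.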
x categorical, y categorical: mosaicplot

mosaicplot(table(age=age,sex=gender,x=x,y=y),
col=c("red","blue"),main=NA)

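[Figure: mosaic plot of x (A, B) and y (a, b), split by age group (0-17, 18-34, 35-49, 50+) along the horizontal axis and by sex (F, M) along the vertical axis.]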
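
A self-contained version of the mosaic plot call, as a sketch on simulated categorical data (all four variables here are randomly generated, so the tiles will show no real structure):

```r
# Simulated categorical data for illustration only.
set.seed(3)
n <- 200
age    <- factor(sample(c("0-17", "18-34", "35-49", "50+"), n, replace = TRUE))
gender <- factor(sample(c("F", "M"), n, replace = TRUE))
x      <- factor(sample(c("A", "B"), n, replace = TRUE))
y      <- factor(sample(c("a", "b"), n, replace = TRUE))

# Four-way contingency table: one count per age/sex/x/y combination.
tab <- table(age = age, sex = gender, x = x, y = y)

# Tile area is proportional to the cell count; color takes a
# (recycled) vector of fill colours.
mosaicplot(tab, color = c("red", "blue"), main = "")
```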
In summary

- The conditioning variables determine the layout of the cells
- The x/y variables determine the kind of graph to draw in each cell

[Figure: the scatterplot grid and the mosaic plot from the previous examples.]
What to look for

- Does the value range for x and y change across panels?
- Do we observe any kind of relationship between x and y?
- Does the relationship between x and y change for different level combinations of z and w?
Example: Sports

- In a study on athletes at the Australian Institute of Sport, various physical measurements were made.
- In this example we look at the relationship between body fat and BMI, and how it differs between athletes of either gender playing different sports.

BMI = weight (kg) / (height (m))^2
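
As a quick worked example of the formula, for a hypothetical athlete (the weight and height are chosen for illustration, not taken from the study):

```r
# Hypothetical athlete: 70 kg and 1.75 m.
weight_kg <- 70
height_m  <- 1.75

bmi <- weight_kg / height_m^2
round(bmi, 1)  # 22.9
```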
Body fat vs. BMI conditioned on Gender and Sport

[Figure: scatterplots of BMI (vertical axis) against percent body fat (horizontal axis), one panel per gender (male, female) and sport (Athletics, BBall, Row, Swim, Tennis).]


Conclusions: Sport data

- Male athletes have lower percent body fat than females (well known).
- Male BMI tends to be larger than female BMI.
- Athletics shows the largest range of BMI values (it covers quite a diverse number of disciplines).
- Apart from female Athletics there doesn't seem to be any relationship between percent body fat and BMI.
- The initial scatterplot indicated a positive relationship between BMI and percent body fat, which goes away when looking at sports separately.
Example: engines

- In a study of engine emissions, a test engine was run under different conditions and the amount of nitrogen oxide (NOx) emitted was measured.
- The conditions involved different settings of the compression ratio, C, and the equivalence ratio, E (related to the fuel/air mixture).
- How does NOx relate to E? Does the relationship depend on C?
- There are only 5 settings of C (7.5, 9.0, 12.0, 15.0, 18.0) so we condition on these.
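
A sketch of the conditioned plot, assuming the lecture's data correspond to the `ethanol` data frame that ships with lattice (it has columns `NOx`, `C` and `E`, and the conclusions slide refers to the ethanol data). The axis labels are taken from the slide's figure.

```r
library(lattice)

# The five compression ratio settings.
sort(unique(ethanol$C))

# factor(C) gives one panel per compression ratio setting.
p <- xyplot(NOx ~ E | factor(C), data = ethanol,
            xlab = "Equivalence ratio",
            ylab = "Nitrogen oxide concentration")
print(p)
```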
NOx vs E given C

[Figure: scatterplots of nitrogen oxide concentration (vertical axis) against equivalence ratio (horizontal axis), one panel per compression ratio C = 7.5, 9, 12, 15, 18.]
NOx vs E given C

[Figure: nitrogen oxide concentration (vertical axis) against equivalence ratio (horizontal axis) in a single panel, with a legend for the compression ratios 7.5, 9, 12, 15 and 18.]
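
A single-panel version can be sketched with the `groups` argument instead of a conditioning variable. This again assumes the lattice `ethanol` data frame; the legend settings are one possible choice, not necessarily those used for the slide.

```r
library(lattice)

# All five compression ratios in one panel; groups are distinguished
# by colour/symbol according to the current lattice theme.
p <- xyplot(NOx ~ E, groups = factor(C), data = ethanol,
            auto.key = list(title = "C", columns = 5),
            xlab = "Equivalence ratio",
            ylab = "Nitrogen oxide concentration")
print(p)
```

Overlaying the groups makes it easier to see that the five curves share the same overall shape, at the cost of some overplotting.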
Conclusions: Ethanol data

- NOx and E have a sinusoid-like relationship.
- For E between 0.5 and about 0.9 the relationship is increasing, and from 0.9 to 1.3 it is decreasing.
- The impact of C on the relationship shows up as larger variation in NOx for E between 0.5 and 0.9.
Example: yarn

In an experiment to test the strength of different yarns, lengths of yarn are repeatedly stressed until they break (cycles to failure). We want to see how this variable is related to the length of the yarn samples, the amplitude and the load (two variables related to the amount of stress). The experiment used 3 amplitudes, 3 lengths and 3 loads, for a total of 3 × 3 × 3 = 27 different experimental conditions. (Coursebook p.9)
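
The count of 27 conditions can be reproduced by enumerating the design. In this sketch the level names (low, med, high) are assumed from the plot labels, and no response data are included.

```r
# Level names assumed from the plot labels.
lev <- c("low", "med", "high")

# Every combination of length, amplitude and load.
design <- expand.grid(length = lev, amplitude = lev, load = lev)

nrow(design)  # 27
head(design, 3)
```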
Testing procedure
Cycles to failure: number of pushes before yarn breaks.

[Figure: diagram of the testing procedure, with amplitude and length labelled.]
Yarn length vs. yarn cycles

[Figure: dot plots of yarn length (low, med, high; vertical axis) against cycles to failure (0 to 3000; horizontal axis), one panel per combination of load (low, med, high) and amplitude (low, med, high).]
Conclusions

- For longer lengths, the cycles to failure are higher (less likely to break).
- High loads reduce the cycles to failure (more likely to break).
- High amplitudes reduce the cycles to failure (more likely to break).
- Most likely to break when load and amplitude are high and length is low.
Summary

- Trellis graphics are a powerful tool to assess the influence of multiple variables at once!
- They allow different types of variables to be compared.
- They can detect confounding between variables!
- They can help to untangle difficult relationships.


Thank you
