CSC 820: How to Do Analyses in SPSS
SPSS Windows
• Data View
– Used to display data
– Columns represent variables
– Rows represent individual units or groups of units that share
common values of variables
• Variable View
– Used to display information on variables in dataset
– TYPE: Sets the data type and display format of the variable (e.g. numeric, string, date)
– LABEL: Allows for longer description of variable name
– VALUES: Allows for longer description of variable levels
– MEASURE: Allows choice of measurement scale
• Output View
– Displays Results of analyses/graphs
Data Entry Tips
• For large datasets, use a spreadsheet such as EXCEL,
which is more flexible for data entry, and import the
file into SPSS
• Give descriptive LABEL to variable names in the
VARIABLE VIEW
• Keep in mind that columns are variables; do not
spread a single variable across multiple columns
Importing data into SPSS
To import an EXCEL file, click on:
FILE → OPEN → DATA, then change FILES OF TYPE
to EXCEL (.xls)
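Descriptive Statistics
• Sum: $\sum_{i=1}^{n} y_i$
• Mean: $\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$
• Variance: $S^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$
• Std. deviation: $S = \sqrt{S^2}$
• S.E. Mean: $\frac{S}{\sqrt{n}}$
[Figure: SPSS chart output; labels recovered: SMKSTTS, Count, OUTCOME]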
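The sum, mean, variance, standard deviation and standard error of the mean can also be checked by hand outside SPSS. The sketch below is only an illustration: the sample `y` is invented for the example and is not course data.

```python
import math
import statistics

# Hypothetical sample, invented for illustration only (not from the course data).
y = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(y)

total = sum(y)                                           # sum of the y_i
mean = total / n                                         # y-bar
variance = sum((yi - mean) ** 2 for yi in y) / (n - 1)   # S^2, divisor n - 1
std_dev = math.sqrt(variance)                            # S
se_mean = std_dev / math.sqrt(n)                         # S / sqrt(n)

# Cross-check against the standard library's sample variance.
assert math.isclose(variance, statistics.variance(y))

print(f"Sum = {total}, Mean = {mean:.3f}")
print(f"Variance = {variance:.3f}, Std. deviation = {std_dev:.3f}")
print(f"S.E. Mean = {se_mean:.3f}")
```

The divisor n - 1 gives the sample variance, the same quantity as S² in the formulas on this slide.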
Histograms
• After importing your dataset and providing names
to variables, click on:
• GRAPHS → HISTOGRAM
• Select Variable to be plotted
• Click on DISPLAY NORMAL CURVE if you want a
normal curve superimposed (see Chapter 3).
Example 1.6 - Drug Approval Times
[Figure: histogram of drug approval times; axis label: MONTHS]
Side-by-Side Bar Charts
[Figure: side-by-side bar chart; labels recovered: Count, TRT, OUTCOME]
Scatterplots
[Figure: scatterplot; axis labels: THCLRNCE, DRUG]
Scatterplots with 2 Independent Variables
[Figure: scatterplot; axis labels: THCLRNCE, SUBJECT; legend DRUG: Tagamet, Pepcid, Placebo]
Contingency Tables for Conditional Probabilities
• After importing your dataset and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, select the variable you are conditioning on
(Independent Variable)
• For COLUMNS, select the variable you are finding the conditional
probability of (Dependent Variable)
• Click on CELLS
• Click on ROW Percentages
Example 1.10 - Alcohol & Mortality
WINE * DEATH Crosstabulation
                                     DEATH
                                  0         1     Total
WINE   0   Count              10535      2155     12690
           % within WINE      83.0%     17.0%    100.0%
       1   Count                521        74       595
           % within WINE      87.6%     12.4%    100.0%
Total      Count              11056      2229     13285
           % within WINE      83.2%     16.8%    100.0%
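The row percentages in this example are conditional probabilities: each count divided by its row total. A minimal sketch that recomputes them from the Example 1.10 counts follows; the function name `row_percentages` is made up for the illustration.

```python
# Counts from Example 1.10: outer key = WINE (0/1), inner key = DEATH (0/1).
counts = {
    0: {0: 10535, 1: 2155},
    1: {0: 521, 1: 74},
}


def row_percentages(table):
    """Return each cell as a percentage of its row total."""
    result = {}
    for row_level, row in table.items():
        row_total = sum(row.values())
        result[row_level] = {col: 100 * n / row_total for col, n in row.items()}
    return result


row_pct = row_percentages(counts)
for wine, row in row_pct.items():
    for death, pct in row.items():
        print(f"P(DEATH={death} | WINE={wine}) = {pct:.1f}%")
```

Read across a row: among WINE = 1 subjects, the share with DEATH = 1 is 74 / 595.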
Independent Sample t-Test
       GROUP            N      Mean   Std. Deviation   Std. Error Mean
AUC    Non-Dialysis     6    563.83          172.032            70.232
       Hemodialysis     6    499.67          131.409            53.647

Independent Samples Test
Levene's Test for Equality of Variances (equal variances assumed): F = .204, Sig. = .661
t-test for Equality of Means:
                                                   Sig.         Mean         Std. Error   95% Confidence Interval
                                                   (2-tailed)   Difference   Difference   of the Difference
                                       t      df                                          Lower        Upper
AUC   Equal variances assumed       .726      10   .484         64.17        88.377       -132.750     261.083
      Equal variances not assumed   .726   9.353   .486         64.17        88.377       -134.613     262.946
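The "equal variances assumed" row can be reproduced from the group summary statistics alone. The sketch below uses the standard pooled-variance formula; it is a hand check of the printed values, not a description of how SPSS computes them internally.

```python
import math

# Summary statistics from the group table above.
n1, mean1, sd1 = 6, 563.83, 172.032   # Non-Dialysis
n2, mean2, sd2 = 6, 499.67, 131.409   # Hemodialysis

# Pooled variance, assuming equal population variances.
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df

# Standard error of the difference in means, and the t statistic.
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (mean1 - mean2) / se_diff

print(f"Mean difference = {mean1 - mean2:.2f}")
print(f"Std. error of difference = {se_diff:.3f}")
print(f"t = {t_stat:.3f} on {df} df")
```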
Paired t-test
• After importing your dataset and providing names
to variables, click on:
• ANALYZE → COMPARE MEANS → PAIRED SAMPLES T-TEST
• For PAIRED VARIABLES, Select the two dependent
(response) variables (the analysis will be based on first
variable minus second variable)
Example 3.7 - Cmax in SRC&IRC Codeine
Paired Samples Statistics
                   Mean     N   Std. Deviation   Std. Error Mean
Pair 1   SRC    217.838    13          79.7792           22.1268
         IRC    138.815    13          59.3635           16.4645

Paired Samples Correlations
                        N   Correlation   Sig.
Pair 1   SRC & IRC     13          .746   .003

Paired Samples Test
Paired Differences (Pair 1, SRC - IRC):
  Mean                                  79.023
  Std. Deviation                        53.0959
  Std. Error Mean                       14.7262
  95% Confidence Interval, Lower        46.938
  95% Confidence Interval, Upper       111.109
t = 5.366, df = 12, Sig. (2-tailed) = .000
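The paired t statistic is the mean of the differences divided by its standard error. A short sketch that recomputes the standard error and t from the summary values printed for Example 3.7:

```python
import math

# Summary values for the SRC - IRC differences in Example 3.7.
n = 13
mean_diff = 79.023      # also equals 217.838 - 138.815
sd_diff = 53.0959

se_diff = sd_diff / math.sqrt(n)   # Std. Error Mean of the differences
t_stat = mean_diff / se_diff       # paired t statistic
df = n - 1

print(f"Std. Error Mean = {se_diff:.4f}")
print(f"t = {t_stat:.3f} on {df} df")
```

The order of the paired variables matters only for the sign: swapping them (IRC minus SRC) would give t = -5.366.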
Chi-Square Test
• After importing your dataset and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, Select the Independent Variable
• For COLUMNS, Select the Dependent Variable
• Under STATISTICS, Click on CHI-SQUARE
• Under CELLS, Click on OBSERVED, EXPECTED, ROW
PERCENTAGES, and ADJUSTED STANDARDIZED
RESIDUALS
• NOTE: Large ADJUSTED STANDARDIZED RESIDUALS
(in absolute value) show which cells are inconsistent with the null
hypothesis of independence. A common rule of thumb is to flag
any cell whose residual exceeds 3 in absolute value
Example 5.8 - Marital Status & Cancer
MARITAL * CANCREV Crosstabulation
                                            CANCREV
                                         Cancer  No Cancer      Total
MARITAL  Single   Count                      29         47         76
                  Expected Count           38.1       37.9       76.0
                  % within MARITAL        38.2%      61.8%     100.0%
                  Adjusted Residual        -2.3        2.3
         Married  Count                     116        108        224
                  Expected Count          112.3      111.7      224.0
                  % within MARITAL        51.8%      48.2%     100.0%
                  Adjusted Residual          .7        -.7
         Widowed  Count                      67         56        123
                  Expected Count           61.6       61.4      123.0
                  % within MARITAL        54.5%      45.5%     100.0%
                  Adjusted Residual         1.1       -1.1
         Div/Sep  Count                       5          5         10
                  Expected Count            5.0        5.0       10.0
                  % within MARITAL        50.0%      50.0%     100.0%
                  Adjusted Residual          .0         .0
Total             Count                     217        216        433
                  Expected Count          217.0      216.0      433.0
                  % within MARITAL        50.1%      49.9%     100.0%
Chi-Square Tests
                                  Value    df   Asymp. Sig. (2-sided)
Pearson Chi-Square               5.530(a)   3   .137
Likelihood Ratio                 5.572      3   .134
Linear-by-Linear Association     3.631      1   .057
N of Valid Cases                   433
a. 1 cells (12.5%) have expected count less than 5. The
minimum expected count is 4.99.
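The expected counts, Pearson chi-square and adjusted residuals in Example 5.8 can be recomputed from the observed counts. The sketch below uses the textbook formulas (expected = row total × column total / N); the helper name `adjusted_residual` is invented for the illustration.

```python
import math

# Observed counts from Example 5.8: rows = Single, Married, Widowed, Div/Sep;
# columns = Cancer, No Cancer.
observed = [[29, 47], [116, 108], [67, 56], [5, 5]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum of (O - E)^2 / E over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)


def adjusted_residual(i, j):
    """Adjusted standardized residual for the cell in row i, column j."""
    e = expected[i][j]
    scale = e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
    return (observed[i][j] - e) / math.sqrt(scale)


print(f"Pearson chi-square = {chi_square:.3f} on {df} df")
print(f"Minimum expected count = {min(min(r) for r in expected):.2f}")
print(f"Adjusted residual, Single/Cancer = {adjusted_residual(0, 0):.1f}")
```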
Fisher’s Exact Test
• After importing your dataset and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, Select the Independent Variable
• For COLUMNS, Select the Dependent Variable
• Under STATISTICS, Click on CHI-SQUARE
• Under CELLS, Click on OBSERVED and ROW
PERCENTAGES
• NOTE: Code the data so that the outcome present (Success)
category has the lower value (e.g. 1) and the outcome absent
(Failure) category has the higher value (e.g. 2). Code exposure
the same way: exposure present (e.g. 1) and exposure absent
(e.g. 2). Use Value Labels to keep output straight.
Example 5.5 - Antiseptic Experiment
TRTREV * DEATHREV Crosstabulation
                                           DEATHREV
                                         Death  No Death     Total
TRTREV  Antiseptic  Count                    6        34        40
                    % within TRTREV      15.0%     85.0%    100.0%
        Control     Count                   16        19        35
                    % within TRTREV      45.7%     54.3%    100.0%
Total               Count                   22        53        75
                    % within TRTREV      29.3%     70.7%    100.0%
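Fisher's exact test for a 2×2 table is based on the hypergeometric distribution of one cell given fixed margins. The sketch below computes a two-sided P-value for the Example 5.5 counts by summing the probabilities of all tables no more likely than the observed one. That two-sided convention is an assumption of this sketch (it is a common one), and the function name is made up for the illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]].

    Sums hypergeometric probabilities of every table (margins fixed) that is
    no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(k):
        # P(top-left cell = k) with all margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_obs * (1 + 1e-9)
    )


# Example 5.5: Antiseptic (6 deaths, 34 survivors), Control (16 deaths, 19 survivors).
p_value = fisher_exact_two_sided(6, 34, 16, 19)
print(f"Fisher exact two-sided P-value = {p_value:.4f}")
```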
McNemar's Test
                                         SURGREV
                                   Present    Absent     Total
SELFREV  Present  Count                 69        28        97
                  % of Total         41.8%     17.0%     58.8%
         Absent   Count                  5        63        68
                  % of Total          3.0%     38.2%     41.2%
Total             Count                 74        91       165
                  % of Total         44.8%     55.2%    100.0%

Chi-Square Tests
                      Value   Exact Sig. (2-sided)
McNemar Test                  .000(a)  (P-value)
N of Valid Cases        165
a. Binomial distribution used.
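The footnote says the binomial distribution was used: the exact McNemar test looks only at the two discordant cells (28 and 5 here) and asks how unusual such a split is if each discordant pair were equally likely to fall either way. A minimal sketch of that calculation:

```python
from math import comb

# Discordant cells from the table above.
b = 28   # SELFREV Present, SURGREV Absent
c = 5    # SELFREV Absent,  SURGREV Present

n_discordant = b + c
k = min(b, c)

# One tail: P(X <= k) for X ~ Binomial(n_discordant, 0.5), then doubled.
tail = sum(comb(n_discordant, i) for i in range(k + 1)) / 2 ** n_discordant
p_value = min(1.0, 2 * tail)

print(f"Exact two-sided McNemar P-value = {p_value:.6f}")
```

The result is far below .0005, which is why SPSS prints it as .000.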
Relative Risks and Odds Ratios
• After importing your dataset and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, Select the Independent Variable
• For COLUMNS, Select the Dependent Variable
• Under STATISTICS, Click on RISK
• Under CELLS, Click on OBSERVED and ROW PERCENTAGES
• NOTE: Code the data so that the outcome present (Success)
category has the lower value (e.g. 1) and the outcome absent
(Failure) category has the higher value (e.g. 2). Code exposure
the same way: exposure present (e.g. 1) and exposure absent (e.g. 2).
Use Value Labels to keep output straight.
Example 5.1 - Pamidronate Study
PAMIDREV * SKLEVREV Crosstabulation
                                              SKLEVREV
                                             Yes        No     Total
PAMIDREV  Pamidronate  Count                  47       149       196
                       % within PAMIDREV   24.0%     76.0%    100.0%
          Placebo      Count                  74       107       181
                       % within PAMIDREV   40.9%     59.1%    100.0%
Total                  Count                 121       256       377
                       % within PAMIDREV   32.1%     67.9%    100.0%

Risk Estimate
                                                            95% Confidence Interval
                                                   Value    Lower     Upper
Odds Ratio for PAMIDREV (Pamidronate / Placebo)     .456     .293      .710
For cohort SKLEVREV = Yes                           .587     .432      .795
For cohort SKLEVREV = No                           1.286    1.113     1.486
N of Valid Cases                                     377
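The odds ratio and the relative risk ("For cohort SKLEVREV = Yes") can be recomputed from the four cell counts. The sketch below uses the usual large-sample intervals on the log scale with z = 1.96. That choice is an assumption of this sketch, made because it matches the printed limits to three decimals.

```python
import math

# Example 5.1 counts: rows = Pamidronate, Placebo; columns = Yes, No.
a, b = 47, 149
c, d = 74, 107
z = 1.96  # normal critical value for a 95% interval (assumed method)

# Odds ratio and its log-scale confidence interval.
odds_ratio = (a * d) / (b * c)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
or_lower = math.exp(math.log(odds_ratio) - z * se_log_or)
or_upper = math.exp(math.log(odds_ratio) + z * se_log_or)

# Relative risk of the outcome (Yes): Pamidronate relative to Placebo.
risk_treated = a / (a + b)
risk_placebo = c / (c + d)
relative_risk = risk_treated / risk_placebo
se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
rr_lower = math.exp(math.log(relative_risk) - z * se_log_rr)
rr_upper = math.exp(math.log(relative_risk) + z * se_log_rr)

print(f"Odds ratio = {odds_ratio:.3f} ({or_lower:.3f}, {or_upper:.3f})")
print(f"Relative risk = {relative_risk:.3f} ({rr_lower:.3f}, {rr_upper:.3f})")
```

Both intervals exclude 1, so the lower event rate under pamidronate is statistically distinguishable from the placebo rate at the 5% level.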
Example 5.2 - Lip Cancer
PIPESREV * LIPCREV Crosstabulation
                                     LIPCREV
                                     Yes        No     Total
PIPESREV  Yes  Count                 339       149       488
               % within PIPESREV   69.5%     30.5%    100.0%
          No   Count                 198       351       549
               % within PIPESREV   36.1%     63.9%    100.0%
Total          Count                 537       500      1037
               % within PIPESREV   51.8%     48.2%    100.0%

Risk Estimate
                                                95% Confidence Interval
                                       Value    Lower     Upper
Odds Ratio for PIPESREV (Yes / No)     4.033    3.111     5.229
For cohort LIPCREV = Yes               1.926    1.698     2.185
For cohort LIPCREV = No                 .478     .412      .554
N of Valid Cases                        1037
Correlation
• After importing your dataset and providing names to
variables, click on:
• ANALYZE → CORRELATE → BIVARIATE
• Select the VARIABLES
• Select the PEARSON correlation coefficient
• Select the TWO-TAILED test of significance
• Select FLAG SIGNIFICANT CORRELATIONS
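For reference, the Pearson correlation that this menu reports can be written out directly. The sketch below uses invented data (`x` and `y` are hypothetical, not from the course) and also computes the t statistic on n - 2 degrees of freedom that underlies the two-tailed significance test.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
n = len(x)
# Test statistic for H0: rho = 0, referred to a t distribution with n - 2 df.
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

print(f"r = {r:.4f}, t = {t_stat:.4f} on {n - 2} df")
```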
Linear Regression
• After importing your dataset and providing names
to variables, click on:
• ANALYZE → REGRESSION → LINEAR
• Select the DEPENDENT VARIABLE
• Select the INDEPENDENT VARIABLE(S)
• Click on STATISTICS, then ESTIMATES, CONFIDENCE
INTERVALS, MODEL FIT
Examples 7.1-7.6 - Gemfibrozil Clearance
ANOVA
Model             Sum of Squares    df    Mean Square       F     Sig.
1   Regression          107168.2     1     107168.158   7.413    .016(a)
    Residual            216865.8    15      14457.723
    Total               324034.0    16
a. Predictors: (Constant), CLCR
b. Dependent Variable: CLGM

Model Summary
Model       R       R Square   Adjusted R Square   Std. Error of the Estimate
1        .575(a)        .331                .286                      120.240
a. Predictors: (Constant), CLCR
b. Dependent Variable: CLGM
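The F statistic and the model-fit values are all functions of the sums of squares in the ANOVA output. A short sketch that recomputes them from the printed values as a hand check:

```python
import math

# Sums of squares and degrees of freedom from the regression output above.
ss_regression, df_regression = 107168.2, 1
ss_residual, df_residual = 216865.8, 15
ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_stat = ms_regression / ms_residual                  # F = MSR / MSE
r_square = ss_regression / ss_total                   # share of variation explained
adj_r_square = 1 - (1 - r_square) * df_total / df_residual
se_estimate = math.sqrt(ms_residual)                  # Std. Error of the Estimate

print(f"F = {f_stat:.3f}")
print(f"R Square = {r_square:.3f}, Adjusted R Square = {adj_r_square:.3f}")
print(f"R = {math.sqrt(r_square):.3f}, Std. Error of the Estimate = {se_estimate:.3f}")
```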