CSC 820 How To Do Analyses in SPSS


Statistical Analysis using SPSS

SPSS Windows
• Data View
– Used to display data
– Columns represent variables
– Rows represent individual units or groups of units that share
common values of variables
• Variable View
– Used to display information on variables in dataset
– TYPE: Sets how values are stored and displayed (e.g. numeric, string, date)
– LABEL: Allows for longer description of variable name
– VALUES: Allows for longer description of variable levels
– MEASURE: Allows choice of measurement scale
• Output View
– Displays Results of analyses/graphs
Data Entry Tips
• For large datasets, use a spreadsheet such as EXCEL
which is more flexible for data entry, and import the
file into SPSS
• Give descriptive LABEL to variable names in the
VARIABLE VIEW
• Keep in mind that Columns are Variables: each variable should
occupy exactly one column
Importing data into SPSS
To import an EXCEL file, click on:
FILE → OPEN → DATA, then change FILES OF TYPE to EXCEL (.xls)

To import a TEXT or DATA file, click on:
FILE → OPEN → DATA, then change FILES OF TYPE to TEXT (.txt) or DATA (.dat)

You will be prompted through a series of dialog boxes to
import the dataset
Descriptive Statistics-Numeric Data
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → DESCRIPTIVES
• Choose any variables to be analyzed and place them in box on right
• Options include:

– Mean: $\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$
– Sum: $\sum_{i=1}^{n} y_i$
– Std. deviation: $S = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}}$
– Variance: $S^2$
– S.E. Mean: $\frac{S}{\sqrt{n}}$
Descriptive Statistics

                     N  Minimum  Maximum  Sum   Mean  S.E. Mean  Std. Deviation  Variance
CRCL                 8       38      120  621  77.63       8.63          24.401   595.411
Valid N (listwise)   8
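The summary measures offered by DESCRIPTIVES can be sketched in a few lines of Python. This is only an illustration of the formulas, not of how SPSS computes them; the function name `describe` and the sample values are hypothetical, since the raw CRCL data are not listed in the slides.

```python
import math
import statistics


def describe(values):
    """Return the summary measures listed above for a numeric sample."""
    n = len(values)
    variance = statistics.variance(values)  # sample variance, divisor n - 1
    std_dev = math.sqrt(variance)
    return {
        "n": n,
        "minimum": min(values),
        "maximum": max(values),
        "sum": sum(values),
        "mean": statistics.fmean(values),
        "std_dev": std_dev,
        "variance": variance,
        "se_mean": std_dev / math.sqrt(n),  # S / sqrt(n)
    }


# Hypothetical sample, used only to exercise the function
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name:>8}: {value}")
```

The same relationships can be checked by hand against the CRCL output: 24.401 squared is about 595.41, and 24.401 / sqrt(8) is about 8.63.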
Descriptive Statistics-General Data
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → FREQUENCIES
• Choose any variables to be analyzed and place them in box on
right
• Options include (For Categorical Variables):
– Frequency Tables
– Pie Charts, Bar Charts
• Options include (For Numeric Variables)
– Frequency Tables (Useful for discrete data)
– Measures of Central Tendency, Dispersion, Percentiles
– Pie Charts, Histograms
Example 1.4 - Smoking Status

SMKSTTS

                                  Frequency  Percent  Valid Percent  Cumulative Percent
Valid  Never Smoked                    1990     37.9           37.9                37.9
       Quit > 10 Years Ago             1063     20.3           20.3                58.2
       Quit < 10 Years Ago              609     11.6           11.6                69.8
       Current Cigarette Smoker        1332     25.4           25.4                95.2
       Other Tobacco User               253      4.8            4.8               100.0
       Total                           5247    100.0          100.0
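As a rough check on how the Percent and Cumulative Percent columns arise, the sketch below rebuilds them from the frequencies in Example 1.4. The helper name `frequency_table` is an illustrative choice, not anything SPSS exposes.

```python
def frequency_table(counts):
    """Return (label, count, percent, cumulative percent) rows."""
    total = sum(counts.values())
    rows = []
    running = 0
    for label, count in counts.items():
        running += count
        rows.append((label, count, 100 * count / total, 100 * running / total))
    return rows


# Frequencies copied from the SMKSTTS table in Example 1.4
smoking = {
    "Never Smoked": 1990,
    "Quit > 10 Years Ago": 1063,
    "Quit < 10 Years Ago": 609,
    "Current Cigarette Smoker": 1332,
    "Other Tobacco User": 253,
}
table = frequency_table(smoking)
for label, count, pct, cum_pct in table:
    print(f"{label:<26}{count:>6}{pct:>8.1f}{cum_pct:>8.1f}")
```

Cumulative percentages are computed from the running count rather than by adding rounded percentages, which avoids accumulating rounding error.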
Vertical Bar Charts and Pie Charts
• After Importing your dataset, and providing names
to variables, click on:
• GRAPHS → BAR… → SIMPLE (Summaries for Groups of Cases) → DEFINE
• Bars Represent N of Cases (or % of Cases)
• Put the variable of interest as the CATEGORY AXIS

• GRAPHS → PIE… (Summaries for Groups of Cases) → DEFINE
• Slices Represent N of Cases (or % of Cases)
• Put the variable of interest as the DEFINE SLICES BY
Example 1.5 - Antibiotic Study
[Figure: vertical bar chart and pie chart of OUTCOME (categories 1 to 5); bar chart vertical axis: Count]
Histograms
• After Importing your dataset, and providing names
to variables, click on:
• GRAPHS → HISTOGRAM
• Select Variable to be plotted
• Click on DISPLAY NORMAL CURVE if you want a
normal curve superimposed (see Chapter 3).
Example 1.6 - Drug Approval Times
[Figure: histogram of MONTHS; Mean = 32.1, Std. Dev = 20.97, N = 175]
Side-by-Side Bar Charts

• After Importing your dataset, and providing names
to variables, click on:
• GRAPHS → BAR… → CLUSTERED (Summaries for Groups
of Cases) → DEFINE
• Bars Represent N of Cases (or % of Cases)
• CATEGORY AXIS: Variable that represents groups to be
compared (independent variable)
• DEFINE CLUSTERS BY: Variable that represents
outcomes of interest (dependent variable)
Example 1.7 - Streptomycin Study
[Figure: clustered bar chart of Count by TRT (1, 2), with clusters defined by OUTCOME]
Scatterplots

• After Importing your dataset, and providing names
to variables, click on:
• GRAPHS → SCATTER → SIMPLE → DEFINE
• For Y-AXIS, choose the Dependent (Response) Variable
• For X-AXIS, choose the Independent (Explanatory)
Variable
Example 1.8 - Theophylline Clearance
[Figure: scatterplot of THCLRNCE (y-axis) versus DRUG (x-axis)]
Scatterplots with 2 Independent Variables

• After Importing your dataset, and providing names
to variables, click on:
• GRAPHS → SCATTER → SIMPLE → DEFINE
• For Y-AXIS, choose the Dependent Variable
• For X-AXIS, choose the Independent Variable with the
most levels
• For SET MARKERS BY, choose the Independent Variable
with the fewest levels
Example 1.8 - Theophylline Clearance
[Figure: scatterplot of THCLRNCE (y-axis) versus SUBJECT (x-axis), with markers set by DRUG (Tagamet, Pepcid, Placebo)]
Contingency Tables for Conditional Probabilities
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, select the variable you are conditioning on
(Independent Variable)
• For COLUMNS, select the variable you are finding the conditional
probability of (Dependent Variable)
• Click on CELLS
• Click on ROW Percentages
Example 1.10 - Alcohol & Mortality

WINE * DEATH Crosstabulation

                                  DEATH
                                0        1    Total
WINE   0   Count            10535     2155    12690
           % within WINE    83.0%    17.0%   100.0%
       1   Count              521       74      595
           % within WINE    87.6%    12.4%   100.0%
Total      Count            11056     2229    13285
           % within WINE    83.2%    16.8%   100.0%
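Row percentages are the conditional probabilities of each column category given the row category. A minimal sketch, using the counts from Example 1.10 (the function name `row_percentages` is illustrative only):

```python
def row_percentages(table):
    """Convert each row of counts to percentages of that row's total."""
    return [[100 * cell / sum(row) for cell in row] for row in table]


# Counts from Example 1.10: rows are WINE = 0, 1; columns are DEATH = 0, 1
wine_death = [[10535, 2155], [521, 74]]
percents = row_percentages(wine_death)
for wine_level, row in enumerate(percents):
    print(f"WINE = {wine_level}: " + "  ".join(f"{p:.1f}%" for p in row))
```

Read this way, the estimated probability of death is about 17.0% for WINE = 0 and about 12.4% for WINE = 1.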
Independent Sample t-Test

• After Importing your dataset, and providing names
to variables, click on:
• ANALYZE → COMPARE MEANS → INDEPENDENT
SAMPLES T-TEST
• For TEST VARIABLE, Select the dependent (response)
variable(s)
• For GROUPING VARIABLE, Select the independent
variable. Then define the names of the 2 levels to be
compared (this can be used even when the full dataset has
more than 2 levels for independent variable).
Example 3.5 - Levocabastine in Renal Patients
Group Statistics

      GROUP          N    Mean  Std. Deviation  Std. Error Mean
AUC   Non-Dialysis   6  563.83         172.032           70.232
      Hemodialysis   6  499.67         131.409           53.647

Independent Samples Test (AUC)

                                           Equal variances  Equal variances
                                           assumed          not assumed
Levene's Test for Equality of Variances
  F                                        .204
  Sig.                                     .661
t-test for Equality of Means
  t                                        .726             .726
  df                                       10               9.353
  Sig. (2-tailed)                          .484             .486
  Mean Difference                          64.17            64.17
  Std. Error Difference                    88.377           88.377
  95% Confidence Interval, Lower           -132.750         -134.613
  95% Confidence Interval, Upper           261.083          262.946
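The t statistics and degrees of freedom in Example 3.5 can be reproduced from the group means, standard deviations and sample sizes alone. The sketch below uses the textbook pooled and Welch (Satterthwaite) formulas; the function names are hypothetical, and p-values are omitted because they need a t-distribution routine that the Python standard library does not provide.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance t statistic with Satterthwaite degrees of freedom."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (mean1 - mean2) / math.sqrt(v1 + v2), df


# Summary statistics from Example 3.5 (Non-Dialysis, then Hemodialysis)
groups = (563.83, 172.032, 6, 499.67, 131.409, 6)
t_equal, df_equal = pooled_t(*groups)
t_welch, df_welch = welch_t(*groups)
print(f"equal variances:   t = {t_equal:.3f}, df = {df_equal}")
print(f"unequal variances: t = {t_welch:.3f}, df = {df_welch:.3f}")
```

With equal group sizes the two t statistics coincide; only the degrees of freedom differ.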
Paired t-test
• After Importing your dataset, and providing names
to variables, click on:
• ANALYZE → COMPARE MEANS → PAIRED
SAMPLES T-TEST
• For PAIRED VARIABLES, Select the two dependent
(response) variables (the analysis will be based on first
variable minus second variable)
Example 3.7 - Cmax in SRC&IRC Codeine
Paired Samples Statistics

                  Mean   N  Std. Deviation  Std. Error Mean
Pair 1  SRC    217.838  13         79.7792          22.1268
        IRC    138.815  13         59.3635          16.4645

Paired Samples Correlations

                    N  Correlation  Sig.
Pair 1  SRC & IRC  13         .746  .003

Paired Samples Test (Pair 1, SRC - IRC)

Paired Differences
  Mean                               79.023
  Std. Deviation                     53.0959
  Std. Error Mean                    14.7262
  95% Confidence Interval, Lower     46.938
  95% Confidence Interval, Upper     111.109
t                                    5.366
df                                   12
Sig. (2-tailed)                      .000
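The paired t statistic is the mean of the within-pair differences divided by its standard error. The sketch below recomputes the Example 3.7 statistic from the reported summary values; `paired_t_from_summary` is an illustrative name, and the raw SRC and IRC measurements are not reproduced because the slides do not list them.

```python
import math


def paired_t_from_summary(mean_diff, sd_diff, n):
    """Return (t, df, standard error) for a paired-samples t-test."""
    se = sd_diff / math.sqrt(n)  # standard error of the mean difference
    return mean_diff / se, n - 1, se


# Paired differences (SRC - IRC) from Example 3.7
t_stat, df, se_mean = paired_t_from_summary(79.023, 53.0959, 13)
print(f"S.E. of mean difference = {se_mean:.4f}")
print(f"t = {t_stat:.3f} on {df} df")
```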
Chi-Square Test
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, Select the Independent Variable
• For COLUMNS, Select the Dependent Variable
• Under STATISTICS, Click on CHI-SQUARE
• Under CELLS, Click on OBSERVED, EXPECTED, ROW
PERCENTAGES, and ADJUSTED STANDARDIZED
RESIDUALS
• NOTE: Large ADJUSTED STANDARDIZED RESIDUALS
(in absolute value) show which cells are inconsistent with the null
hypothesis of independence. A common rule of thumb is to flag
cells whose residuals exceed 3 in absolute value
Example 5.8 - Marital Status & Cancer
MARITAL * CANCREV Crosstabulation

                                         CANCREV
                                      Cancer  No Cancer   Total
MARITAL  Single   Count                   29         47      76
                  Expected Count        38.1       37.9    76.0
                  % within MARITAL     38.2%      61.8%  100.0%
                  Adjusted Residual     -2.3        2.3
         Married  Count                  116        108     224
                  Expected Count       112.3      111.7   224.0
                  % within MARITAL     51.8%      48.2%  100.0%
                  Adjusted Residual       .7        -.7
         Widowed  Count                   67         56     123
                  Expected Count        61.6       61.4   123.0
                  % within MARITAL     54.5%      45.5%  100.0%
                  Adjusted Residual      1.1       -1.1
         Div/Sep  Count                    5          5      10
                  Expected Count         5.0        5.0    10.0
                  % within MARITAL     50.0%      50.0%  100.0%
                  Adjusted Residual       .0         .0
Total             Count                  217        216     433
                  Expected Count       217.0      216.0   433.0
                  % within MARITAL     50.1%      49.9%  100.0%
Chi-Square Tests

                               Value     df  Asymp. Sig. (2-sided)
Pearson Chi-Square             5.530(a)   3  .137
Likelihood Ratio               5.572      3  .134
Linear-by-Linear Association   3.631      1  .057
N of Valid Cases               433
a. 1 cells (12.5%) have expected count less than 5. The
minimum expected count is 4.99.
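To show where the expected counts, the Pearson chi-square value and the adjusted residuals come from, here is a minimal sketch applied to the observed counts of Example 5.8. The function name `chi_square` is hypothetical, and the p-value is left out because it needs a chi-square distribution routine outside the standard library.

```python
import math


def chi_square(observed):
    """Pearson chi-square, df, expected counts and adjusted residuals."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    adjusted = [
        [
            (o - e) / math.sqrt(e * (1 - r / n) * (1 - c / n))
            for o, e, c in zip(obs_row, exp_row, col_totals)
        ]
        for obs_row, exp_row, r in zip(observed, expected, row_totals)
    ]
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected, adjusted


# Observed counts from Example 5.8: rows Single, Married, Widowed, Div/Sep;
# columns Cancer, No Cancer
marital = [[29, 47], [116, 108], [67, 56], [5, 5]]
statistic, df, expected, adjusted = chi_square(marital)
print(f"Pearson chi-square = {statistic:.3f} on {df} df")
print(f"minimum expected count = {min(min(row) for row in expected):.2f}")
print("adjusted residuals:", [[round(a, 1) for a in row] for row in adjusted])
```

No adjusted residual here exceeds 3 in absolute value, which agrees with the non-significant test.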
Fisher’s Exact Test
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, Select the Independent Variable
• For COLUMNS, Select the Dependent Variable
• Under STATISTICS, Click on CHI-SQUARE
• Under CELLS, Click on OBSERVED and ROW
PERCENTAGES
• NOTE: You will want to code the data so that the outcome
present (Success) category has the lower value (e.g. 1) and the
outcome absent (Failure) category has the higher value (e.g. 2).
Similar for Exposure present category (e.g. 1) and exposure
absent (e.g. 2). Use Value Labels to keep output straight.
Example 5.5 - Antiseptic Experiment
TRTREV * DEATHREV Crosstabulation

                                        DEATHREV
                                     Death  No Death   Total
TRTREV  Antiseptic  Count                6        34      40
                    % within TRTREV  15.0%     85.0%  100.0%
        Control     Count               16        19      35
                    % within TRTREV  45.7%     54.3%  100.0%
Total               Count               22        53      75
                    % within TRTREV  29.3%     70.7%  100.0%

Chi-Square Tests

                                             Asymp. Sig.  Exact Sig.  Exact Sig.
                               Value     df  (2-sided)    (2-sided)   (1-sided)
Pearson Chi-Square             8.495(b)   1  .004
Continuity Correction(a)       7.078      1  .008
Likelihood Ratio               8.687      1  .003
Fisher's Exact Test                                       .005        .004
Linear-by-Linear Association   8.382      1  .004
N of Valid Cases               75
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is
10.27.
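Fisher's exact test for a 2x2 table can be sketched directly from the hypergeometric distribution. The code below makes two assumptions that the slides do not state: the one-sided value is taken as the lower tail (tables with a first cell no larger than the one observed), and the two-sided value sums every table that is no more probable than the observed one. The function name `fisher_exact_2x2` is illustrative.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Exact one- and two-sided p-values for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Unnormalised hypergeometric weights, kept as exact integers so that
    # the "no more probable than observed" comparison has no rounding error
    weights = {k: comb(row1, k) * comb(row2, col1 - k) for k in range(lo, hi + 1)}
    total = sum(weights.values())
    one_sided = sum(w for k, w in weights.items() if k <= a) / total
    two_sided = sum(w for w in weights.values() if w <= weights[a]) / total
    return one_sided, two_sided


# Counts from Example 5.5: rows Antiseptic, Control; columns Death, No Death
p_one, p_two = fisher_exact_2x2(6, 34, 16, 19)
print(f"exact one-sided p = {p_one:.4f}")
print(f"exact two-sided p = {p_two:.4f}")
```

Under these assumptions the results round to the .004 and .005 shown in the Exact Sig. columns.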
McNemar’s Test
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, Select the outcome for condition/time 1
• For COLUMNS, Select the outcome for condition/time 2
• Under STATISTICS, Click on MCNEMAR
• Under CELLS, Click on OBSERVED and TOTAL
PERCENTAGES
• NOTE: You will want to code the data so that the outcome present
(Success) category has the lower value (e.g. 1) and the outcome
absent (Failure) category has the higher value (e.g. 2). Similar for
Exposure present category (e.g. 1) and exposure absent (e.g. 2).
Use Value Labels to keep output straight.
Example 5.6 - Report of Implant Leak
SELFREV * SURGREV Crosstabulation

                                  SURGREV
                               Present  Absent   Total
SELFREV  Present  Count             69      28      97
                  % of Total     41.8%   17.0%   58.8%
         Absent   Count              5      63      68
                  % of Total      3.0%   38.2%   41.2%
Total             Count             74      91     165
                  % of Total     44.8%   55.2%  100.0%

Chi-Square Tests

                    Value  Exact Sig. (2-sided)
McNemar Test               .000(a)  (P-value)
N of Valid Cases    165
a. Binomial distribution used.
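The footnote says the McNemar p-value is based on the binomial distribution. One way to see this is that only the discordant pairs matter: under the null hypothesis each is equally likely to fall in either off-diagonal cell. The sketch below applies that idea to the 28 and 5 discordant reports in Example 5.6; the function name `mcnemar_exact` is hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = b + c
    k = min(b, c)
    # Binomial(n, 1/2) probability of k or fewer, doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Discordant cells from Example 5.6: self Present / surgery Absent = 28,
# self Absent / surgery Present = 5
p_value = mcnemar_exact(28, 5)
print(f"exact two-sided p = {p_value:.6f}")
```

The result is far below .0005, which is why SPSS displays it as .000.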
Relative Risks and Odds Ratios
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, Select the Independent Variable
• For COLUMNS, Select the Dependent Variable
• Under STATISTICS, Click on RISK
• Under CELLS, Click on OBSERVED and ROW PERCENTAGES
• NOTE: You will want to code the data so that the outcome present
(Success) category has the lower value (e.g. 1) and the outcome
absent (Failure) category has the higher value (e.g. 2). Similar for
Exposure present category (e.g. 1) and exposure absent (e.g. 2).
Use Value Labels to keep output straight.
Example 5.1 - Pamidronate Study
PAMIDREV * SKLEVREV Crosstabulation

                                            SKLEVREV
                                           Yes      No    Total
PAMIDREV  Pamidronate  Count                47     149      196
                       % within PAMIDREV 24.0%   76.0%   100.0%
          Placebo      Count                74     107      181
                       % within PAMIDREV 40.9%   59.1%   100.0%
Total                  Count               121     256      377
                       % within PAMIDREV 32.1%   67.9%   100.0%

Risk Estimate

                                                        95% Confidence Interval
                                                 Value  Lower   Upper
Odds Ratio for PAMIDREV (Pamidronate / Placebo)   .456   .293    .710
For cohort SKLEVREV = Yes                         .587   .432    .795
For cohort SKLEVREV = No                         1.286  1.113   1.486
N of Valid Cases                                   377
Example 5.2 - Lip Cancer
PIPESREV * LIPCREV Crosstabulation

                                      LIPCREV
                                     Yes      No    Total
PIPESREV  Yes  Count                 339     149      488
               % within PIPESREV   69.5%   30.5%   100.0%
          No   Count                 198     351      549
               % within PIPESREV   36.1%   63.9%   100.0%
Total          Count                 537     500     1037
               % within PIPESREV   51.8%   48.2%   100.0%

Risk Estimate

                                             95% Confidence Interval
                                      Value  Lower   Upper
Odds Ratio for PIPESREV (Yes / No)    4.033  3.111   5.229
For cohort LIPCREV = Yes              1.926  1.698   2.185
For cohort LIPCREV = No                .478   .412    .554
N of Valid Cases                       1037
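The odds ratios and the "For cohort ... = Yes" relative risks in Examples 5.1 and 5.2 can be approximated from the four cell counts. The sketch below assumes the usual large-sample (Wald) intervals on the log scale with z = 1.96; the slides do not say which method SPSS uses, and `risk_estimates` is an illustrative name.

```python
import math


def risk_estimates(a, b, c, d, z=1.96):
    """Odds ratio and relative risk, each as (estimate, lower, upper).

    Table layout: rows = exposure (present, absent),
    columns = outcome (present, absent), i.e. [[a, b], [c, d]].
    """
    def with_limits(estimate, se_log):
        margin = z * se_log
        return estimate, estimate * math.exp(-margin), estimate * math.exp(margin)

    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    relative_risk = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return {
        "odds_ratio": with_limits(odds_ratio, se_log_or),
        "relative_risk": with_limits(relative_risk, se_log_rr),
    }


# Example 5.1: rows Pamidronate, Placebo; columns skeletal event Yes, No
pamidronate = risk_estimates(47, 149, 74, 107)
# Example 5.2: rows pipe smoker Yes, No; columns lip cancer Yes, No
lip_cancer = risk_estimates(339, 149, 198, 351)
for name, result in (("Pamidronate", pamidronate), ("Lip cancer", lip_cancer)):
    for measure, (est, lower, upper) in result.items():
        print(f"{name:<12}{measure:<14}{est:.3f} ({lower:.3f}, {upper:.3f})")
```

This layout is why the coding note matters: with the outcome-present and exposure-present categories coded first, the ratios are reported for exposed relative to unexposed and for the outcome of interest.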
Correlation
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → CORRELATE → BIVARIATE
• Select the VARIABLES
• Select the PEARSON CORRELATION
• Select the Two-tailed test of significance
• Select Flag significant correlations
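For reference, the Pearson correlation and the t statistic behind its significance test can be sketched as follows. The data are hypothetical and the function names are illustrative; converting the t statistic to a two-tailed p-value needs a t-distribution routine that the standard library lacks, so that step is left out.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic (df = n - 2) for testing a zero population correlation."""
    return r * math.sqrt((n - 2) / (1 - r**2))


# Hypothetical paired measurements, used only to exercise the functions
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
t = t_statistic(r, len(x))
print(f"r = {r:.4f}, t = {t:.4f} on {len(x) - 2} df")
```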
Linear Regression
• After Importing your dataset, and providing names
to variables, click on:
• ANALYZE → REGRESSION → LINEAR
• Select the DEPENDENT VARIABLE
• Select the INDEPENDENT VARIABLE(S)
• Click on STATISTICS, then select ESTIMATES, CONFIDENCE
INTERVALS, and MODEL FIT
Examples 7.1-7.6 - Gemfibrozil Clearance
Coefficients(a)

                 Unstandardized          Standardized
                 Coefficients            Coefficients                   95% Confidence Interval for B
Model            B         Std. Error    Beta            t       Sig.   Lower Bound   Upper Bound
1  (Constant)    460.828   54.338                        8.481   .000   345.010       576.646
   CLCR          -3.215    1.181         -.575           -2.723  .016   -5.732        -.698
a. Dependent Variable: CLGM
Examples 7.1-7.6 - Gemfibrozil Clearance
ANOVA(b)

Model            Sum of Squares  df  Mean Square      F  Sig.
1  Regression          107168.2   1   107168.158  7.413  .016(a)
   Residual            216865.8  15    14457.723
   Total               324034.0  16
a. Predictors: (Constant), CLCR
b. Dependent Variable: CLGM

Model Summary(b)

Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
1      .575(a)  .331      .286               120.240
a. Predictors: (Constant), CLCR
b. Dependent Variable: CLGM
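The Model Summary figures follow from the ANOVA sums of squares, which the sketch below makes explicit. The function name `regression_summary` is illustrative, and n = 17 is inferred from the total degrees of freedom (16) rather than stated in the slides.

```python
import math


def regression_summary(ss_regression, ss_residual, n, predictors=1):
    """Model-fit quantities derived from the ANOVA sums of squares."""
    ss_total = ss_regression + ss_residual
    df_residual = n - predictors - 1
    ms_regression = ss_regression / predictors
    ms_residual = ss_residual / df_residual
    r_square = ss_regression / ss_total
    return {
        "r_square": r_square,
        "adjusted_r_square": 1 - (1 - r_square) * (n - 1) / df_residual,
        "f": ms_regression / ms_residual,
        "std_error_estimate": math.sqrt(ms_residual),
    }


# Sums of squares from the Gemfibrozil ANOVA table; n inferred as 16 + 1
fit = regression_summary(107168.2, 216865.8, n=17)
for name, value in fit.items():
    print(f"{name:>20}: {value:.3f}")
```

With a single predictor, F is the square of the slope's t statistic: (-2.723) squared is about 7.41, matching the ANOVA table up to rounding.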
