Statistics
Dalia El-Shafei
Assistant Professor, Community Medicine Department, Zagazig University
STATISTICS
Descriptive
Inferential
TYPES OF DATA
Data
• Quantitative: Continuous (decimals allowed) & Discrete (no decimals)
• Qualitative: Categorical & Ordinal
Presentation: List & Frequency distribution table
LIST
[Example frequency table: Total 30 (100%)]
GRAPHICAL PRESENTATION
• Self-explanatory.
• Has a clear title indicating its content (written under the graph).
• Fully labeled.
Pie diagram
Histogram
Scatter diagram
Line graph
Frequency polygon
BAR CHART
Used for presenting discrete or qualitative data.
It is a graphical presentation of magnitude (value or percentage) by rectangles of constant width whose lengths are proportional to the frequency, separated by gaps.
Types: Simple, Multiple & Component.
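As a rough illustration (not from the original slides), a simple bar chart of qualitative data can be drawn with matplotlib; the blood-group labels and counts below are made-up example values:

import matplotlib.pyplot as plt

# Hypothetical qualitative data: blood groups and their frequencies
groups = ["A", "B", "AB", "O"]
counts = [12, 8, 3, 17]

plt.bar(groups, counts, width=0.6)   # constant width, gaps between bars
plt.title("Blood group distribution (example data)")
plt.xlabel("Blood group")
plt.ylabel("Frequency")
plt.show()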
SIMPLE BAR CHART
MULTIPLE BAR CHART
Percentage of Persons Aged ≥18 Years Who Were Current Smokers,
by Age and Sex — United States, 2002
COMPONENT BAR CHART
PIE DIAGRAM
Consists of a circle whose area represents the total frequency (100%) and which is divided into segments.
Each segment represents a proportional part of the total frequency.
HISTOGRAM
• It is very similar to a bar chart, with the difference that the rectangles or bars are adherent (no gaps).
• It is used for presenting a class frequency table (continuous data).
• Each bar represents a class: its height represents the frequency (number of cases) and its width represents the class interval.
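A minimal sketch (not from the slides) of a histogram for continuous data with matplotlib; the haemoglobin values and bin edges are invented for illustration:

import matplotlib.pyplot as plt

# Hypothetical continuous data: haemoglobin levels (g/dL)
hb = [9.5, 10.2, 11.0, 11.3, 11.8, 12.1, 12.4, 12.9, 13.2, 13.8, 14.1, 14.5]

# Adjacent bars (no gaps); the bins define the class intervals
plt.hist(hb, bins=[9, 10, 11, 12, 13, 14, 15], edgecolor="black")
plt.title("Haemoglobin levels (example data)")
plt.xlabel("Hb (g/dL)")
plt.ylabel("Frequency")
plt.show()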
SCATTER DIAGRAM
So the results obtained from these data cannot be applied or generalized to the whole population.
The normal distribution curve (NDC) can be used to distinguish normal from abnormal measurements.
Example:
If we have the NDC for Hb levels in a population of normal adult males with mean ± SD = 11 ± 1.5:
If a reading lies within the area under the curve covering 95% of normal values (i.e. mean ± 2 SD), the person is considered normal. If the reading is lower, he is anemic.
• The normal range for Hb in this example will be:
Higher Hb level: 11 + 2(1.5) = 14.
Lower Hb level: 11 − 2(1.5) = 8.
i.e. the normal Hb range for adult males is from 8 to 14.
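A small worked sketch of the same mean ± 2 SD calculation (the values 11 and 1.5 are the slide's example figures):

# Normal range as mean +/- 2 SD (covers about 95% of a normal distribution)
mean_hb = 11.0   # example mean from the slide (g/dL)
sd_hb = 1.5      # example standard deviation from the slide

lower = mean_hb - 2 * sd_hb   # 8.0
upper = mean_hb + 2 * sd_hb   # 14.0
print(f"Normal Hb range: {lower} to {upper} g/dL")

def classify(hb_reading):
    """Classify a reading against the mean +/- 2 SD range."""
    if hb_reading < lower:
        return "anemic"
    if hb_reading > upper:
        return "above normal range"
    return "normal"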
Data summarization
Measures of central tendency:
• Mean
• Median
• Mode
Measures of dispersion:
• Range
• Variance
• Standard deviation
• Coefficient of variation
ARITHMETIC MEAN
Sum of observations divided by the number of observations:
Mean (x̄) = ∑x / n
where:
x̄ = the mean
∑ denotes "the sum of"
x = the values of the observations
n = the number of observations
ARITHMETIC MEAN
In the case of frequency distribution data, we calculate the mean by the equation:
x̄ = ∑(f × x) / n
where f is the frequency of each value and n = ∑f (the total number of observations).
ARITHMETIC MEAN
If the data are presented in a frequency table with class intervals, we calculate the mean by the same equation but use the midpoint of each class interval as x.
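A rough sketch of both calculations (the values, frequencies and class intervals below are invented for illustration):

# Mean from a simple frequency distribution: x_bar = sum(f*x) / sum(f)
values = [1, 2, 3, 4]          # observed values (hypothetical)
freqs  = [5, 10, 8, 2]         # their frequencies (hypothetical)
mean_simple = sum(f * x for f, x in zip(freqs, values)) / sum(freqs)

# Mean from a class-interval table: use the midpoint of each class as x
class_intervals = [(10, 20), (20, 30), (30, 40)]   # hypothetical classes
class_freqs     = [4, 12, 9]                       # hypothetical frequencies
midpoints = [(lo + hi) / 2 for lo, hi in class_intervals]
mean_grouped = sum(f * m for f, m in zip(class_freqs, midpoints)) / sum(class_freqs)

print(mean_simple, mean_grouped)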
MEDIAN
The middle observation in a series of observations after arranging them in ascending or descending order.
Rank of the median:
• Odd number of observations: rank = (n + 1) / 2.
• Even number of observations: the median is the average of the two middle observations, of ranks n/2 and (n/2) + 1.
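A minimal sketch of that ranking rule in code (the sample numbers are invented):

def median(observations):
    """Median = middle value after sorting; average of the two middle values if n is even."""
    data = sorted(observations)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:          # odd n: the value at rank (n + 1) / 2
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2   # even n: average of ranks n/2 and n/2 + 1

print(median([7, 3, 9, 1, 5]))      # -> 5
print(median([7, 3, 9, 1, 5, 11]))  # -> 6.0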
MODE
The most frequently occurring value in the data.
• Seldom used.
Data summarization
Measures of central tendency:
• Mean
• Median
• Mode
Measures of dispersion:
• Range
• Variance
• Standard deviation
• Coefficient of variation
MEASURES OF DISPERSION
Range: the difference between the largest and smallest observations.
Example: Range = 9 − 6 = 3
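A hedged sketch of the four dispersion measures listed above (the sample values are invented; variance here uses the sample formula with n − 1):

import math

data = [6, 7, 7, 8, 9]                      # hypothetical sample
n = len(data)
mean = sum(data) / n

range_ = max(data) - min(data)                              # Range = largest - smallest
variance = sum((x - mean) ** 2 for x in data) / (n - 1)     # sample variance
sd = math.sqrt(variance)                                    # standard deviation
cv = sd / mean * 100                                        # coefficient of variation (%)

print(range_, variance, sd, cv)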
Descriptive
Inferential
INFERENCE
Inference involves making a generalization about a larger
group of individuals on the basis of a subset or sample.
HYPOTHESIS TESTING
To find out whether the observed difference between groups is explained by sampling variation (chance) or is a real difference between the groups.
Quantitative variables:
• 2 means, large sample (>60): z test
• 2 means, small sample (<60): t-test / paired t-test
• >2 means: ANOVA
Qualitative variables:
• Χ² test
• Z test (for proportions)
COMPARING TWO MEANS OF LARGE SAMPLES USING THE NORMAL DISTRIBUTION (Z TEST OR SND, STANDARD NORMAL DEVIATE)
If the calculated value is less than the tabulated value, the difference between the samples is not significant.
If the calculated value is larger than the tabulated value, the difference is significant, i.e. the null hypothesis is rejected.
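A sketch of the z (SND) calculation for two large-sample means, z = (x̄1 − x̄2) / √(SD1²/n1 + SD2²/n2); all numbers below are invented:

import math

# Hypothetical summary statistics for two large samples (n > 60)
mean1, sd1, n1 = 72.0, 10.0, 100
mean2, sd2, n2 = 68.5, 9.0, 120

# Standard normal deviate: difference of means over its standard error
se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
z = (mean1 - mean2) / se

# |z| > 1.96 corresponds to P < 0.05 (two-sided)
print(f"z = {z:.2f}, significant at 5%: {abs(z) > 1.96}")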
Big t-value → small P-value → statistical significance
PAIRED T-TEST:
If we are comparing repeated observations in the same individuals, or the difference between paired data, we have to use the paired t-test, where the analysis is carried out using the mean and standard deviation of the difference between each pair.
Paired t = mean of differences / √(SD² of differences / n)
d.f. = n − 1
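A minimal sketch of that paired-t formula (the before/after values are hypothetical):

import math

# Hypothetical paired observations (e.g. before and after treatment)
before = [140, 152, 138, 147, 160, 155]
after  = [132, 148, 135, 140, 151, 150]

diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
mean_d = sum(diffs) / n
sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))

t = mean_d / math.sqrt(sd_d**2 / n)   # paired t = mean difference / SE of the difference
df = n - 1
print(f"t = {t:.2f} with {df} degrees of freedom")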
ANALYSIS OF VARIANCE “ANOVA”
The main idea in ANOVA is that we have to take into account the variability within the groups and between the groups.
The F-value is equal to the ratio between the mean sum of squares between the groups and within the groups:
F = between-groups MS / within-groups MS
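A rough one-way ANOVA sketch computing that F ratio directly (three hypothetical groups of measurements):

# Hypothetical measurements for three independent groups
groups = [
    [5.1, 4.9, 5.6, 5.0],
    [5.8, 6.1, 5.9, 6.3],
    [4.4, 4.7, 4.5, 4.9],
]

all_values = [x for g in groups for x in g]
grand_mean = sum(all_values) / len(all_values)
k = len(groups)                # number of groups
N = len(all_values)            # total number of observations

# Between-groups and within-groups sums of squares
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

ms_between = ss_between / (k - 1)      # between-groups mean square
ms_within = ss_within / (N - k)        # within-groups mean square
F = ms_between / ms_within
print(f"F({k - 1}, {N - k}) = {F:.2f}")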
TESTS OF SIGNIFICANCE
Quantitative variables:
• 2 means, large sample (>60): z test
• 2 means, small sample (<60): t-test / paired t-test
• >2 means: ANOVA
Qualitative variables:
• Χ² test
• Z test (for proportions)
CHI-SQUARED TEST
Example with two categorical variables: the technique used (Gonstead vs. Diversified) and pain after treatment.
GONSTEAD VS. DIVERSIFIED EXAMPLE - RESULTS
Observed counts:
Gonstead:       9    21   | row total 30
Diversified:   11    29   | row total 40
Column total:  20    50   | grand total 70

Expected count for each cell = (row total × column total) / grand total:
Gonstead:      E = 30×20/70 = 8.6     E = 30×50/70 = 21.4
Diversified:   E = 40×20/70 = 11.4    E = 40×50/70 = 28.6
Use the Χ² formula with each cell, Χ² = Σ (O − E)² / E, and then add them together.
Find df and then consult a Χ² table to see whether the result is statistically significant.
Degrees of freedom = (rows − 1) × (columns − 1)
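A small sketch applying those steps to the Gonstead/Diversified table above (counts taken from the slide; 3.84 is the standard Χ² cut-off for df = 1 at P = 0.05):

# Observed 2x2 table from the example (rows: Gonstead, Diversified)
observed = [[9, 21],
            [11, 29]]

row_totals = [sum(row) for row in observed]           # 30, 40
col_totals = [sum(col) for col in zip(*observed)]     # 20, 50
grand_total = sum(row_totals)                         # 70

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total   # E = row total * column total / grand total
        chi2 += (o - e) ** 2 / e                          # add (O - E)^2 / E for each cell

df = (len(observed) - 1) * (len(observed[0]) - 1)         # (rows - 1)(columns - 1) = 1
print(f"chi-squared = {chi2:.3f}, df = {df}, significant at 5%: {chi2 > 3.84}")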
p1 = 5/50 = 10%,  p2 = 20/60 = 33%
q1 = 100 − 10 = 90,  q2 = 100 − 33 = 67
Z = (p1 − p2) / √(p1q1/n1 + p2q2/n2)
Z = |10 − 33| / √(10×90/50 + 33×67/60) = 23 / √(18 + 36.85) = 23 / 7.4 ≈ 3.1
Therefore there is a statistically significant difference between the percentages of anemia in the studied groups (because z > 2).
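A rough sketch reproducing that calculation in code (the counts 5/50 and 20/60 are the slide's anemia example; percentages are used exactly as on the slide):

import math

# Anemia example from the slide: 5 of 50 vs. 20 of 60
n1, n2 = 50, 60
p1 = 5 / n1 * 100     # 10 %
p2 = 20 / n2 * 100    # 33.3 %
q1, q2 = 100 - p1, 100 - p2

se = math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)
z = abs(p1 - p2) / se
print(f"z = {z:.1f}, significant: {z > 2}")   # z is about 3.1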
CORRELATION & REGRESSION
Correlation measures the closeness of the
association between 2 continuous variables, while
Linear regression gives the equation of the straight
line that best describes & enables the prediction of
one variable from the other.
CORRELATION
The t-test for correlation is used to test the significance of the association.
CORRELATION IS NOT CAUSATION!!!
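A sketch (not from the slides) of the Pearson correlation coefficient and the usual t-test for it, t = r·√(n − 2) / √(1 − r²) with df = n − 2; the x/y data are invented:

import math

# Hypothetical paired continuous data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 3.6, 4.8, 5.1, 6.3]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Pearson correlation coefficient r
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
r = sxy / math.sqrt(sxx * syy)

# t-test for the significance of r, df = n - 2
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(f"r = {r:.3f}, t = {t:.2f}, df = {n - 2}")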
LINEAR REGRESSION
[Comparison: how linear regression is the same as correlation, and how it differs from correlation]
SCATTERPLOTS
Regression line
MULTIPLE REGRESSION
The dependency of a dependent variable on several independent variables, not just one.
The test of significance used is ANOVA (the F test).
For example: if neonatal birth weight depends on these factors: gestational age, length of the baby, and head circumference, and each factor correlates significantly with birth weight (i.e. has a positive linear correlation), we can do a multiple regression analysis to obtain a mathematical equation by which we can predict the birth weight of any neonate if we know the values of these factors.
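A hedged sketch of such a multiple regression fitted by ordinary least squares with numpy; the neonatal values below are fabricated for illustration only, not real data:

import numpy as np

# Hypothetical predictors: gestational age (weeks), length (cm), head circumference (cm)
X = np.array([
    [38, 48, 33],
    [40, 51, 35],
    [36, 46, 32],
    [39, 50, 34],
    [41, 52, 35],
    [37, 47, 33],
], dtype=float)
y = np.array([3.1, 3.6, 2.7, 3.4, 3.8, 2.9])   # birth weight (kg), invented

# Add an intercept column and solve for the coefficients b in y = Xb
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
print("Intercept and coefficients (gestational age, length, head circumference):", coef)

# Predict the birth weight of a new neonate from its measurements (1 is the intercept term)
new_baby = np.array([1, 39, 49, 34], dtype=float)
print("Predicted birth weight (kg):", new_baby @ coef)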