PG Descriptive and Inferential Statistic 2024


Alhamdulillahirabbil 'alamiin.

All praises be to Allah, the Lord of all the worlds.


Blessings and Peace be upon Muhammad the
Messenger of Allah, the Seal of the Prophets.

O Allah, we thank You for the pleasure of having
the knowledge that is blessed by You. We
beseech You to bless our teachers and parents
with Your Guidance. We pray for Your Guidance
to become righteous students who are always
close to You, and who also bring happiness to
our teachers and parents.

Amin Ya Rabbal 'alamiin.


DESCRIPTIVE AND INFERENTIAL STATISTICS
Prof Dr Shamima Abdul Rahman
Graduate Research School
UOC
LEARNING OUTCOMES
Upon completion of this topic, the students should be able to:

1. Define descriptive and inferential statistics.
2. Explain descriptive statistics.
3. Describe types of inferential statistics and the relevant statistical
parameters used.
4. Explain statistical errors.
5. Apply hypothesis testing and interpret analysis in given examples
(throughout this course).
DESCRIPTIVE VS INFERENTIAL STATISTICS

• Applied to populations and the properties of populations
• Do not allow us to make conclusions beyond the data we have analysed or
reach conclusions regarding any hypotheses we might have made.
Descriptive statistics
• The term given to the analysis of data that helps describe,
show or summarize data in a meaningful way such that, for
example, patterns might emerge from the data.
• Descriptive statistics do not, however, allow us to make
conclusions beyond the data we have analysed or reach
conclusions regarding any hypotheses we might have made.
• Very important because if we simply presented our raw
data it would be hard to visualize what the data were
showing, especially if there was a lot of it.
Descriptive statistics
2 important key measures:
• Measures of central tendency
- Describing the central position of a frequency distribution for a
group of data
- Mean, median, mode
• Measures of dispersion
- Summarizing a group of data by describing how spread out the
scores are
- Range, quartile, variance, deviation, standard deviation
Measures of central tendency
Measures of dispersion
• Variance is a numerical value that describes the
variability of observations around their arithmetic mean.
• Standard deviation is a measure of the spread of scores
within a set of data; it is the square root of the variance.
Usually, we are interested in the standard deviation of a population.
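
The measures of central tendency and dispersion named above can be computed with Python's standard `statistics` module. This is only a minimal sketch: the `scores` list is hypothetical and was chosen purely to illustrate the calculations.

```python
import statistics

# Hypothetical scores, used only to illustrate the calculations.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
mean = statistics.mean(scores)        # arithmetic mean
median = statistics.median(scores)    # middle value of the sorted data
mode = statistics.mode(scores)        # most frequent value

# Measures of dispersion
value_range = max(scores) - min(scores)
pop_variance = statistics.pvariance(scores)  # population formula, divides by n
pop_sd = statistics.pstdev(scores)           # square root of pop_variance
sample_sd = statistics.stdev(scores)         # sample formula, divides by n - 1

print(mean, median, mode, value_range, pop_variance, pop_sd, sample_sd)
```

The sample standard deviation is slightly larger than the population one because it divides by n - 1 instead of n.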
Descriptive statistics

• Measures of central tendency & dispersion
• Measures of skewness & kurtosis
SKEWNESS
Definition:
• Measure of symmetry, or more precisely, the lack of symmetry.
• A distribution, or data set, is symmetric if it looks the
same to the left and right of the center point.

Types:
• Symmetric / bell shaped
• Positive skewness
• Negative skewness
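Skewness
• Skewness measures the degree of asymmetry exhibited by
the data

$$\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3}$$

where $\bar{x}$ is the mean, $s$ is the standard deviation and $n$ is the number of observations.
• If skewness equals zero, the histogram is symmetric
about the mean
• Positive skewness vs negative skewness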
Skewness

Source: http://library.thinkquest.org/10030/3smodsas.htm
• Positive skewness
- There are more observations below the mean than above it
- When the mean is greater than the median
• Negative skewness
- There are a small number of low observations and a large number
of high ones
- When the median is greater than the mean
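
The skewness formula and the positive/negative cases can be checked numerically. The sketch below is illustrative only. It assumes that s is the sample standard deviation (n - 1 divisor), which the slides do not specify, and the three data sets are hypothetical.

```python
import math

def skewness(data):
    """Skewness as sum((x_i - mean)^3) / (n * s^3).

    Assumption: s is the sample standard deviation (n - 1 divisor);
    the slides do not say which divisor is intended.
    """
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return sum((x - mean) ** 3 for x in data) / (n * s ** 3)

# Hypothetical data sets, chosen only to show the three cases.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]   # one high value pulls the mean above the median
left_tailed = [1, 8, 9, 9, 10, 10]   # one low value pulls the mean below the median

for name, data in [("symmetric", symmetric),
                   ("right-tailed", right_tailed),
                   ("left-tailed", left_tailed)]:
    print(name, round(skewness(data), 3))
```

The sign of the result, not its exact size, is what separates positive from negative skewness.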
SKEWNESS
Eg:
o A teacher expects most of the students to get good marks. If that
happens, the curve looks like the normal curve.

o But for some reasons (e.g., lazy students, not understanding the
lectures, not being attentive, etc.) this does not happen, so we get
two other curves:
Positively skewed / Negatively skewed
KURTOSIS
 Kurtosis is a parameter that describes the shape of a
random variable’s probability distribution.
Kurtosis
• Kurtosis measures how peaked the histogram is

$$\text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4} - 3$$

• The kurtosis of a normal distribution is 0
• Kurtosis characterizes the relative peakedness or
flatness of a distribution compared to the normal
distribution
Kurtosis
• Platykurtic: when the kurtosis < 0, the
frequencies throughout the curve are closer to being
equal (i.e., the curve is flatter and wider)
• Thus, negative kurtosis indicates a relatively flat
distribution
• Leptokurtic: when the kurtosis > 0, there are
high frequencies in only a small part of the curve
(i.e., the curve is more peaked)
• Thus, positive kurtosis indicates a relatively
peaked distribution
These two distributions have the same variance,
approximately the same skew, but differ markedly in
kurtosis.
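
As a rough illustration of the kurtosis formula and the platykurtic/leptokurtic labels, here is a minimal sketch. It assumes s is the sample standard deviation. The two data sets are hypothetical, and the label "mesokurtic" for a kurtosis of exactly 0 is standard terminology that does not appear in the slides.

```python
import math

def excess_kurtosis(data):
    """Kurtosis as sum((x_i - mean)^4) / (n * s^4) - 3.

    Assumption: s is the sample standard deviation (n - 1 divisor).
    """
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return sum((x - mean) ** 4 for x in data) / (n * s ** 4) - 3

def describe_shape(kurtosis):
    """Map the sign of the kurtosis to the names used in the slides."""
    if kurtosis < 0:
        return "platykurtic"   # flatter than the normal distribution
    if kurtosis > 0:
        return "leptokurtic"   # more peaked than the normal distribution
    return "mesokurtic"        # standard term, not from the slides

# Hypothetical data: evenly spread values vs. values piled on one point.
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
peaked = [5, 5, 5, 5, 5, 5, 5, 5, 0, 10]

print(describe_shape(excess_kurtosis(flat)))
print(describe_shape(excess_kurtosis(peaked)))
```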
SPSS
SPSS OUTPUT
How to write your analysis?
INFERENTIAL STATISTICS

• A technique that allows us to use collected samples to make
generalizations/conclusions about the populations from which
the samples were drawn.
• It is, therefore, important that the sample accurately
represents the population.
LIMITATIONS OF INFERENTIAL STATISTICS
• You are providing data about a population that you have not fully
measured, so you cannot ever be completely sure that the
values/statistics you calculate are correct.

• Some, but not all, inferential tests require the user to make
educated guesses (based on theories or previously published
studies) to run the inferential tests.
TYPES OF INFERENTIAL STATISTICS

A. ESTIMATION
• Estimate a population parameter from a sample statistic

B. HYPOTHESIS TESTING
• Answer a specific question related to a population parameter using a
sample statistic
• Allow making predictions, estimations or inferences about what has
been observed based on a sample
A. ESTIMATION
• POINT ESTIMATE
A single numerical value used to estimate the
corresponding parameter

• INTERVAL ESTIMATE
Consists of 2 numerical values defining a range that, with a
specific degree of confidence, includes the parameter
being estimated
EXAMPLE:
Point estimate: 59.7
95% Confidence Interval: 46.3 to 73.8

• Estimation is most accurate when the sample size approaches the
population size
• If sampling is from a normal distribution, the sample mean will be
equal or close to the population mean
• 95% confidence interval: regardless of where the distribution of the
sample mean is centred, ~95% of the possible values of the sample mean
will be located within 2 standard errors (the standard deviation of the
sampling distribution) of the population mean
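
A point estimate and its interval estimate can be sketched as follows. This is an illustration under stated assumptions, not the method used for the figures in the slides. It uses the normal approximation (multiplier of about 1.96 for 95%), the function name `mean_confidence_interval` is made up for this example, and the `sample` values are hypothetical. For small samples a t-based multiplier would usually be preferred.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Return (point estimate, lower limit, upper limit) for the mean.

    Assumption: normal approximation; a t multiplier is usually
    preferred when the sample is small.
    """
    n = len(sample)
    point_estimate = mean(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z * standard_error
    return point_estimate, point_estimate - margin, point_estimate + margin

# Hypothetical survival times in months, for illustration only.
sample = [44, 47, 49, 45, 48, 50, 46, 47, 43, 51]
estimate, lower, upper = mean_confidence_interval(sample)
print(f"mean = {estimate:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```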
CONFIDENCE LEVEL
• The probability that the interval will contain the
parameter
• Tells you how sure you can be

• 95%
- the most commonly used
• 99%
- less chance of error, i.e. more confidence that the interval contains
the parameter
- wider CI, hence less precision
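
The trade-off between confidence level and interval width can be seen from the multipliers alone. A minimal sketch, assuming a normal-approximation interval and a hypothetical standard error of 0.85:

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Two-sided standard normal multiplier for a given confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

standard_error = 0.85  # hypothetical value, for illustration only

# Full width of the interval = 2 * multiplier * standard error.
width_95 = 2 * z_multiplier(0.95) * standard_error
width_99 = 2 * z_multiplier(0.99) * standard_error

print(round(z_multiplier(0.95), 3), round(z_multiplier(0.99), 3))
print(round(width_95, 2), round(width_99, 2))
```

With the same data, the 99% interval is always wider than the 95% interval.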
Interpretation

• Mean (95% CI) survival time = 46.96 (45.28, 48.63) months

How we report it:
• The mean survival time is 46.96 months
• We are 95% confident that the mean survival time of the breast cancer
population lies between 45.28 and 48.63 months
Interpretation

• The death rate of the breast cancer population = 6% (95% CI: 5.00, 7.00)
• We are 95% confident that the death rate of the breast cancer
population lies between 5.00% and 7.00%
Notes
• For the same confidence level (e.g. 95%), look at the interval width: a
wider CI means less precise information
B. HYPOTHESIS TESTING
• Answer a specific question related to a population
parameter using sample statistics
• Allow making predictions, estimations or inferences about
what has been observed based on a sample

TYPE OF HYPOTHESIS
1. Null hypothesis, H0: no difference/ no effect/ no
association
2. Alternative hypothesis, HA: there is a difference/ an
effect/ an association
P VALUE
• The probability of obtaining a result at least as extreme as the one
observed if the null hypothesis is true; loosely, the probability of
error if you reject the null hypothesis and accept the alternative
hypothesis
• Range: 0.00 to 1.00
• Acceptable standard: less than 5% error, i.e. the significance level
is set at 0.05
• If p value is < 0.05, reject the null hypothesis, i.e. we accept the
alternative hypothesis
• If p value is ≥ 0.05, we cannot reject the null hypothesis

Note: if the software displays a p value of 0.00, write it as p < 0.05 instead of 0.00
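
The decision rule can be sketched in a few lines. This is a minimal illustration that assumes a z test statistic, so the p value is the two-sided tail probability of the standard normal distribution under H0. The function names `two_sided_p_value` and `decide` are made up for this example.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Two-sided p value for a z statistic, computed under H0."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 only when p < alpha."""
    return "reject H0" if p_value < alpha else "cannot reject H0"

for z in (0.5, 1.96, 3.0):
    p = two_sided_p_value(z)
    print(f"z = {z}: p = {p:.3f} -> {decide(p)}")
```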


Example

• P value = 0.01
There is a 1% probability of error in our conclusion if we
conclude in favour of the alternative hypothesis
• P value = 0.20
There is a 20% probability of error in our conclusion if we
conclude in favour of the alternative hypothesis
Notes
• P value and CI always complement each other
• If the CI contains the value of no difference (0 for a difference,
1 for a ratio), we cannot reject the null hypothesis
EXAMPLE OF RESEARCH DATA ANALYSIS
STATISTICAL ERRORS
• An integral part of hypothesis testing

1. Type I error (α):
- rejecting H0 when it is actually true (there is no difference)
- the probability of this occurring is denoted by alpha (α), the level
of significance
- p < α : reject null hypothesis
- p ≥ α : do not reject null hypothesis

2. Type II error (β):
- β is the probability of failing to reject H0 when there is a true
difference
- the probability of NOT committing a Type II error is called the power
of a test (1 − β)
                                 | Null hypothesis (H0) is true     | Null hypothesis (H0) is false
Reject null hypothesis           | Type I error (false positive)    | Correct outcome (true positive)
Fail to reject null hypothesis   | Correct outcome (true negative)  | Type II error (false negative)
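
A small simulation can make the Type I error cell concrete. When H0 is really true, a test run at α = 0.05 should still reject in roughly 5% of repeated samples. The sketch below is an illustration only. It assumes a z test with known standard deviation 1, and the function name, number of trials, sample size and seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist

def type_i_error_rate(trials=2000, n=30, alpha=0.05, seed=1):
    """Share of samples drawn under a true H0 (mean 0) that reject H0."""
    rng = random.Random(seed)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        # z statistic for H0: mean = 0, with known sigma = 1
        z = (sum(sample) / n) * math.sqrt(n)
        if abs(z) > z_critical:
            rejections += 1   # a false positive, because H0 is true here
    return rejections / trials

rate = type_i_error_rate()
print(f"observed false positive rate: {rate:.3f}")
```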

POP QUIZ
• Which error would we like to minimise the most?
POWER (1 − β)
• The probability of detecting an effect when one exists (i.e. the
ability to reject H0 when it should be rejected), i.e. how sensitive
the test is
• The common standard is a minimum of 80%.
• Tests attempting to demonstrate evidence of equality (instead
of differences) will sometimes specify higher power (95%)

• Influenced by:
1. The significance level (probability of a Type I error) you set
for the hypothesis test
2. The size of the difference you wish to detect (effect size)
3. The sample size
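
The three influences on power can be explored with a short calculation. The sketch below assumes a two-sided one-sample z test and a standardized effect size (difference divided by the standard deviation). The function name `z_test_power` is made up for this example, and other tests need other formulas.

```python
import math
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided one-sample z test.

    effect_size: true difference divided by the standard deviation.
    """
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n)
    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(shift - z_critical) + std_normal.cdf(-shift - z_critical)

print(round(z_test_power(0.5, 10), 2))              # small sample
print(round(z_test_power(0.5, 32), 2))              # larger sample
print(round(z_test_power(0.8, 32), 2))              # larger effect size
print(round(z_test_power(0.5, 32, alpha=0.01), 2))  # stricter alpha
```

Power rises with sample size and effect size, and falls when a stricter significance level is chosen.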
STEPS IN HYPOTHESIS TESTING
• Step 1: State the hypothesis
• Step 2: Set the significance level (e.g. α = 0.05)
• Step 3: Check assumptions
• Step 4: Calculate the test statistic
• Step 5: Interpret (descriptive, P value and confidence
interval) and present (graph, table etc)
• Step 6: Conclusion
An experiment was set up to study the effect of the drug
Zetformin on blood glucose level. A group of 12 diabetic
women were randomly assigned to this drug.
The blood glucose level was measured before
the treatment and after 3 months of treatment.
Regarding inferential statistics in this study:
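
One possible way to walk through the steps of hypothesis testing for a before/after design like this is a paired t test. The sketch below is an assumption on our part, since the slides do not name the test. The glucose readings are hypothetical values invented for illustration and are NOT data from the study. The critical value 2.201 is the standard two-sided t table value for α = 0.05 with 11 degrees of freedom.

```python
import math
from statistics import mean, stdev

# Hypothetical glucose readings (mmol/L) for 12 patients; NOT study data.
before = [9.8, 10.4, 8.9, 11.2, 9.5, 10.1, 12.0, 9.2, 10.8, 11.5, 9.9, 10.6]
after = [8.9, 9.6, 8.7, 10.1, 9.0, 9.4, 10.8, 9.1, 9.9, 10.2, 9.3, 9.8]

# Step 1: H0: mean difference = 0; HA: mean difference != 0.
# Step 2: significance level alpha = 0.05.
# Step 3: assumption (not checked here): differences are roughly normal.
differences = [a - b for a, b in zip(after, before)]
n = len(differences)

# Step 4: paired t statistic = mean difference / its standard error.
mean_diff = mean(differences)
se_diff = stdev(differences) / math.sqrt(n)
t_statistic = mean_diff / se_diff

# Step 5: compare with the t table value and build the 95% CI.
T_CRITICAL = 2.201  # two-sided, alpha = 0.05, df = n - 1 = 11
lower = mean_diff - T_CRITICAL * se_diff
upper = mean_diff + T_CRITICAL * se_diff

# Step 6: conclusion.
reject_h0 = abs(t_statistic) > T_CRITICAL
print(f"mean difference = {mean_diff:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
print("reject H0" if reject_h0 else "cannot reject H0")
```

If the interval excludes 0 (the value of no difference), the conclusion agrees with the test: H0 is rejected.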
Thank you!!!
