QTT Project 2 2023
COM(MAIF)
Choosing a Measure:
• Range: Choose when you need a quick and simple measure of spread and
the data contain no outliers. Because the range depends only on the
minimum and maximum, a single extreme value can distort it, so it is not
recommended for datasets with outliers.
• Variance and Standard Deviation: Choose when you want a more
comprehensive measure of variability and are willing to accept sensitivity
to outliers. Standard deviation is often preferred over variance because it
is in the same units as the original data.
• Interquartile Range (IQR): Choose when you want a measure that is less
sensitive to outliers and provides a better description of the central 50%
of the data. IQR is particularly useful when the dataset has extreme
values.
• Skewness: Choose when you want to understand the asymmetry in the
data distribution. Skewness is helpful for identifying whether the data is
symmetric or skewed and in which direction.
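The guidance above on outlier sensitivity can be illustrated with a minimal Python sketch. The two small lists are hypothetical values invented only for this illustration, and `spread_summary` is an illustrative helper name. It assumes the population standard deviation and the median-of-halves convention for quartiles.

```python
import statistics

def spread_summary(values):
    """Return range, population standard deviation, and IQR for a list of numbers."""
    ordered = sorted(values)
    half = len(ordered) // 2
    # Median-of-halves convention: quartiles are the medians of the lower and upper halves.
    lower, upper = ordered[:half], ordered[-half:]
    return {
        "range": ordered[-1] - ordered[0],
        "std_dev": statistics.pstdev(ordered),
        "iqr": statistics.median(upper) - statistics.median(lower),
    }

# Hypothetical values, invented only to illustrate outlier sensitivity.
clean = [10, 12, 13, 14, 15, 16, 17, 18]
with_outlier = clean[:-1] + [90]  # replace the largest value with an extreme one

print(spread_summary(clean))
print(spread_summary(with_outlier))
```

In this toy case the range grows from 8 to 80 and the standard deviation rises sharply, while the IQR stays at 4. This is why the IQR is the safer choice when extreme values are present.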
Part 2: Data Analysis
Dataset 1: [15, 18, 20, 22, 23, 25, 28, 30, 32, 33, 35, 36, 38, 40, 45, 47, 50, 52,
55, 60]
Dataset 1:
1. Range:
• Range = Max − Min = 60 − 15 = 45.
2. Variance:
• Calculate Mean (μ):
Mean = (15 + 18 + 20 + 22 + 23 + 25 + 28 + 30 + 32 + 33 + 35 + 36 + 38 + 40 + 45 + 47 + 50 + 52 + 55 + 60) / 20
= 704 / 20 = 35.2
• Calculate Variance (population form, dividing by N):
$\text{Variance} = \frac{\sum (x_i - \mu)^2}{N}$
Variance = [(15 − 35.2)² + (18 − 35.2)² + (20 − 35.2)² + … + (60 − 35.2)²] / 20
= 3231.2 / 20
= 161.56.
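3. Standard Deviation:
• Standard Deviation = √Variance = √(3231.2 / 20) = √161.56 ≈ 12.71.
4. Interquartile Range (IQR):
• Calculate Quartiles (median of the lower and upper halves of the sorted data):
• Q1 = (23 + 25) / 2 = 24
• Q3 = (45 + 47) / 2 = 46
• IQR = Q3 − Q1 = 46 − 24 = 22.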
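5. Skewness:
• Calculate Skewness (σ is the population standard deviation, √161.56 ≈ 12.71):
$\text{Skewness} = \frac{\sum (x_i - \mu)^3}{N \sigma^3}$
Skewness = [(15 − 35.2)³ + (18 − 35.2)³ + (20 − 35.2)³ + … + (60 − 35.2)³] / [20 × (12.71)³]
= 11,139.12 / 41,065
≈ 0.27.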
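As a cross-check on the hand calculation for Dataset 1, the same steps can be sketched in Python directly from the definitions. This is a minimal sketch, assuming the population formulas (dividing by N) and the moment-based skewness used in this report. The variable names are illustrative.

```python
import math

# Dataset 1 as listed in the report.
dataset_1 = [15, 18, 20, 22, 23, 25, 28, 30, 32, 33,
             35, 36, 38, 40, 45, 47, 50, 52, 55, 60]

n = len(dataset_1)
mean = sum(dataset_1) / n                              # 704 / 20

# Variance: mean of the squared deviations from the mean (population form).
sum_sq = sum((x - mean) ** 2 for x in dataset_1)
variance = sum_sq / n
std_dev = math.sqrt(variance)

# Skewness: sum of cubed deviations divided by N times the cubed standard deviation.
sum_cubes = sum((x - mean) ** 3 for x in dataset_1)
skewness = sum_cubes / (n * std_dev ** 3)

print(f"mean = {mean:.2f}")
print(f"sum of squared deviations = {sum_sq:.2f}")
print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
print(f"sum of cubed deviations = {sum_cubes:.2f}")
print(f"skewness = {skewness:.2f}")
```

Run as written, this should report a mean of 35.20, a sum of squared deviations of 3231.20, a variance of 161.56, a standard deviation of about 12.71, and a skewness of about 0.27.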
Dataset 2: [25, 25, 26, 27, 27, 28, 28, 30, 30, 30, 31, 32, 32, 33, 34, 35, 36, 38,
40, 42]
1. Range:
• Range = Max − Min = 42 − 25 = 17.
2. Variance:
• Calculate Mean (μ):
Mean = (25 + 25 + 26 + 27 + 27 + 28 + 28 + 30 + 30 + 30 + 31 + 32 + 32 + 33 + 34 + 35 + 36 + 38 + 40 + 42) / 20
= 629 / 20
= 31.45.
• Calculate Variance (population form, dividing by N):
$\text{Variance} = \frac{\sum (x_i - \mu)^2}{N}$
Variance = [(25 − 31.45)² + (25 − 31.45)² + (26 − 31.45)² + … + (42 − 31.45)²] / 20
= 452.95 / 20
= 22.6475 ≈ 22.65.
3. Standard Deviation:
• Standard Deviation = √Variance
= √(452.95 / 20) = √22.6475
≈ 4.76.
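4. Interquartile Range (IQR):
• Calculate Quartiles (median of the lower and upper halves of the sorted data):
• Q1 = (27 + 28) / 2 = 27.5
• Q3 = (34 + 35) / 2 = 34.5
• IQR = Q3 − Q1 = 34.5 − 27.5 = 7.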
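5. Skewness:
• Calculate Skewness (σ is the population standard deviation, √22.6475 ≈ 4.76):
$\text{Skewness} = \frac{\sum (x_i - \mu)^3}{N \sigma^3}$
Skewness = [(25 − 31.45)³ + (25 − 31.45)³ + … + (42 − 31.45)³] / [20 × (4.76)³]
= 1,273.70 / 2,157
≈ 0.59.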
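The variance, standard deviation, and quartiles for Dataset 2 can be cross-checked with Python's standard `statistics` module. This is a minimal sketch: `pvariance` and `pstdev` are the population forms (dividing by N), and `quartiles_median_of_halves` is an illustrative helper that assumes the median-of-halves convention.

```python
import statistics

# Dataset 2 as listed in the report.
dataset_2 = [25, 25, 26, 27, 27, 28, 28, 30, 30, 30,
             31, 32, 32, 33, 34, 35, 36, 38, 40, 42]

variance = statistics.pvariance(dataset_2)   # population variance
std_dev = statistics.pstdev(dataset_2)       # population standard deviation

def quartiles_median_of_halves(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the sorted data."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return statistics.median(ordered[:half]), statistics.median(ordered[-half:])

q1, q3 = quartiles_median_of_halves(dataset_2)

print(f"variance = {variance:.4f}, standard deviation = {std_dev:.2f}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
```

Other quartile conventions, such as the interpolation methods used by many software packages, can give slightly different values for Q1 and Q3, so the convention should be stated alongside the result.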
Dataset 1:
1. Range:
• The range of 45 indicates a substantial spread between the
minimum and maximum values in Dataset 1.
2. Variance and Standard Deviation:
• The high variance (161.56) and standard deviation (12.71) indicate a
large degree of variability among the data points. On average, the
values in Dataset 1 lie about 12.7 units from the mean of 35.2.
3. Interquartile Range (IQR):
• The IQR of 22 means that the middle 50% of the data lies between
Q1 = 24 and Q3 = 46. It measures the spread of the central portion
of the dataset and is not affected by the smallest and largest
values.
4. Skewness:
• The positive skewness (about 0.27) indicates a mild skew to the
right. This means that there are some larger values that are pulling
the distribution in that direction.
Dataset 2:
1. Range:
• The range of 17 indicates a smaller spread between the minimum
and maximum values in Dataset 2 compared to Dataset 1.
2. Variance and Standard Deviation:
• The lower variance (22.65) and standard deviation (4.76) indicate a
smaller degree of variability among the data points in Dataset 2
compared to Dataset 1.
3. Interquartile Range (IQR):
• The IQR of 7 indicates that the middle 50% of the data (between
Q1 = 27.5 and Q3 = 34.5) is concentrated within a smaller range
than in Dataset 1.
4. Skewness:
• The positive skewness (about 0.59) indicates a moderate skew to the
right. As in Dataset 1, there are some larger values (such as 38, 40,
and 42) that are pulling the distribution in that direction.
Interpretation:
• Dataset 1: This dataset has the larger range, variance, and standard
deviation, indicating a wider spread of values and greater variability. The
positive skewness (about 0.27) points to a mild right skew, with a few
higher values pulling the distribution to the right.
• Dataset 2: This dataset has the smaller range, variance, and standard
deviation, indicating less variability than Dataset 1. Its positive
skewness (about 0.59) points to a moderate right skew, driven by a
handful of higher values.
Taken together, these measures describe the variability and shape of each
dataset. Dataset 1 is more variable and more widely spread, while Dataset 2
is more concentrated. Both datasets are skewed to the right, and the skew is
stronger in Dataset 2.
Dataset 1:
• Range: 45
• Variance: 161.56
• Standard Deviation: 12.71
• Interquartile Range (IQR): 22
• Skewness: 0.27 (positive)
Dataset 2:
• Range: 17
• Variance: 22.65
• Standard Deviation: 4.76
• Interquartile Range (IQR): 7
• Skewness: 0.59 (positive)
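The summary figures for both datasets can be reproduced side by side with a short Python sketch. It assumes population variance and standard deviation, median-of-halves quartiles, and moment-based skewness. `describe` is an illustrative helper name, not a library function.

```python
import math
import statistics

dataset_1 = [15, 18, 20, 22, 23, 25, 28, 30, 32, 33,
             35, 36, 38, 40, 45, 47, 50, 52, 55, 60]
dataset_2 = [25, 25, 26, 27, 27, 28, 28, 30, 30, 30,
             31, 32, 32, 33, 34, 35, 36, 38, 40, 42]

def describe(values):
    """Compute the five measures used in this report for one dataset."""
    ordered = sorted(values)
    n = len(ordered)
    mean = sum(ordered) / n
    variance = sum((x - mean) ** 2 for x in ordered) / n      # population variance
    std_dev = math.sqrt(variance)
    half = n // 2
    iqr = statistics.median(ordered[-half:]) - statistics.median(ordered[:half])
    skewness = sum((x - mean) ** 3 for x in ordered) / (n * std_dev ** 3)
    return {
        "range": ordered[-1] - ordered[0],
        "variance": variance,
        "std_dev": std_dev,
        "iqr": iqr,
        "skewness": skewness,
    }

summary_1 = describe(dataset_1)
summary_2 = describe(dataset_2)

# Print one row per measure so the two datasets can be compared directly.
print(f"{'measure':<10}{'Dataset 1':>12}{'Dataset 2':>12}")
for key in summary_1:
    print(f"{key:<10}{summary_1[key]:>12.2f}{summary_2[key]:>12.2f}")
```

Keeping every measure in one function ensures that both datasets are treated with the same conventions, which matters most for the IQR and the skewness.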
Comparison:
1. Range:
• Dataset 1 has a much larger range (45) compared to Dataset 2
(17), indicating a wider spread of values.
2. Variance and Standard Deviation:
• Dataset 1 has a much higher variance (161.56) and standard
deviation (12.71) than Dataset 2 (variance: 22.65, standard
deviation: 4.76). The values in Dataset 1 are therefore more
spread out from the mean than those in Dataset 2.
3. Interquartile Range (IQR):
• Dataset 1 has a larger IQR (22) compared to Dataset 2 (7),
indicating that the middle 50% of the data is more spread out in
Dataset 1.
4. Skewness:
• Both datasets exhibit positive skewness, so both are skewed to
the right. The skew is mild in Dataset 1 (about 0.27) and
moderate in Dataset 2 (about 0.59).
• Variability: Dataset 1 exhibits more variability than Dataset 2. This
conclusion is supported by the larger range, higher variance, standard
deviation, and IQR in Dataset 1 compared to Dataset 2.
• Why?: The larger spread in Dataset 1, as evidenced by the higher range,
variance, and standard deviation, indicates that the values are more
dispersed from the mean. The larger IQR also suggests that the middle
50% of the data in Dataset 1 covers a wider range compared to Dataset
2. Overall, these measures collectively point to Dataset 1 having a higher
degree of variability.
Dataset 1:
• Mean: 35.2
• Median: (33 + 35) / 2 = 34
Relationship:
1. Mean and Median: In Dataset 1, the mean (35.2) is slightly higher than
the median (34). This suggests that the distribution is slightly
right-skewed, which agrees with the positive skewness value (about 0.27).
2. Dispersion Measures:
• The large range (45) and high standard deviation (12.71) indicate a
wide spread of values around the mean.
• The IQR (22) is also relatively large, indicating a significant spread
within the middle 50% of the data.
• The positive skewness further confirms that there are some higher
values pulling the distribution to the right.
Dataset 2:
• Mean: 31.45
• Median: (30 + 31) / 2 = 30.5
Relationship:
1. Mean and Median: In Dataset 2, the mean (31.45) is slightly higher than
the median (30.5), indicating a rightward skew. This agrees with the
positive skewness value (about 0.59).
2. Dispersion Measures:
• The smaller range (17) and lower standard deviation (4.76)
indicate a narrower spread of values around the mean compared to
Dataset 1.
• The IQR (7) is also relatively small, indicating a concentrated
distribution within the middle 50% of the data.
• The positive skewness suggests that there are some higher values
pulling the distribution to the right. Relative to the spread of the
data, this effect is stronger than in Dataset 1 (0.59 versus 0.27).
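The mean and median comparison for both datasets can be checked with a minimal Python sketch using the standard `statistics` module. A positive value of mean minus median is consistent with a right skew, although it is only a rough indicator.

```python
import statistics

dataset_1 = [15, 18, 20, 22, 23, 25, 28, 30, 32, 33,
             35, 36, 38, 40, 45, 47, 50, 52, 55, 60]
dataset_2 = [25, 25, 26, 27, 27, 28, 28, 30, 30, 30,
             31, 32, 32, 33, 34, 35, 36, 38, 40, 42]

results = {}
for name, data in (("Dataset 1", dataset_1), ("Dataset 2", dataset_2)):
    mean = statistics.mean(data)
    median = statistics.median(data)   # average of the two middle values for even N
    results[name] = (mean, median)
    print(f"{name}: mean = {mean:.2f}, median = {median:.1f}, "
          f"mean - median = {mean - median:.2f}")
```

For these data the sketch should give a mean of 35.20 and a median of 34.0 for Dataset 1, and a mean of 31.45 and a median of 30.5 for Dataset 2.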
Differences Between Datasets:
1. Variability: Dataset 1 has higher variability, as indicated by a larger range,
higher variance, and higher standard deviation compared to Dataset 2.
2. Central Tendency vs. Dispersion: In both datasets, the mean is slightly
higher than the median, suggesting a rightward skew. Relative to the
spread of the data, the effect is more pronounced in Dataset 2
(skewness about 0.59) than in Dataset 1 (about 0.27).
3. Spread within the Middle 50% (IQR): Dataset 1 has a larger IQR,
indicating a wider spread within the middle 50% of the data compared to
Dataset 2.
Dataset 1 exhibits higher variability and a wider spread within the middle 50%
of the data, while Dataset 2 shows the stronger rightward skew. Reading the
measures of central tendency and dispersion together gives a fuller picture
of the distributional characteristics of each dataset.
PART 5: Conclusion.