STA301 IMP Notes Headings and Some Questions Answers Prepared by
Statistics : Statistics is the science which enables us to draw conclusions about various phenomena on the basis of real data collected on a sample basis.
Sample : Sample is that part of the Population from which information is collected.
Ordinal Scale : It includes the characteristics of a nominal scale and, in addition, has the property of ordering or ranking of measurements, e.g. the performance of students can be rated as excellent, good or poor.
Interval Scale : A measurement scale possessing a constant interval size but not a true zero point is called an interval scale.
Ratio Scale : It is a special kind of interval scale in which the scale of measurement has a true zero point as its origin.
Mean Deviation : The mean deviation is defined as the arithmetic mean of the deviations measured either from the mean or from the median, all deviations being counted as positive.
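As a minimal sketch (the data set is hypothetical), the mean deviation can be computed about either the mean or the median by averaging the absolute deviations:

```python
from statistics import mean, median

def mean_deviation(data, about="mean"):
    """Arithmetic mean of the absolute deviations from the mean or the median."""
    centre = mean(data) if about == "mean" else median(data)
    # All deviations are counted as positive by taking absolute values.
    return sum(abs(x - centre) for x in data) / len(data)

data = [1, 2, 3, 4, 10]  # hypothetical observations
print(mean_deviation(data, "mean"))    # 2.4
print(mean_deviation(data, "median"))  # 2.2
```

The mean deviation about the median comes out smaller here, which is expected: the sum of absolute deviations is least when they are measured from the median.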
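Chebyshev's Theorem : Chebyshev's theorem states that, for any number $k$ greater than one, at least $1 - 1/k^2$ of the data values fall within $k$ standard deviations of the mean, i.e. within the interval $(\mu - k\sigma, \mu + k\sigma)$, where $\mu$ is the mean and $\sigma$ the standard deviation.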
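A small check of the theorem on a hypothetical data set, assuming the standard deviation is computed with divisor n (`pstdev`); for each k the observed share of values within k standard deviations should never fall below the bound $1 - 1/k^2$:

```python
from statistics import mean, pstdev

def share_within_k_sd(data, k):
    """Proportion of values lying strictly within k standard deviations of the mean."""
    m, s = mean(data), pstdev(data)
    return sum(1 for x in data if abs(x - m) < k * s) / len(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, standard deviation 2
for k in (1.5, 2, 3):
    share = share_within_k_sd(data, k)
    bound = 1 - 1 / k**2
    print(k, share, round(bound, 3), share >= bound)
# 1.5 0.75 0.556 True / 2 0.875 0.75 True / 3 1.0 0.889 True
```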
Moments : Moments are the arithmetic means of the deviations of the observations (from the mean or from some other origin) raised to a given power; that power is the order of the moment.
Mutually Exclusive Events : Two events are said to be mutually exclusive if and only if they cannot both occur at the same time. Equivalently, two events are mutually exclusive if the occurrence of one event excludes the occurrence of the other.
Independent Events : Two events A and B in the same sample space S are defined to be independent (or statistically independent) if the probability that one event occurs is not affected by whether the other event has or has not occurred.
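In practice independence is usually checked with the multiplication rule $P(A \cap B) = P(A)\,P(B)$. A sketch for one roll of a fair die, with hypothetical events chosen for illustration:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # sample space for one roll of a fair die

def P(event):
    """Classical probability: favorable outcomes / equally likely outcomes."""
    return Fraction(len(event & S), len(S))

def independent(A, B):
    # A and B are independent when P(A and B) = P(A) * P(B).
    return P(A & B) == P(A) * P(B)

A = {2, 4, 6}   # "even number"
B = {1, 2}      # "at most 2"
C = {1, 2, 3}   # "at most 3"
print(independent(A, B))  # True:  1/6 == 1/2 * 1/3
print(independent(A, C))  # False: 1/6 != 1/2 * 1/2
```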
Distribution Function : The function which gives the probability of the event that X takes a value less than or equal to a specified value x is called a distribution function; it is also called the cumulative distribution function.
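A sketch of $F(x) = P(X \le x)$ for a hypothetical discrete random variable X, the number of heads in two tosses of a fair coin:

```python
from fractions import Fraction

# Hypothetical example: X = number of heads in two tosses of a fair coin.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def F(x):
    """Distribution function F(x) = P(X <= x) for the discrete X above."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

for x in (-1, 0, 0.5, 1, 2, 5):
    print(x, F(x))
# -1 0 / 0 1/4 / 0.5 1/4 / 1 3/4 / 2 1 / 5 1
```

F is a step function here: it stays flat between the possible values and jumps by P(X = x) at each of them, rising from 0 to 1.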
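Cumulative Distribution Function : Another name for the distribution function, i.e. the function $F(x) = P(X \le x)$ which gives the probability of the event that X takes a value less than or equal to a specified value x.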
Sampling Frame : A sampling frame is a complete list of all the elements in the population.
Sampling Error : The sampling error is the difference between the sample statistic and the population parameter.
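A one-line illustration using a hypothetical population and one possible sample drawn from it; the sampling error of the mean is the sample mean minus the population mean:

```python
from statistics import mean

population = [4, 8, 12, 16, 20]  # hypothetical population, mu = 12
sample = [12, 16, 20]            # one possible sample drawn from it

sampling_error = mean(sample) - mean(population)  # statistic minus parameter
print(sampling_error)  # 4
```

A different sample would give a different sampling error; this variability is what sampling distributions describe.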
Probability Samples : Probability samples are those in which, following the sampling plan, each unit in the population has a known probability of being included in the sample.
Non-probability Samples : Non-probability samples are those in which the sample elements are arbitrarily selected by the sampler because, in his judgment, the elements thus chosen will most effectively represent the population.
Variable : A measurable quantity which can vary from one individual or object to
another is called a variable.
Constant : A quantity which can assume only one value is called a constant.
Mode : The mode is the value which occurs most frequently in a set of data, i.e. it indicates the most common result.
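Python's standard library (3.8+) has `statistics.multimode`, which returns every most-frequent value, so it also handles data with more than one mode. The data below are hypothetical:

```python
from statistics import multimode

print(multimode([2, 3, 3, 5, 7, 7, 9]))  # [3, 7]: two modes (bimodal data)
print(multimode([1, 2, 2, 3]))           # [2]
```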
Box and Whisker Plot : A box and whisker plot provides a graphical representation of data through its five-number summary.
The Five-Number Summary : A five-number summary consists of $X_0$, $Q_1$, the median, $Q_3$ and $X_m$, i.e. the smallest value, first quartile, median, third quartile and largest value. It enables us to judge the shape of the distribution without drawing a graph.
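A sketch of the five-number summary. Textbooks locate quartiles in slightly different ways; this assumes the rule that places $Q_1$ and $Q_3$ at the $(n+1)/4$-th and $3(n+1)/4$-th ordered positions with linear interpolation, which is what `statistics.quantiles(..., method="exclusive")` does. The data are hypothetical:

```python
from statistics import median, quantiles

def five_number_summary(data):
    """(X0, Q1, median, Q3, Xm) for a list of numbers."""
    xs = sorted(data)
    # Assumption: quartiles at the (n+1)/4-th and 3(n+1)/4-th positions,
    # interpolating linearly between neighbouring ordered values.
    q1, _, q3 = quantiles(xs, n=4, method="exclusive")
    return xs[0], q1, median(xs), q3, xs[-1]

print(five_number_summary([14, 2, 8, 4, 12, 6, 10]))  # (2, 4.0, 8, 12.0, 14)
```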
Exhaustive Events : Two or more mutually exclusive events are said to be exhaustive events when their union constitutes the entire sample space.
Equally Likely Events : Two events A and B are said to be equally likely when one event is as likely to occur as the other.
Probability : Probability is defined as the ratio of the number of favorable cases to the total number of equally likely cases.
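The classical definition can be sketched by listing the equally likely outcomes and counting the favorable ones; the two experiments below are illustrative:

```python
from fractions import Fraction
from itertools import product

def classical_probability(outcomes, is_favorable):
    """Favorable cases divided by the total number of equally likely cases."""
    outcomes = list(outcomes)
    favorable = sum(1 for o in outcomes if is_favorable(o))
    return Fraction(favorable, len(outcomes))

die = range(1, 7)
print(classical_probability(die, lambda x: x % 2 == 0))     # 1/2
two_coins = product("HT", repeat=2)  # HH, HT, TH, TT
print(classical_probability(two_coins, lambda o: "H" in o))  # 3/4
```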
Tabulation : The process of arranging data into rows and columns is called tabulation.
Class Mark or Mid Point : The class mark or mid point is that value which divides a class into two equal parts.
The Semi-interquartile Range : The quartile deviation or the semi-interquartile range is defined as half of the difference between the first and third quartiles.
Disjoint Sets : Two sets A and B are said to be disjoint sets if they have no elements in common.
Randomized Designs : These designs are those in which treatments are applied to experimental units randomly, and conclusions are supported by the statistical results.
Local Control : It is used to bring all extraneous sources of variation under control. For this purpose we use local control, a term referring to the amount of balancing, blocking and grouping of the experimental units.
Critical Value : The value that separates the critical region from the acceptance region is called the critical value.
Deciles : Deciles are those nine quantities that divide the distribution into ten equal parts.
Percentiles : Percentiles are those ninety-nine quantities that divide the distribution into a hundred equal parts.
Range : The range is defined as the difference between the maximum and
minimum values of a data set.
Quartile Deviation : The quartile deviation is defined as half of the difference between the first and third quartiles.
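A sketch computing the range and the quartile deviation $(Q_3 - Q_1)/2$ for hypothetical data, assuming quartiles at the $(n+1)/4$-th and $3(n+1)/4$-th ordered positions (other textbook quartile rules can give slightly different values):

```python
from statistics import quantiles

def range_and_quartile_deviation(data):
    xs = sorted(data)
    data_range = xs[-1] - xs[0]  # maximum minus minimum
    # Assumption: quartiles at the (n+1)/4-th and 3(n+1)/4-th positions.
    q1, _, q3 = quantiles(xs, n=4, method="exclusive")
    return data_range, (q3 - q1) / 2  # quartile deviation = half of (Q3 - Q1)

print(range_and_quartile_deviation([2, 4, 6, 8, 10, 12, 14]))  # (12, 4.0)
```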
Standard Error of Estimate : The degree of scatter of the observed values about the regression line is measured by what is called the standard deviation of regression or the standard error of estimate.
Secondary Data : The data published or used by an organization other than the one which originally collected them are known as secondary data.
Harmonic Mean : Harmonic mean is defined as the reciprocal of the arithmetic mean of the reciprocals of the values.
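A sketch of the definition next to `statistics.harmonic_mean`, using the classic case of average speed over two equal distances (the speeds are hypothetical):

```python
from statistics import harmonic_mean

speeds = [40, 60]  # hypothetical: km/h over two equal distances
by_definition = 1 / (sum(1 / x for x in speeds) / len(speeds))
print(by_definition, harmonic_mean(speeds))  # both about 48 km/h (average speed)
```

The arithmetic mean (50 km/h) would overstate the average speed, because more time is spent travelling at the slower speed.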
Quartiles : Quartiles are those three quantities that divide the distribution into four
equal parts.
Standard Deviation : Standard deviation is defined as the positive square root of the mean of the squared deviations of the values from their mean.
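A sketch of the definition above (divisor n), checked against `statistics.pstdev`; `statistics.stdev`, the sample version with divisor n - 1, is shown for contrast. The data are hypothetical:

```python
import math
from statistics import pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
m = sum(data) / len(data)
# Definition above: positive square root of the mean squared deviation (divisor n).
sd = math.sqrt(sum((x - m) ** 2 for x in data) / len(data))
print(sd, pstdev(data))        # 2.0 2.0
print(round(stdev(data), 4))   # 2.1381, the sample version with divisor n - 1
```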
Subset : A set whose elements are all elements of another set (some or all of them) is called a subset of that set.
Non-Sampling Error : Errors which are not attributable to sampling but arise in the process of data collection, and which can occur even if a complete count is carried out, are called non-sampling errors.
Universal Set : In a given discussion, all the sets under consideration are subsets of one particular set, called the universal set.
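Python's built-in `set` type expresses these set ideas directly; the sets below are hypothetical:

```python
U = set(range(1, 11))  # hypothetical universal set {1, ..., 10}
A = {1, 2, 3}
B = {4, 5}

print(A.isdisjoint(B))    # True: no elements in common
print({1, 2} <= A)        # True: {1, 2} is a subset of A
print(A <= U and B <= U)  # True: both are subsets of the universal set
print(sorted(U - A))      # complement of A: [4, 5, 6, 7, 8, 9, 10]
```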
P value : The p-value is a property of the data; it indicates how improbable the obtained result (or a more extreme one) would be if the null hypothesis were true.
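A sketch for the case of a z statistic, assuming a standard normal null distribution. The observed value z = 2.0 is hypothetical, and the one-tailed version assumes z lies in the tail named by the alternative hypothesis:

```python
from statistics import NormalDist

def z_test_p_value(z, tails=2):
    """p-value of an observed z statistic under H0 (standard normal null distribution)."""
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return tails * upper_tail

print(round(z_test_p_value(2.0), 4))           # 0.0455 (two-tailed)
print(round(z_test_p_value(2.0, tails=1), 4))  # 0.0228 (one-tailed)
```

A p-value smaller than the chosen level of significance (e.g. 0.05) leads to rejection of the null hypothesis.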
Test Statistic : A statistic (i.e. a function of sample data not containing any parameter) which provides a basis for testing a null hypothesis is called a test statistic.
Blocking : The process of using the same or similar experimental units for all
treatments. The purpose of blocking is to remove a source of variation
from the error term and hence provide a more powerful test for a
difference in population or treatment means.
Box plot : A graphical summary of data. A box, drawn from the first to the third
quartiles, shows the location of the middle 50% of the data. Dashed lines,
called whiskers, extending from the ends of the box show the location of
data values greater than the third quartile and data values less than the
first quartile. The locations of any outliers are also noted.
Central Limit Theorem : A theorem that enables one to use the normal probability distribution to approximate the sampling distribution of the sample mean and sample proportion whenever the sample size is large.
One-tailed Test : A hypothesis test in which rejection of the null hypothesis occurs for values of the test statistic in one tail of the sampling distribution. Equivalently, a test in which the entire rejection region lies in only one of the two tails, either in the right tail or in the left tail, of the sampling distribution of the test statistic is called a one-tailed test or one-sided test.
Point Estimator : The sample statistic that provides the point estimate of the population parameter.
Power Curve : A graph of the probability of rejecting H0 for all possible values of the population parameter not satisfying the null hypothesis. The power curve provides the probability of correctly rejecting the null hypothesis.
Probability Function : A function, denoted by f(x), that provides the probability that x assumes a particular value for a discrete random variable.
Qualitative Data : Data that are labels or names used to identify an attribute of each element. Qualitative data may be nonnumeric or numeric.
Quantitative Data : Data that indicate how much or how many of something. Quantitative data are always numeric.
Two-tailed Test : A hypothesis test in which rejection of the null hypothesis occurs for values of the test statistic in either tail of the sampling distribution.
Unbiasedness : A property of a point estimator when the expected value of the point estimator is equal to the population parameter it estimates.
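Unbiasedness can be checked exactly on a tiny hypothetical population by listing every possible sample. The sketch below takes all samples of size 2 drawn with replacement from {1, 2, 3} and averages the sample variance computed with divisor n - 1 and with divisor n:

```python
from fractions import Fraction
from itertools import product

def variance(xs, divisor):
    m = Fraction(sum(xs), len(xs))
    return sum((x - m) ** 2 for x in xs) / divisor

population = [1, 2, 3]                          # hypothetical population
sigma2 = variance(population, len(population))  # population variance = 2/3
samples = list(product(population, repeat=2))   # all 9 samples of size 2, with replacement

expected_s2_n1 = sum(variance(s, len(s) - 1) for s in samples) / len(samples)
expected_s2_n = sum(variance(s, len(s)) for s in samples) / len(samples)
print(sigma2, expected_s2_n1, expected_s2_n)  # 2/3 2/3 1/3
```

The divisor n - 1 version averages exactly to the population variance (unbiased), while the divisor n version falls short (biased).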
Union of A and B : The event containing all sample points that are in A, in B, or in both. The union is denoted A ∪ B.
Systematic Designs : These designs are those in which treatments are applied to the experimental units in some systematic manner, that is, according to the choice of the experimenter.
Acceptance and Rejection Regions : All possible values which a test statistic may assume can be divided into two mutually exclusive groups: one group consisting of values which appear to be consistent with the null hypothesis (i.e. values which appear to support the null hypothesis), and the other having values which lead to the rejection of the null hypothesis. The first group is called the acceptance region and the second set of values is known as the rejection region for a test.
Type I Error : When we perform a hypothesis test, we derive evidence from the sample in the form of a test statistic. There is a possibility that the sample may lead us to make a wrong decision. We may reject the hypothesis when it is in fact true. This type of error is called an error of the first kind or a type I error. The probability of committing a type I error is denoted by α. Thus α is the probability of rejecting the null hypothesis H0 when H0 is true.
Type II Error : When we perform a hypothesis test, we derive evidence from the sample in the form of a test statistic. There is a possibility that the sample may lead us to make a wrong decision. We may accept the hypothesis when it is in fact false. This type of error is called an error of the second kind or a type II error. The probability of committing a type II error is denoted by β. Thus β is the probability of accepting the null hypothesis H0 when H0 is false.
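To make α and β concrete, here is a sketch for a hypothetical right-tailed z-test of H0: μ = 50 against H1: μ > 50, with σ = 10 known, n = 25 and level of significance 0.05. It computes the critical value, α, and β when the true mean is (hypothetically) 55:

```python
from statistics import NormalDist

# Hypothetical right-tailed z-test: H0: mu = 50 vs H1: mu > 50,
# sigma = 10 known, n = 25, level of significance 0.05.
mu0, sigma, n, level = 50, 10, 25, 0.05
se = sigma / n ** 0.5                            # standard error of the mean = 2
c = mu0 + NormalDist().inv_cdf(1 - level) * se   # reject H0 when x-bar > c

alpha = 1 - NormalDist(mu0, se).cdf(c)  # P(reject H0 | H0 true)
mu1 = 55                                # a hypothetical true mean under H1
beta = NormalDist(mu1, se).cdf(c)       # P(accept H0 | H0 false, mu = 55)
print(round(c, 2), round(alpha, 3), round(beta, 3), round(1 - beta, 3))
# 53.29 0.05 0.196 0.804
```

The last number, 1 - β, is the power of the test at μ = 55; repeating the calculation for many values of μ traces out the power curve.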
Class Midpoint : The point in each class that is halfway between the lower and upper class limits.
Complement of Event A : The event consisting of all sample points that are not in A.
Dot Plot : A simple graphical summary of data with each observation represented by a dot placed above a horizontal axis that shows the range of values for the observations.
Discrete Random Variable : A random variable that may assume either a finite number of values or an infinite sequence of values.
Empirical Rule : A rule that states the percentages of items that are within one, two, and three standard deviations from the mean (approximately 68%, 95% and 99.7%, respectively) for mound-shaped, or bell-shaped, distributions.
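The empirical-rule percentages come from the normal curve. A sketch that recovers them from the standard normal distribution, taken here as the idealised bell shape:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: the idealised bell-shaped distribution
for k in (1, 2, 3):
    share = Z.cdf(k) - Z.cdf(-k)
    print(k, f"{share:.2%}")
# 1 68.27% / 2 95.45% / 3 99.73%
```

For real mound-shaped data these are only approximations, whereas Chebyshev's theorem gives a guaranteed (but weaker) lower bound for any distribution.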
Five-number Summary : An exploratory data analysis technique that uses the following five numbers to summarize the data set: smallest value, first quartile, median, third quartile, and largest value.
Frame : A list of the sampling units for a study. The sample is drawn by selecting
units from the frame.
Frequency Distribution : A tabular summary of data showing the number (or frequency) of items in each of several non-overlapping classes.
Intersection of A and B : The event containing all sample points that are in both A and B. The intersection is denoted A ∩ B.
Joint Probability : The probability of two events both occurring; that is, the probability of the intersection of two events.
Regression Equation : The equation that describes how the mean or expected value of the dependent variable is related to the independent variable.
Rejection Region : The range of values that will lead to the rejection of a null hypothesis.
Residual : The difference between the observed value of the dependent variable and
the value predicted using the estimated regression equation.
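A sketch tying together the regression equation, residuals and the standard error of estimate. It fits $\hat{y} = a + bx$ by least squares to hypothetical data and assumes the common textbook divisor n - 2 for the standard error of estimate:

```python
import math

def fit_line(x, y):
    """Least-squares estimates a, b of the regression line y-hat = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

x = [1, 2, 3, 4, 5]   # hypothetical independent variable
y = [2, 4, 5, 4, 5]   # hypothetical dependent variable
a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # observed minus predicted
# Standard error of estimate, assuming the divisor n - 2.
s_yx = math.sqrt(sum(e ** 2 for e in residuals) / (len(x) - 2))
print(round(a, 2), round(b, 2), [round(e, 2) for e in residuals], round(s_yx, 4))
# 2.2 0.6 [-0.8, 0.6, 1.0, -0.6, -0.2] 0.8944
```

The least-squares residuals always sum to zero, so their squares (not the residuals themselves) are used to measure scatter about the line.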
Sample Statistic : A numerical value used as a summary measure for a sample (e.g., the sample mean, the sample variance, and the sample standard deviation). The value of the sample statistic is used to estimate the value of the population parameter.
Sampling Unit : The units selected for sampling. A sampling unit may include several elements.
Sampling with Replacement : Once an element has been included in the sample, it is returned to the population. A previously selected element can be selected again and therefore may appear in the sample more than once.
Sampling without Replacement : Once an element has been included in the sample, it is removed from the population and cannot be selected a second time.
Simple Random Sampling : Finite population: a sample selected such that each possible sample of size n has the same probability of being selected. Infinite population: a sample selected such that each element comes from the same population and the elements are selected independently.
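A sketch contrasting the two sampling schemes on a hypothetical population of N = 5 units with n = 2. Counting ordered samples with replacement gives $N^n$; counting unordered samples without replacement gives $\binom{N}{n}$. Python's `random.choices` draws with replacement and `random.sample` draws without replacement:

```python
import math
import random
from itertools import combinations, product

population = ["A", "B", "C", "D", "E"]  # hypothetical population, N = 5
n = 2

# With replacement (order counted): N ** n possible samples.
with_repl = list(product(population, repeat=n))
# Without replacement (order ignored): C(N, n) possible samples.
without_repl = list(combinations(population, n))
print(len(with_repl), 5 ** n)              # 25 25
print(len(without_repl), math.comb(5, n))  # 10 10

rng = random.Random(1)  # seeded only so the run is repeatable
print(rng.choices(population, k=n))  # with replacement: an element may repeat
print(rng.sample(population, k=n))   # without replacement: distinct elements
```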
Question: What is Frequency? What are the steps for making frequency
distribution?
Answer: Frequency:
It is a record of how often each value (or set of values) of the variable in question occurs. It may be enhanced by the addition of the percentages that fall into each category.
Steps in Frequency Distribution:
Following are the basic rules to construct a frequency distribution:
1. Decide the number of classes into which the data are to be grouped; this depends upon the size of the data.
2. Determine the range (the difference between the largest and smallest values in the data).
3. Decide where to locate the class limits (the numbers typically used to identify the classes).
4. Determine the remaining class limits by adding the class interval repeatedly.
5. Distribute the data into classes using tally marks and record the totals in the frequency column. Finally, total the frequency column to check that all data have been accounted for.
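The steps above can be sketched for integer data as follows. The data, the number of classes, the choice of the smallest value as the first lower limit, and the rule for the class interval are all assumptions made for this illustration; the class mark (mid point) of each class is included as well:

```python
import math

def frequency_distribution(data, num_classes):
    """Group integer data into equal classes; returns (lower, upper, class mark, frequency)."""
    # Step 2: the range is the difference between the largest and smallest values.
    data_range = max(data) - min(data)
    # Assumption: class interval chosen so num_classes classes of whole numbers cover the range.
    h = math.ceil((data_range + 1) / num_classes)
    rows, lower = [], min(data)  # Step 3 (assumed): first lower limit = smallest value
    for _ in range(num_classes):
        upper = lower + h - 1                               # Step 4: add the interval
        freq = sum(1 for x in data if lower <= x <= upper)  # Step 5: tally the class
        rows.append((lower, upper, (lower + upper) / 2, freq))
        lower += h
    # Step 5 check: the frequency column must account for every observation.
    assert sum(row[3] for row in rows) == len(data)
    return rows

data = [12, 15, 21, 24, 27, 33, 35, 38, 41, 47]  # hypothetical raw data
for row in frequency_distribution(data, num_classes=4):
    print(row)
# (12, 20, 16.0, 2) / (21, 29, 25.0, 3) / (30, 38, 34.0, 3) / (39, 47, 43.0, 2)
```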
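Question: What is the relation between moments and moment ratios?
Answer: Moments: A moment designates the power to which deviations are raised before averaging them. Moment ratios: These are certain ratios in which both the numerators and the denominators are moments. The best-known moment ratios are $b_1 = m_3^2/m_2^3$ and $b_2 = m_4/m_2^2$, where $m_r$ is the r-th moment about the mean; they are used to measure skewness and kurtosis respectively.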
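Question: What is the difference between these two limits when we are dealing with a continuous random variable: 0 < x < 5 and 0 ≤ x ≤ 5?
Answer: In the case of a continuous random variable there is no difference. Both describe the same thing, whether or not we include the equal sign, i.e. the random variable ranging from 0 to 5. The reason is that the probability that a continuous random variable takes any single exact value (such as 0 or 5) is zero, so P(0 < X < 5) = P(0 ≤ X ≤ 5).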
Question: What is the difference between the Poisson distribution and the normal
distribution?
Answer: Poisson distribution. The Poisson distribution is referred to as the
distribution of rare events. Examples of Poisson distributed variables are
number of accidents per person, number of sweepstakes won per person,
or the number of catastrophic defects found in a production process.
Normal distribution. The normal distribution (the "bell-shaped curve", which is symmetrical about the mean) is a theoretical function commonly used in inferential statistics as an approximation to sampling distributions.
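A sketch of the Poisson probability function $P(X = x) = e^{-\lambda}\lambda^x/x!$, where λ is the mean number of occurrences; the value λ = 2 is hypothetical. Unlike the continuous normal curve, the Poisson distribution gives probabilities only to the counts 0, 1, 2, ...:

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x! for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 2  # hypothetical: on average 2 accidents per period
for x in range(5):
    print(x, round(poisson_pmf(x, lam), 4))
# 0 0.1353 / 1 0.2707 / 2 0.2707 / 3 0.1804 / 4 0.0902
```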
Question: What is the difference between a type I error and a type II error?
Answer: Type I error:
In a hypothesis test, a type I error occurs when the null hypothesis H0 is rejected when it is in fact true.
Type II error:
In a hypothesis test, a type II error occurs when the null hypothesis H0 is not rejected when it is in fact false. For example, if the accused is, in fact, guilty (i.e. H0 is false) and the finding of the judge is innocent, the judge has accepted the false null hypothesis and, by accepting it, has committed a type II error.
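Percentile Coefficient of Kurtosis : $K = \dfrac{Q.D.}{P_{90} - P_{10}}$, where Q.D. is the semi-interquartile range and the P's are the percentiles. It has been shown that K for a normal distribution is 0.263 and that it lies between 0 and 0.50.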
Question: What is a random variable, and how is the pdf related to it?
Answer: RANDOM VARIABLE: A numerical quantity whose value is determined by the outcome of a random experiment is called a random variable, for example, the number of children in a family or the daily income of a medical store. It is of two types: (i) discrete random variable and (ii) continuous random variable. The probability density function (pdf) is the expression or formula which gives us the probability for a given range of values of a continuous random variable: the probability that X lies between a and b is the area under the pdf between a and b.
2: 1, 5, 6, 6, 7, having mean = 5. Hence, in such a situation, we need a measure which tells us how dispersed the data are. The measure used for this purpose is called a measure of dispersion.
A sample is generally selected for study because the population is too large to
study in its entirety. The sample should be representative of the general
population. This is often best achieved by random sampling. Also, before
collecting the sample, it is important that the researcher carefully and completely
defines the population, including a description of the members to be included.
Example:
The population for a study of infant health might be all children born in Pakistan in the 1980s. The sample might be all babies born on 7th May in any of those years.
Question: What are the different ways of representing the frequency distribution
graphically?
Answer: There are three ways of graphical representation of frequency distribution.
HISTOGRAM:
A histogram consists of a set of adjacent rectangles whose bases
are marked off by class boundaries along the X-axis, and whose heights are
proportional to the frequencies associated with the respective classes.
FREQUENCY POLYGON:
A frequency polygon is obtained by plotting the class frequencies
against the mid-points of the classes, and connecting the points so obtained by
straight line segments.
FREQUENCY CURVE:
When a frequency polygon, constructed over class intervals made sufficiently small for a large number of observations, is smoothed, it approaches a continuous curve; such a curve is called a frequency curve.
Types of Frequency Curves:
The frequency distributions occurring in practice usually belong to one of the following four types. You will study them in your next lecture.
1. The Symmetrical Distribution.
2. Moderately Skewed Distribution.
3. Extremely Skewed or J-shaped Distribution
4. U-Shaped Distribution
Question: What is meant by mid-range and mid-quartile range, and what is the difference between these two?
Answer: MID-RANGE: If there are n observations with $X_0$ and $X_m$ as their smallest and largest observations respectively, then their mid-range is defined as
$\text{Mid-range} = \dfrac{X_0 + X_m}{2}$.
It is obvious that if we add the smallest value to the largest and divide by 2, we get a value which is more or less in the middle of the data set.
MID-QUARTILE RANGE: If $x_1, x_2, \ldots, x_n$ are n observations with $Q_1$ and $Q_3$ as their first and third quartiles respectively, then their mid-quartile range is defined as
$\text{Mid-quartile range} = \dfrac{Q_1 + Q_3}{2}$.
Difference: Both are used as measures of central tendency because both give more or less the middle value of the data. The difference is that the mid-quartile range is an attempt to address the problem of the mid-range (like the range) being heavily dependent on extreme scores: since it is based on the quartiles, it depends only on the middle 50% of the scores in the distribution.
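A sketch showing why the mid-quartile range resists extreme values. The data are hypothetical (with one extreme value, 40), and the quartiles are assumed to lie at the $(n+1)/4$-th and $3(n+1)/4$-th ordered positions:

```python
from statistics import quantiles

def mid_range(data):
    return (min(data) + max(data)) / 2

def mid_quartile_range(data):
    # Assumption: quartiles at the (n+1)/4-th and 3(n+1)/4-th positions.
    q1, _, q3 = quantiles(sorted(data), n=4, method="exclusive")
    return (q1 + q3) / 2

data = [3, 5, 6, 7, 8, 9, 10, 40]  # hypothetical data with one extreme value (40)
print(mid_range(data), mid_quartile_range(data))  # 21.5 7.5
```

The single extreme value drags the mid-range far from the bulk of the data, while the mid-quartile range stays in the middle.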
Question: What is the difference between raw data and grouped data? Please explain with an example.
Answer: Raw data: Data that have not been processed in any manner are called raw data. The term often refers to uncompressed text that is not stored in any priority format. It may also refer to recently captured data that may have been placed into a database structure but not yet processed. Grouped data: Data presented in the form of a frequency distribution are known as grouped data.
Question: Explain the nominal and ordinal levels of measurement, and also tell me what the EPA mileage rating is.
Answer: Nominal scales: When measuring using a nominal scale, one simply names or categorizes responses. The essential point about nominal scales is that they do not imply any ordering. Nominal scales embody the lowest level of measurement. They are used for identifying individuals, groups or regions. Ordinal scales: Whereas nominal scales don't allow comparisons of degree, these are possible with ordinal scales. Say you think it is better to live in Karachi than in Lahore, but you don't know by how much. EPA stands for the Environmental Protection Agency, the US government agency for the protection of the environment, which ranks the most fuel-efficient vehicles; the EPA mileage rating is its fuel-economy rating of a vehicle in miles per gallon.
The first set of values is called the acceptance region and the second set of values is known as the rejection region for a test. The rejection region is also called the critical region.
Question: Explain the use of the word STATISTICS in the singular and plural sense.
Answer: The Latin word status, meaning a political state, is believed to be the origin of the word "statistics". Statistics: Today the word statistics is used in three different meanings. Firstly, it is used in the sense of data, for example price statistics, death statistics, etc. Secondly, it is used as the plural of the word "statistic", meaning the information obtained from the sample data. Thirdly, it means the science of collecting, presenting, analyzing and interpreting the numerical facts obtained as a result of a survey.
Question: What is the value of central tendency, why do we apply it, and how many types of central tendency are there?
Answer: Central tendency means the tendency of the data to gather around some central value, and the value around which all the observations tend to gather is called a measure of central tendency. Measures of central tendency are generally known as averages. The most
Question: What is the difference between P(type I error) and P(type II error)?
Answer: Type I error: On the basis of sample information, we may reject the null hypothesis H0 when it is, in fact, true. This type of error is called the type I error, and its probability is denoted by α. Type II error: On the basis of sample information, we may accept the null hypothesis H0 when it is actually false. This type of error is called the type II error, and its probability is denoted by β.
Question: Explain point estimator, and what is meant by a good point estimator?
Answer: Point Estimator: A single value calculated from the sample that is likely
to be close in value to the unknown parameter. It is to be noted that a
point estimate will not, in general, be equal to the population parameter
as the random sample used is one of the many possible samples which
could be chosen from the population. Good Point Estimator: A point
estimator is considered a good estimator if it satisfies various criteria.
Four of these criteria are: (i) Unbiasedness (ii) Consistency (iii)
Efficiency (iv) Sufficiency
Answer: ONE-TAILED AND TWO-TAILED TESTS: A test, for which the entire
rejection region lies in only one of the two tails – either in the right tail or
in the left tail – of the sampling distribution of the test-statistic, is called a
one-tailed test or one-sided test. If, on the other hand, the rejection region
is divided equally between the two tails of the sampling distribution of
the test-statistic, the test is referred to as a two-tailed test or two-sided
test.
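For a z-test the difference shows up in the critical values. A sketch at a hypothetical level of significance of 0.05: the one-tailed test puts all of α in one tail, while the two-tailed test splits it equally between the two tails:

```python
from statistics import NormalDist

alpha = 0.05  # hypothetical level of significance
z = NormalDist()
one_tailed = z.inv_cdf(1 - alpha)      # whole of alpha in the right tail
two_tailed = z.inv_cdf(1 - alpha / 2)  # alpha split equally between the two tails
print(round(one_tailed, 3), round(two_tailed, 3))  # 1.645 1.96
```

For a left-tailed test the critical value is the negative of the one-tailed value, and a two-tailed test rejects H0 when |z| exceeds the two-tailed value.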
Question: What are the applications of the following tests, and under which conditions is each one used: F-test, chi-square test, z-test and t-test?
Answer: (i) The F-test is used to compare the variances of two populations.
(ii) The chi-square test is used to test a specific value of a population variance.
(iii) The z-test is used to test the mean of a population, or the equality of two population means, when the population variance is known or the sample size is greater than 30.
(iv) The t-test is used to test the mean of a population, or the equality of two population means, when the population variance is unknown and the sample size is small (less than 30).
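A sketch of the one-sample z and t statistics on a hypothetical small sample tested against a hypothetical $\mu_0 = 13$. The z version needs the population standard deviation σ (assumed here to be 2), while the t version replaces it with the sample standard deviation s (divisor n - 1) and has n - 1 degrees of freedom:

```python
import math
from statistics import mean, stdev

def z_statistic(sample, mu0, sigma):
    """One-sample z statistic when the population standard deviation sigma is known."""
    return (mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))

def t_statistic(sample, mu0):
    """One-sample t statistic using the sample standard deviation (divisor n - 1)."""
    return (mean(sample) - mu0) / (stdev(sample) / math.sqrt(len(sample)))

sample = [12, 14, 15, 16, 18]  # hypothetical small sample, mean 15
print(round(t_statistic(sample, mu0=13), 3))           # 2.0 with n - 1 = 4 degrees of freedom
print(round(z_statistic(sample, mu0=13, sigma=2), 3))  # 2.236 if sigma = 2 were known
```

Each statistic is then compared with the critical value from its own distribution (standard normal for z, t with n - 1 degrees of freedom for t).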