Unit 1

Unit I: Introduction to Statistics:

1.1 - Meaning and definition:

Statistics is an important decision-making tool in business and is used in virtually every area of business.
In this course, the word statistics is defined as the science of gathering, analyzing, interpreting, and
presenting data.

1.2 - Functions:

The study of statistics can be subdivided into two main areas: descriptive statistics and inferential
statistics. Descriptive statistics result from gathering data from a body, group, or population and reaching
conclusions only about that group. Inferential statistics are generated from the process of gathering
sample data from a group, body, or population and reaching conclusions about the larger group from
which the sample was drawn.
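
The distinction can be illustrated with a minimal Python sketch. The population values, the seed, and the sample size of 50 are hypothetical and chosen only for illustration; they do not come from the course material.

```python
import random
import statistics

# Hypothetical population: monthly sales (in units) for 1,000 stores.
random.seed(42)
population = [random.gauss(500, 80) for _ in range(1000)]

# Descriptive use: measure the whole group and conclude only about that group.
population_mean = statistics.mean(population)

# Inferential use: measure a sample and use the result to draw a
# conclusion about the larger group from which the sample was drawn.
sample = random.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"Population mean (descriptive): {population_mean:.1f}")
print(f"Sample mean used as an estimate (inferential): {sample_mean:.1f}")
```

The sample mean will usually differ somewhat from the population mean, which is why inferential conclusions carry uncertainty that descriptive ones do not.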

The appropriate type of statistical analysis depends on the level of data measurement, which can be (1)
nominal, (2) ordinal, (3) interval, or (4) ratio. Nominal is the lowest level; it only classifies data, as
with geographic location, sex, or Social Security number. The next level is ordinal, which
provides rank-ordering measurements in which the intervals between consecutive numbers do not
necessarily represent equal distances. Interval is the next-to-highest level of data measurement, in which
the distances represented by consecutive numbers are equal. The highest level of data measurement is
ratio, which has all the qualities of interval measurement, but ratio data contain an absolute zero and
ratios between numbers are meaningful. Interval and ratio data sometimes are called metric or
quantitative data. Nominal and ordinal data sometimes are called nonmetric or qualitative data.

Two major types of inferential statistics are (1) parametric statistics and (2) nonparametric statistics. Use
of parametric statistics requires interval or ratio data and certain assumptions about the distribution of
the data. The techniques presented in this text are largely parametric. If data are only nominal or ordinal
in level, nonparametric statistics must be used.

1.3 - Scope and limitations:

The scope of application of statistics has assumed unprecedented dimensions these days. Statistical
methods are applicable in diverse fields such as economics, trade, industry, commerce, agriculture, bio-
sciences, physical sciences, education, astronomy, insurance, accountancy and auditing, sociology,
psychology, meteorology, and so on.

Although statistics has its applications in almost all sciences—social, physical, and natural—it has its own
limitations as well, which restrict its scope and utility.

Statistics Does Not Study Qualitative Phenomena. Since statistics deals with numerical data, it cannot be
applied in studying those problems which cannot be stated and expressed quantitatively. For example, a
statement like ‘Export volume of India has increased considerably during the last few years’ cannot be
analysed statistically. Also, qualitative characteristics such as honesty, poverty, welfare, beauty, or health
cannot be measured directly in quantitative terms. However, these subjective concepts can be related
indirectly to numerical data by assigning particular scores or quantitative standards. For
example, the intelligence of a class of students can be studied on the basis of their Intelligence
Quotients (IQ), which are considered a quantitative measure of intelligence.

Statistics Does Not Study Individuals. According to Horace Secrist, ‘By statistics we mean aggregate of
facts affected to a marked extent by multiplicity of factors . . . and placed in relation to each other.’ This
statement implies that a single or isolated figure cannot be considered as statistics unless it is part of
the aggregate of facts relating to a particular field of enquiry. For example, the price of a single commodity,
or an increase or decrease in the share price of a particular company, does not constitute statistics.
However, the aggregate of figures representing prices, production, sales volume, and profits over a
period of time or for different places does constitute statistics.

Statistics Can Be Misused. Statistics are liable to be misused. For proper use of statistics one should have
enough skill and experience to draw accurate and sensible conclusions. Further, valid results cannot be
drawn from the use of statistics unless one has a proper understanding of the subject to which it is
applied. The greatest danger of statistics lies in its use by those who do not possess sufficient experience
and ability to analyse and interpret statistical data and draw sensible conclusions. Bowley was right when
he said that ‘statistics only furnishes a tool though imperfect which is dangerous in the hands of those
who do not know its use and deficiencies.’ For example, the conclusion that smoking causes lung cancer,
since 90 per cent of people who smoke die before the age of 70 years, is statistically invalid because here
nothing has been mentioned about the percentage of people who do not smoke and die before reaching
the age of 70 years. According to W. I. King, ‘statistics are like clay of which you can make a God or a Devil
as you please.’ He also remarked, ‘science of statistics is the useful servant but only of great value to
those who understand its proper use.’

1.4 - Collection and presentation of data:

Two types of graphical depictions are quantitative data graphs and qualitative data graphs. Quantitative
data graphs presented in this chapter are histogram, frequency polygon, ogive, dot plot, and stem-and-
leaf plot. Qualitative data graphs presented are pie chart, bar chart, and Pareto chart. In addition, two-
dimensional scatter plots are presented. A histogram is a vertical bar chart in which a line segment
connects class endpoints at the value of the frequency. Two vertical lines connect this line segment
down to the x-axis, forming a rectangle. A frequency polygon is constructed by plotting a dot at the
midpoint of each class interval for the value of each frequency and then connecting the dots. Ogives are
cumulative frequency polygons. Points on an ogive are plotted at the class endpoints. A dot plot is a
graph that displays frequency counts for various data points as dots graphed above the data point. Dot
plots are especially useful for observing the overall shape of the distribution and determining both gaps
in the data and high concentrations of data. Stem-and-leaf plots are another way to organize data. The
numbers are divided into two parts, a stem and a leaf. The stems are the leftmost digits of the numbers
and the leaves are the rightmost digits. The stems are listed individually, with all leaf values
corresponding to each stem displayed beside that stem.
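
The construction of a stem-and-leaf plot described above can be sketched in Python. This is a minimal illustration that assumes non-negative whole numbers and takes the last digit as the leaf and the remaining leading digits as the stem; the `scores` values and the function name `stem_and_leaf` are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group non-negative integers by stem (leading digits) and leaf (last digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 86 -> stem 8, leaf 6
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical test scores, used only for illustration.
scores = [86, 77, 91, 60, 55, 76, 92, 47, 88, 67, 23, 59, 72, 75, 83]
plot = stem_and_leaf(scores)

# List each stem once, with all of its leaves beside it.
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Because the values are sorted before grouping, the leaves beside each stem appear in ascending order, which makes gaps and concentrations easier to see.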

A pie chart is a circular depiction of data. The amount of each category is represented as a slice of the
pie proportionate to the total. The researcher is cautioned in using pie charts because it is sometimes
difficult to differentiate the relative sizes of the slices.

The bar chart or bar graph uses bars to represent the frequencies of various categories. The bar chart
can be displayed horizontally or vertically.

A Pareto chart is a vertical bar chart that is used in total quality management to graphically display the
causes of problems. The Pareto chart presents problem causes in descending order to assist the decision
maker in prioritizing problem causes. The scatter plot is a two-dimensional plot of pairs of points from
two numerical variables. It is used to graphically determine whether any apparent relationship exists
between the two variables.
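
The ordering step behind a Pareto chart can be sketched as follows. This is a minimal Python illustration, not a charting routine: the defect causes and counts are hypothetical, and the percentage column is an added convenience rather than something the definition above requires.

```python
def pareto_table(cause_counts):
    """Return (cause, count, percent of total) rows in descending order of count."""
    total = sum(cause_counts.values())
    ordered = sorted(cause_counts.items(), key=lambda item: item[1], reverse=True)
    return [(cause, count, round(100 * count / total, 1)) for cause, count in ordered]


# Hypothetical counts of problem causes, used only for illustration.
defects = {"Defective plug": 25, "Poor wiring": 40, "Other": 5, "Short in coil": 30}
rows = pareto_table(defects)

for cause, count, percent in rows:
    print(f"{cause:<15} {count:>3} {percent:>5}%")
```

Plotting the rows as vertical bars in this order gives the Pareto chart, with the most frequent cause on the left.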

1.5 - Frequency distribution:

The two types of data are grouped and ungrouped. Grouped data are data organized into a frequency
distribution. Differentiating between grouped and ungrouped data is important, because statistical
operations on the two types are computed differently.

Constructing a frequency distribution involves several steps. The first step is to determine the range of
the data, which is the difference between the largest value and the smallest value. Next, the number of
classes is determined, which is an arbitrary choice of the researcher. However, too few classes
overaggregate the data into meaningless categories, and too many classes do not summarize the data
enough to be useful. The third step in constructing the frequency distribution is to determine the width
of the class interval. Dividing the range of values by the number of classes yields the approximate width
of the class interval.
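
The three steps above can be sketched in Python. This is a minimal illustration under stated assumptions: the data are numeric, the approximate width is rounded up to a whole number, the first class starts at the smallest value, and each class includes its lower endpoint but not its upper one (except the last class, which also takes any value on its upper endpoint). The `scores` data and the function name are hypothetical.

```python
import math


def frequency_distribution(values, num_classes):
    """Return (range, class width, [((lower, upper), frequency), ...])."""
    smallest, largest = min(values), max(values)
    data_range = largest - smallest                      # step 1: range
    # Step 2 is the caller's choice of num_classes.
    width = max(1, math.ceil(data_range / num_classes))  # step 3: width, rounded up
    counts = [0] * num_classes
    for value in values:
        # Clamp so a value on the final upper endpoint lands in the last class.
        index = min(int((value - smallest) // width), num_classes - 1)
        counts[index] += 1
    classes = [
        (smallest + i * width, smallest + (i + 1) * width)
        for i in range(num_classes)
    ]
    return data_range, width, list(zip(classes, counts))


# Hypothetical test scores, used only for illustration.
scores = [86, 77, 91, 60, 55, 76, 92, 47, 88, 67, 23, 59, 72, 75, 83]
data_range, width, table = frequency_distribution(scores, 5)

print("Range:", data_range, "Class width:", width)
for (lower, upper), frequency in table:
    print(f"{lower} to under {upper}: {frequency}")
```

Here the range is 69, and dividing by 5 classes gives 13.8, which is rounded up to a width of 14. In practice a researcher might instead pick a rounder width and starting point, such as 15 starting at 20.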

The class midpoint is the midpoint of a class interval. It is the average of the class endpoints and
represents the halfway point of the class interval. Relative frequency is a value computed by dividing an
individual frequency by the sum of the frequencies. Relative frequency represents the proportion of total
values that is in a given class interval. The cumulative frequency is a running total frequency tally that
starts with the first frequency value and adds each ensuing frequency to the total.
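
These three quantities can be computed directly from a frequency distribution. The sketch below is a minimal Python illustration; the class endpoints and frequencies are hypothetical.

```python
from itertools import accumulate

# Hypothetical frequency distribution: (lower endpoint, upper endpoint) per class.
classes = [(20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [6, 18, 11, 5]

# Class midpoint: the average of the two class endpoints.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Relative frequency: each frequency divided by the sum of the frequencies.
total = sum(frequencies)
relative = [frequency / total for frequency in frequencies]

# Cumulative frequency: a running total of the frequencies.
cumulative = list(accumulate(frequencies))

for (lower, upper), mid, freq, rel, cum in zip(
    classes, midpoints, frequencies, relative, cumulative
):
    print(f"{lower}-under {upper}  midpoint={mid}  f={freq}  rel={rel:.3f}  cum={cum}")
```

The relative frequencies always sum to 1, and the last cumulative frequency always equals the total number of observations, which gives two quick checks on the arithmetic.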
