Measurements and Metrics 30th August 2015
Science and measurement: Scientific progress is made through observations and generalizations based on data and measurements. On the basis of the observations, theories are proposed, and the proposed theories are confirmed or refuted by testing hypotheses on further data. Thus it is measurements and data that really drive the progress of science and engineering. Without empirical verification through data and measurements, theories and propositions remain at an abstract level. Business Analytics professionals, therefore, must have a clear understanding of measurement, particularly in the context of business.
It is to be mentioned here that measurements are often made to quantify a concept. In some cases, particularly in the physical world, the concepts are easy to grasp, whereas in an artificial world like software development, defining the concept may not be that simple. For example, the concepts of height, weight, distance, etc. are easily and uniformly understood (though they may not be easy to measure in all circumstances), whereas concepts like knowledge or ease of doing business require clear explanation. The hierarchy from theory to hypothesis and from concept to measurement is given below:
Abstract World:  Theory -> Proposition -> Hypotheses
Empirical World: Concept -> Definition -> Operational Definition -> Data Analysis
The building blocks of theory are concepts and definitions. In empirical work, theories are often proposed in terms of certain concepts. For example, in marketing analytics a theory may be proposed relating customer experience to growth of business. In this case, concepts like customer experience and business growth will have to be defined, and methodologies for measuring them must be proposed, in order to establish or reject the proposed theory.
Operational Definitions: The definition of a concept, to start with, is often rather abstract. Such definitions do not facilitate measurement. For example, we may define a circle to be a line forming a closed loop such that the distance between any point on this line and a fixed interior point called the centre is constant. We may further say that the fixed distance from the centre to any point on the closed loop gives the radius of the circle. It may be noted that while the concept has been defined, no methodology to measure the radius has been proposed. The operational definition specifies the methodology for carrying out a specific measurement.
The importance of operational definition can hardly be overstated. Take the case of the apparently simple measurement of the height of individuals. Unless we specify various factors, such as the time at which the height will be measured (it is known that the height of an individual varies across the day), how the variability due to hair will be taken care of, whether the measurement will be with or without shoes, and what level of accuracy is expected (correct to the nearest inch, centimetre, etc.), even this simple measurement will lead to substantial variation. Software engineers must appreciate the need to define measures from an operational perspective. W. E. Deming maintained that many of the problems of management are due to the failure to operationally define the measures on the basis of which decisions are taken (Out of the Crisis).
Levels (scales) of measurement: Once the operational definitions are arrived at, the actual measurements need to be undertaken. It is to be noted that measurement may be carried out on four different scales, namely
Nominal
Ordinal
Interval
Ratio
On a nominal scale the names of the different categories are just labels, and no relationship between them is assumed. The only operation that can be carried out on a nominal scale is counting the number of occurrences in the different classes. However, statistical analyses may be carried out to understand how entities belonging to different classes perform with respect to some other response variable. Thus we may try to find the impact of the programming language chosen on productivity or on the occurrence of defects.
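The counting operation, and the comparison of a response variable across nominal classes, can be sketched as follows (the language and defect figures are hypothetical):

```python
from collections import Counter

# Hypothetical data: programming language (nominal) recorded per module
languages = ["Java", "C", "Java", "Python", "C", "Java"]

# The only operation valid on the nominal values themselves is counting occurrences
counts = Counter(languages)

# A response variable (here, defect counts per module) can still be compared across classes
defects = {"Java": [2, 1, 3], "C": [5, 4], "Python": [1]}
mean_defects = {lang: sum(d) / len(d) for lang, d in defects.items()}
```

Note that no ordering or arithmetic is applied to the language labels themselves; arithmetic is done only on the associated response variable.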
Ordinal scale: Refers to a measurement scale where the different values obtained through the process of measurement have an implicit ordering. Typical examples of measurements on ordinal scales are customer-satisfaction ratings and defect-severity classifications.
Measurement on an ordinal scale satisfies the transitivity property, in the sense that if A > B and B > C then A > C. However, arithmetic operations cannot be carried out on variables measured on ordinal scales. Thus, if we measure customer satisfaction on a 5-point ordinal scale, with 5 implying a very high level of satisfaction and 1 implying a very high level of dissatisfaction, we cannot find the average satisfaction by taking the mean of the scores of different customers. However, we can find the median, as computation of the median involves counting only.
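The distinction can be sketched in Python with hypothetical ratings:

```python
from statistics import median

# Hypothetical 5-point satisfaction ratings (5 = very satisfied, 1 = very dissatisfied)
ratings = [5, 3, 4, 2, 5, 4, 1]

# Valid on an ordinal scale: the median needs only ordering and counting
central = median(ratings)

# Not meaningful on an ordinal scale: sum(ratings) / len(ratings) would treat
# the gap between 1 and 2 as equal to the gap between 4 and 5.
```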
Interval scales: With the interval scale we come to a form that is quantitative in the ordinary sense of the word. Almost all the usual statistical measures are applicable here, unless they require knowledge of a true zero point. The zero point on an interval scale is a matter of convention. As a true zero point is absent, ratios do not make sense, but differences between levels of attributes can be computed and are meaningful. Examples of interval-scale measurements are temperature measured in degrees Celsius or Fahrenheit and calendar dates.
If a variable is measured on an interval scale, most of the usual statistical analyses, like mean, standard deviation, correlation and regression, may be carried out on the measured values.
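A small sketch with hypothetical temperature readings illustrates which operations are meaningful on an interval scale:

```python
from statistics import mean, stdev

# Hypothetical daily temperatures in degrees Celsius (interval scale: zero is conventional)
temps = [20.0, 22.0, 19.0, 23.0, 21.0]

avg = mean(temps)      # meaningful
spread = stdev(temps)  # meaningful

# Differences are meaningful on an interval scale...
warming = temps[3] - temps[2]

# ...but ratios are not: 20 degC is not "twice as hot" as 10 degC,
# because the same two readings expressed in Fahrenheit give a different ratio.
```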
Ratio scale: Ratio scales are quite commonly encountered in physics. These scales are characterized by the fact that operations exist for determining all four relations, namely equality, rank order, equality of intervals and equality of ratios. Once such a scale is available, its numerical values can be transformed from one unit to another by multiplying by a constant, e.g. converting inches to feet or centimetres. When measurements are made on a ratio scale, the existence of a non-arbitrary zero is mandatory. All statistical measures are applicable to ratio scales, and only with these scales is the use of logarithms valid, as in the case of decibels.
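The unit-conversion property, and the fact that ratio-based statistics survive such a conversion, can be sketched with hypothetical height data:

```python
from statistics import mean, stdev

# Hypothetical heights in centimetres (ratio scale: a true zero exists)
heights_cm = [170.0, 165.0, 180.0, 175.0]

# Unit conversion is just multiplication by a constant (1 inch = 2.54 cm)
heights_in = [h / 2.54 for h in heights_cm]

# The coefficient of variation is dimensionless and unchanged by the conversion
cv_cm = stdev(heights_cm) / mean(heights_cm)
cv_in = stdev(heights_in) / mean(heights_in)
```

Because both the standard deviation and the mean scale by the same constant, the coefficient of variation is identical in both units, which is why it appears only in the ratio row of the table below.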
The characteristics of the four different scales are captured in brief in the table given below:

Scale    | Basic Empirical Operations                            | Mathematical Group Structure                                                 | Permissible Statistics
---------|-------------------------------------------------------|------------------------------------------------------------------------------|-----------------------
Nominal  | Determination of equality                             | Permutation group y = f(x), where f(x) is any one-to-one substitution        | Number of cases; mode; contingency tables; regression analyses with indicator variables
Ordinal  | Determination of greater or less                      | Isotonic group y = f(x), where f(x) is any monotonically increasing function | All of the above, and: median; percentiles
Interval | Determination of equality of intervals or differences | General linear group y = a + bx                                              | All of the above, and: mean; standard deviation; product-moment correlation
Ratio    | Determination of equality of ratios                   | Similarity group y = bx                                                      | All of the above, and: coefficient of variation
Reliability:
The reliability of a measure is the extent to which the measurement yields consistent results. Essentially,
reliability refers to the consistency of the values obtained when the same item is measured a number of
times. When the results agree with each other, the measurement is said to be reliable.
Reliability usually depends on the operational definition. A possible way to quantify reliability is the index of variation, expressed as

    Index of Variation = Standard Deviation / Mean
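The index can be computed directly from repeated measurements; the readings below are hypothetical:

```python
from statistics import mean, stdev

# Hypothetical repeated measurements of the same item under one operational definition
repeats = [10.2, 10.1, 10.3, 10.2, 10.1]

# Index of variation: standard deviation relative to the mean
index_of_variation = stdev(repeats) / mean(repeats)
# A small value indicates that the measurement process is consistent (reliable)
```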
When a questionnaire is used as the instrument for measuring some abstract concept, the reliability of the instrument may be assessed using Cronbach's α (alpha).
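Cronbach's α can be computed from per-item variances and the variance of respondents' total scores; the function and data layout below are a sketch, not a reference implementation:

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
    item_scores[i][r] is respondent r's score on item i (hypothetical layout)."""
    k = len(item_scores)
    n_respondents = len(item_scores[0])
    sum_item_var = sum(variance(item) for item in item_scores)
    totals = [sum(item[r] for item in item_scores) for r in range(n_respondents)]
    return k / (k - 1) * (1 - sum_item_var / variance(totals))

alpha = cronbach_alpha([[1, 2, 3], [2, 4, 6]])  # two perfectly correlated items
```

For perfectly correlated items, α approaches 1 as the number of items grows; for two items, as here, its ceiling is below 1.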
Validity:
Validity refers to whether the measurement or metric really measures what we intend to measure. The validity of a measurement may be looked at from different perspectives, namely representational validity, construct validity, content validity and criterion validity. Each of these is explained below.
Representational validity: This is applicable when we are measuring a physical attribute rather than a latent variable or a concept. A measure is said to be representationally valid in case it truly measures the attribute that it proposes to measure. For instance, we may use marks as a measure of knowledge, IQ score as a measure of intelligence, or GDP growth as a measure of the economic development of a nation. However, these measures may be questionable, as they may not measure the underlying attribute that we propose to measure.
Construct validity: Construct validity embraces a variety of techniques for assessing the
degree to which an instrument (proposed measure) measures the concept or the attribute that it
was designed to measure. In case we are attempting to measure a latent variable it is
concerned with testing dimensionality (is the assumption that there is a single latent variable
supported by the evidence?), testing homogeneity (do all the items appear to be tapping into
the same latent variable?) and, if there is more than one latent variable, testing the extent to
which they overlap (do some items from one subscale correlate with other latent variables?).
When measuring a physical attribute, construct validity is linked to accuracy, i.e. the ability to measure close to the true value.
Content validity: Content validity refers to the extent to which a measure covers the range of
meanings included within the concept. In the case of measurement of abstract concepts
content validity is concerned with the extent to which the items comprising the measure cover
all aspects of the latent variable and no additional features. In educational tests, for example, it
would be unreasonable to include items that are not related to the syllabus on which the
students are being examined, and to do so would result in a test which is not assessing what it
purports to assess. At the same time, the test should cover a wide range of relevant aspects
from the syllabus, since otherwise there could be undetected differences in knowledge or
ability between the students. This is more important for causal variables than for indicator
variables. Similarly, while measuring areas like usability of a product, content validity might
require introducing a number of different metrics.
Criterion validity: Criterion validity is also referred to as predictive validity. It refers to the ability of the measure to predict related concepts.
Assessing Reliability:
In the literature four main methods have been proposed for assessing reliability (Carmines E. G. and R. A. Zeller, Reliability and Validity Assessment, Beverly Hills, Calif.: Sage Publications, 1979), namely
a. Test-retest method
b. Alternative form method
c. Split-halves method
d. Internal consistency method
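The split-halves method (c) can be sketched as follows: the instrument's items are split into two halves, the half scores are correlated, and the Spearman-Brown correction extrapolates the correlation to the full test length. The function and data layout are hypothetical illustrations:

```python
from statistics import mean

def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def split_half_reliability(item_scores):
    """Split-halves method: correlate odd-item and even-item half scores,
    then apply the Spearman-Brown correction to full test length.
    item_scores[i][r] is respondent r's score on item i (hypothetical layout)."""
    odd_totals = [sum(vals) for vals in zip(*item_scores[0::2])]
    even_totals = [sum(vals) for vals in zip(*item_scores[1::2])]
    r = pearson(odd_totals, even_totals)
    return 2 * r / (1 + r)  # Spearman-Brown correction

rel = split_half_reliability([[1, 2, 3, 4]] * 4)  # four identical items
```

With four identical items the two halves correlate perfectly, so the corrected reliability is 1.0.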
The conditions under which repeatability and reproducibility are assessed are summarized below:

Factor                    | Repeatability Condition | Intermediate Precision Condition | Reproducibility Condition
--------------------------|-------------------------|----------------------------------|--------------------------
Laboratory                | Same                    | Same                             | Different
Operator                  | Same                    | Different                        | Different
Apparatus                 | Same                    | Same (a)                         | Different
Time between measurements | Short (b)               | Multiple days                    | Not specified

(a) This can be different instruments meeting the same design requirement.
(b) Standard test method dependent; typically does not exceed one day.