

Measurements and metrics

Science and measurement: Scientific progress is made through observations and generalizations based on data and measurements. On the basis of these observations theories are proposed, and the proposed theories are confirmed or refuted by testing hypotheses against further data. It is thus measurements and data that really drive the progress of science and engineering. Without empirical verification through data and measurements, theories and propositions remain at an abstract level. Business analytics professionals must therefore have a clear understanding of measurement, particularly in the context of business.
It should be mentioned here that measurements are often made to quantify a concept. In some cases, particularly in the physical world, the concepts are easy to grasp, whereas in an artificial world such as software development, defining the concept may not be that simple. For example, the concepts of height, weight, distance etc. are easily and uniformly understood (though they may not be easy to measure in all circumstances), whereas concepts like knowledge or ease of doing business require clear explanation. The hierarchy from theory to hypothesis and from concept to measurement is given below:

[Figure: the hierarchy from the abstract world (theory, proposition, concept, definition) to the empirical world (hypotheses, operational definition, measurements in the real world, data analysis)]

The building blocks of theory are concepts and definitions. In empirical work, theories are often proposed in terms of certain concepts. For example, in marketing analytics a theory may be proposed relating customer experience and business growth. In this case, concepts like customer experience and business growth will have to be defined, and methodologies for measuring them must be proposed, in order to establish or reject the proposed theory.
Operational definitions: The definition of a concept is, to start with, often rather abstract. Such definitions do not facilitate measurement. For example, we may define a circle to be a line forming a closed loop such that the distance between any point on this line and a fixed interior point, called the centre, is constant. We may further say that the fixed distance from the centre to any point on the closed loop gives the radius of the circle. Note that while the concept has been defined, no methodology to measure the radius has been proposed. The operational definition specifies the methodology for carrying out a specific measurement.
The importance of operational definitions can hardly be overstated. Take the case of an apparently simple measurement: the height of individuals. Unless we specify various factors, such as the time when the height will be measured (it is known that the height of individuals varies across the day), how the variability due to hair will be taken care of, whether the measurement will be taken with or without shoes, and what level of accuracy is expected (correct to an inch, a centimetre, etc.), even this simple measurement will show substantial variation. Software engineers must appreciate the need to define measures from an operational perspective. W. E. Deming maintained that many of the problems of management are due to the failure to operationally define the measures on the basis of which decisions are taken (Out of the Crisis).
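As an illustration in the software context, the Python sketch below (the function name and counting rules are assumptions made for this example, not a standard) turns the abstract notion of "program size" into an operational definition by fixing the counting rules explicitly.

def count_lines_of_code(path: str) -> int:
    """One possible operational definition of size: count non-blank,
    non-comment lines in a Python source file."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            stripped = line.strip()
            # The rules are made explicit: blank lines and comment-only
            # lines are excluded from the count.
            if stripped and not stripped.startswith("#"):
                count += 1
    return count

A different but equally explicit set of rules (for example, including comment lines) would also be an operational definition; what matters is that the procedure is fully specified so that different people obtain consistent values.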

Levels (scales) of measurement: Once the operational definitions are arrived at, the actual measurements need to be undertaken. It should be noted that measurement may be carried out on four different scales, namely

Nominal
Ordinal
Interval
Ratio

A brief description of each measurement scale is given below.


Nominal scale: This is the lowest level of measurement and represents the most unrestricted assignment of numerals. The numerals serve only as labels, and words or letters would serve equally well. Nominal-scale measurement involves only classification, and the observed sampling units are placed into one of a set of mutually exclusive and collectively exhaustive categories (classes). Some examples of nominal-scale measurement in the context of business analytics are

The state of domicile of an employee

The programming language (C, C++, Visual Basic, etc.) used in a project

On a nominal scale the names of the different categories are just labels, and no relationship between them is assumed. The only operation that can be carried out on a nominal scale is counting the number of occurrences in the different classes. However, statistical analyses may be carried out to understand how entities belonging to different classes perform with respect to some other response variable. Thus we may try to find the impact of the programming language chosen on productivity or on the occurrence of defects.
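A minimal Python sketch (using pandas, with hypothetical project data) of the operations that are legitimate on a nominal variable: counting occurrences per class, and comparing a response variable such as defect counts across the classes.

# Hypothetical data: the only direct operation on a nominal variable such as
# programming language is counting, but a response variable (defects) can
# still be compared across the nominal classes.
import pandas as pd

projects = pd.DataFrame({
    "language": ["C", "C++", "Visual Basic", "C", "C++", "C"],
    "defects":  [12, 7, 9, 15, 6, 11],
})

# Counting occurrences in each class: the permissible operation on a nominal scale
print(projects["language"].value_counts())

# Comparing a response variable across the nominal classes
print(projects.groupby("language")["defects"].mean())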
Ordinal scale: This refers to a measurement scale where the different values obtained through the process of measurement have an implicit ordering. Typical examples of measurement on an ordinal scale are

Socio-economic status of individuals

Level of education: primary, secondary, graduate, professional qualification, etc.

Measurement on an ordinal scale satisfies the transitivity property, in the sense that if A > B and B > C then A > C. However, arithmetic operations cannot be carried out on variables measured on ordinal scales. Thus, if we measure customer satisfaction on a 5-point ordinal scale, with 5 implying a very high level of satisfaction and 1 implying a very high level of dissatisfaction, we cannot find the average satisfaction by taking the mean of the scores of different customers. However, we can find the median, as computation of the median involves only ordering and counting.
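A small Python sketch with hypothetical satisfaction ratings, illustrating that the median is a legitimate summary of ordinal data while the arithmetic mean is not.

import statistics

# Hypothetical ratings: 1 = very dissatisfied ... 5 = very satisfied
satisfaction = [5, 4, 4, 3, 5, 2, 4, 1, 3, 4]

# Valid: the median requires only ordering and counting
print(statistics.median(satisfaction))

# statistics.mean(satisfaction) would compute a number, but treating the
# scores as equally spaced quantities is not justified for ordinal data.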
Interval scale: With the interval scale we come to a form of measurement that is quantitative in the ordinary sense of the word. Almost all the usual statistical measures are applicable here, unless they require knowledge of a true zero point. The zero point on an interval scale is a matter of convention. As a true zero point is not available, ratios do not make sense, but differences between levels of the attribute can be computed and are meaningful. Some examples of interval-scale measurement are

Measurement of temperature on scales like Centigrade and Fahrenheit is a common example. Suppose T1 and T2 are temperatures measured on such a scale. The fact that T1 is twice T2 does not mean that one object is twice as hot as another. We also note that the zero points are arbitrary.

Calendar dates are another example of measurement on an interval scale. While the difference between two dates, which measures the time elapsed, is a meaningful concept, their ratio does not make sense.

Many psychological measurements aspire to create interval scales. Intelligence is often measured on an interval scale, as it is not necessary to define what zero intelligence would mean.

If a variable is measured on an interval scale, most of the usual statistical analyses, such as mean, standard deviation, correlation and regression, may be carried out on the measured values.
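The following Python sketch illustrates the interval-scale property using the familiar Centigrade-to-Fahrenheit conversion (the general linear transformation y = a + bx): differences scale consistently, but ratios are not preserved.

# Interval scales admit the transformation y = a + b*x. Differences are
# preserved up to the scale factor b, but ratios are not, because the zero
# point is arbitrary.
def celsius_to_fahrenheit(c: float) -> float:
    return 32.0 + 1.8 * c   # y = a + b*x with a = 32, b = 1.8

t1_c, t2_c = 10.0, 20.0
t1_f, t2_f = celsius_to_fahrenheit(t1_c), celsius_to_fahrenheit(t2_c)

print(t2_c / t1_c)                    # 2.0  -- but "twice as hot" is not meaningful
print(t2_f / t1_f)                    # 1.36 -- the ratio is not preserved
print((t2_f - t1_f) / (t2_c - t1_c))  # 1.8  -- differences scale consistently by b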
Ratio scale: Ratio scales are quite commonly encountered in physics. These scales are characterized by the fact that operations exist for determining all four relations, namely equality, rank order, equality of intervals and equality of ratios. Once such a scale is available, its numerical values can be transformed from one unit to another simply by multiplying by a constant, e.g. converting inches to feet or centimetres. When measurements are made on a ratio scale, the existence of a non-arbitrary zero is mandatory. All statistical measures are applicable to the ratio scale, and only with these scales is the use of logarithms valid, as in the case of decibels.
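A short Python sketch with hypothetical length measurements, showing that on a ratio scale a change of unit is just multiplication by a constant (the similarity transformation y = bx) and that the coefficient of variation is therefore unit-free.

import statistics

lengths_in = [10.0, 12.0, 9.5, 11.0]          # hypothetical lengths in inches
lengths_cm = [2.54 * x for x in lengths_in]   # y = b*x with b = 2.54

cv_in = statistics.stdev(lengths_in) / statistics.mean(lengths_in)
cv_cm = statistics.stdev(lengths_cm) / statistics.mean(lengths_cm)

# Identical values: the coefficient of variation is meaningful on a ratio scale
print(cv_in, cv_cm)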
The characteristics of the four scales are captured briefly in the table below:

Scale    | Basic Empirical Operations                             | Mathematical Group Structure                                              | Permissible Statistics
Nominal  | Determination of equality                              | Permutation group y = f(x), where f(x) is any one-to-one substitution     | Number of cases; mode; contingency tables; regression analyses with indicator variables
Ordinal  | Determination of greater or less                       | Isotonic group y = f(x), where f(x) is any monotonic increasing function  | All of the above, and: median; percentiles
Interval | Determination of equality of intervals or differences  | General linear group y = a + bx                                           | All of the above, and: mean; standard deviation; product-moment correlation
Ratio    | Determination of equality of ratios                    | Similarity group y = bx                                                   | All of the above, and: coefficient of variation

Concept of latent variables:


In many practical situations we attempt to measure a concept using a set of observable variables. For
example, we might want to measure complexity of a software product and may use a number of static code
measures. In other situations we might want to measure the level of satisfaction of customers through
response to a number of questions. In yet other situations we may attempt to measure the level of
knowledge of a programmer through marks obtained in a test. In each of these cases we are attempting to
measure an abstract concept through a number of variables and it is assumed that the chosen variables do
measure the concept. The underlying concept (variable) that we are attempting to measure is called the
latent variable.
In software engineering the use of latent variables is fairly common, and hence a software engineer needs to have a clear understanding of the instruments used to measure latent variables and of their validity.
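As a rough illustration, the Python sketch below (with hypothetical questionnaire responses) builds a simple proxy for a latent construct by averaging standardized item scores; a real analysis would normally use a proper latent-variable technique such as factor analysis.

import statistics

# Hypothetical data: three questionnaire items answered by the same five respondents,
# all assumed to tap the same latent construct (e.g. customer satisfaction).
responses = {
    "item1": [4, 5, 3, 4, 2],
    "item2": [4, 4, 3, 5, 2],
    "item3": [5, 4, 2, 4, 3],
}

def standardize(xs):
    m, s = statistics.mean(xs), statistics.stdev(xs)
    return [(x - m) / s for x in xs]

z = {name: standardize(values) for name, values in responses.items()}

# A crude proxy score for the latent variable: the mean of the standardized items
latent_score = [statistics.mean(vals) for vals in zip(*z.values())]
print(latent_score)   # one proxy score per respondent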
Reliability and validity:
A basic question to be asked of any measurement system is whether the proposed measure truly measures the concept in a reliable manner. Reliability and validity are the two most important criteria for addressing this question.

Reliability:
The reliability of a measure is the extent to which the measurement yields consistent results. Essentially,
reliability refers to the consistency of the values obtained when the same item is measured a number of
times. When the results agree with each other, the measurement is said to be reliable.
Reliability usually depends on the operational definition. A possible way to quantify reliability is to use the index of variation, expressed as

Index of variation = Standard Deviation / Mean

When a questionnaire is used as the instrument for measuring an abstract concept, the reliability of the instrument may be assessed using Cronbach's alpha (α).
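A minimal Python sketch, with hypothetical data, of both summaries mentioned above: the index of variation for repeated measurements of the same item, and Cronbach's alpha for a small set of questionnaire items.

import statistics

# Index of variation = standard deviation / mean of repeated measurements
repeats = [10.2, 10.4, 10.1, 10.3, 10.2]
index_of_variation = statistics.stdev(repeats) / statistics.mean(repeats)
print(index_of_variation)

# Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
items = [                      # rows = respondents, columns = questionnaire items
    [4, 5, 4], [3, 3, 4], [5, 5, 5], [2, 3, 2], [4, 4, 5],
]
k = len(items[0])
item_vars = [statistics.variance(col) for col in zip(*items)]
total_var = statistics.variance([sum(row) for row in items])
alpha = k / (k - 1) * (1 - sum(item_vars) / total_var)
print(alpha)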
Validity:
Validity refers to whether the measurement or metric really measures what we intend to measure. The validity of a measurement may be looked at from several perspectives, namely representational validity, construct validity, content validity and criterion validity. Each of these is explained below.

Representational validity: This is applicable when we are measuring an observable attribute rather than a latent variable or concept. A measure is said to be representationally valid if it truly measures the attribute that it proposes to measure. For instance, we may use marks as a measure of knowledge, IQ score as a measure of intelligence, or GDP growth as a measure of the economic development of a nation. However, these measures may be questionable, as they may not measure the underlying attribute that we propose to measure.

Construct validity: Construct validity embraces a variety of techniques for assessing the
degree to which an instrument (proposed measure) measures the concept or the attribute that it
was designed to measure. In case we are attempting to measure a latent variable it is
concerned with testing dimensionality (is the assumption that there is a single latent variable
supported by the evidence?), testing homogeneity (do all the items appear to be tapping into
the same latent variable?) and, if there is more than one latent variable, testing the extent to
which they overlap (do some items from one subscale correlate with other latent variables?).
When measuring a physical attribute, construct validity is linked to accuracy, i.e. the ability to measure close to the true value.

Content validity: Content validity refers to the extent to which a measure covers the range of
meanings included within the concept. In the case of measurement of abstract concepts
content validity is concerned with the extent to which the items comprising the measure cover
all aspects of the latent variable and no additional features. In educational tests, for example, it
would be unreasonable to include items that are not related to the syllabus on which the
students are being examined, and to do so would result in a test which is not assessing what it
purports to assess. At the same time, the test should cover a wide range of relevant aspects
from the syllabus, since otherwise there could be undetected differences in knowledge or
ability between the students. This is more important for causal variables than for indicator
variables. Similarly, while measuring areas like usability of a product, content validity might
require introducing a number of different metrics.

Criterion validity: Criterion validity is also referred to as predictive validity. It refers to the ability of the measure to predict related concepts.

Assessing Reliability:

In the literature four main methods have been proposed for assessing reliability (Carmines E G and Zeller R A, Reliability and Validity Assessment, Beverly Hills, Calif.: Sage Publications, 1979), namely

a. Test-retest method
b. Alternative form method
c. Split-halves method
d. Internal consistency method

In this section we discuss the test-retest method briefly.


In the test-retest method we simply take a second measurement of the same subjects. The correlation between the first and the second measurements gives the reliability of the measurement. Suppose the two measurements of the same subject are M1 and M2 respectively, and suppose the true value of the attribute being measured is T. Then M1 = T + e1 and M2 = T + e2, where e1 and e2 are random error terms with mean 0 that are independent of T and of each other and have a common variance. In this case the correlation between M1 and M2 is given by

ρ(M1, M2) = Cov(M1, M2) / (σ(M1) · σ(M2)) = V(T) / V(M)

where V(M) = V(T) + V(e) is the common variance of the two measurements, since Cov(M1, M2) = V(T).
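A simulation sketch in Python (hypothetical parameters; statistics.correlation requires Python 3.10+) showing that the observed test-retest correlation approximates V(T) / (V(T) + V(e)).

import random
import statistics

random.seed(0)
n = 1000
true_values = [random.gauss(50, 5) for _ in range(n)]     # T, with V(T) = 25
m1 = [t + random.gauss(0, 2) for t in true_values]        # M1 = T + e1, V(e) = 4
m2 = [t + random.gauss(0, 2) for t in true_values]        # M2 = T + e2, V(e) = 4

r = statistics.correlation(m1, m2)
print(r, 25 / (25 + 4))   # both close to about 0.86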
Concepts of accuracy and precision:
In traditional quality engineering, where measurements are mostly physical and abstract concepts do not need to be measured, reliability is termed precision. Similarly, when an abstract concept is not being measured, construct validity is intimately linked with accuracy, i.e. how close the measure is to the true value. Thus accuracy and precision may be defined as follows:
Accuracy refers to the degree of closeness of measurements of a quantity to the true value of that quantity.
Precision refers to the degree to which repeated measurements carried out under similar conditions agree with each other.
[Diagram illustrating accuracy and precision, taken from Wikipedia]
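The distinction can also be seen numerically; the Python sketch below simulates one measurement process that is precise but biased and another that is accurate but imprecise (all numbers are hypothetical).

import random
import statistics

random.seed(1)
true_value = 100.0

biased_but_precise     = [random.gauss(105, 0.5) for _ in range(100)]  # small spread, large bias
accurate_but_imprecise = [random.gauss(100, 5.0) for _ in range(100)]  # small bias, large spread

for name, xs in [("biased but precise", biased_but_precise),
                 ("accurate but imprecise", accurate_but_imprecise)]:
    bias = statistics.mean(xs) - true_value        # accuracy: closeness to the true value
    spread = statistics.stdev(xs)                  # precision: agreement among repeats
    print(f"{name}: bias = {bias:.2f}, standard deviation = {spread:.2f}")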

Concepts of repeatability and reproducibility:


The degree of variation between measurements carried out by the same operator over a short time interval is called repeatability. When measurements are carried out by different operators under more or less similar conditions, the degree of variation is called reproducibility. Both repeatability and reproducibility can be estimated statistically, and they need to be estimated because these values enable the engineer to understand the reliability of the measurement. The usual conditions for repeatability, intermediate precision and reproducibility are given in the table below:

Table 1: Conditions for Repeatability, Intermediate Precision and Reproducibility

Factor              | Repeatability Condition | Intermediate Precision Condition | Reproducibility Condition
Laboratory          | Same                    | Same                             | Different
Operator            | Same                    | Different                        | Different
Apparatus           | Same                    | Same (a)                         | Different
Time between tests  | Short (b)               | Multiple days                    | Not specified

(a) This can include different instruments meeting the same design requirement.
(b) Standard test method dependent; typically does not exceed one day.
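As a rough numerical illustration, the Python sketch below (hypothetical gauge-study data) estimates repeatability from the within-operator variation and reproducibility from the between-operator variation; a full gauge R&R study would use a proper ANOVA model.

import statistics

# Hypothetical data: repeated measurements of the same part by three operators
measurements = {
    "operator_A": [10.1, 10.2, 10.1, 10.3],
    "operator_B": [10.4, 10.5, 10.4, 10.6],
    "operator_C": [10.0, 10.1, 10.2, 10.1],
}

# Repeatability: pooled within-operator variance
within_vars = [statistics.variance(vals) for vals in measurements.values()]
repeatability_var = statistics.mean(within_vars)

# Reproducibility: variance of operator means, corrected for the repeatability
# contribution (each mean is based on 4 repeats), floored at zero
operator_means = [statistics.mean(vals) for vals in measurements.values()]
reproducibility_var = max(statistics.variance(operator_means) - repeatability_var / 4, 0.0)

print(f"repeatability variance   ~ {repeatability_var:.4f}")
print(f"reproducibility variance ~ {reproducibility_var:.4f}")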
