Information Theory and Its Application to Analytical Chemistry
D. E. Clegg
Griffith University, Nathan, 4111, Queensland, Australia
D. L. Massart
Free University Brussels, Brussels 1090, Belgium
The Changing Demands on the Analyst

The professional mission of analytical chemists is to supply clients with information about the composition of samples. The means for gaining this information has changed profoundly over the years, from the common use of "wet" chemical methods to the almost complete reliance on physical instrumentation that is found in contemporary laboratories. Such changes have been made largely in response to the more stringent demands made by clients, for example, requests for the determination of very low levels of analyte or for fast multielement analysis.

Inevitably, the shift to instrumental methods has also required analysts to augment their training in chemistry with an understanding of other fields such as electronics, optics, electromagnetism, and computing. However, the need to develop knowledge and skills in these diverse disciplines should not be allowed to obscure the distinctive task of the analyst: to supply chemical information about samples. The emphasis should be kept on gathering reliable information, rather than being shifted to any particular technique or methodology.

The Evolving Concept of Information

It would be useful and timely for analysts to reconsider in greater depth what is meant by "information" in general and "chemical information" in particular. Then their complex and rapidly evolving methodologies could be put in broader teleological perspective.

Such matters have been the subject of much study since a theory of information was developed by Claude Shannon in the late 1940's to solve some problems in message communication over the primitive Morse code transmitters then in use (1). The theory has been applied to a varied range of situations that involve the transmission of "messages", where the term message is used in its widest sense. This range includes the faint molecular messages that emanate from the analyte in a sample, then pass through the electronic encoding and decoding processes of a spectrometer.

Evaluating and Comparing Procedures

The reliability of information passing through such a system depends on both the ability of the sender (the analyst) and the ability of the transmitter. The sender must correctly prepare the sample for encoding. The transmitter must convey the desired signal to the output stage. Also, this signal must be kept separate from the myriad of other signals emanating from the sample extract and from within the instrumentation.

Shannon's singular contribution was to recognize that it was possible to describe and compare the performance of various message-transmission systems in a much more meaningful way. He defined the term "information" to allow the amount of information passing through a system to be quantified.

Workers in the relatively new field of "chemometrics" recognized the application of his theory to analysis. It could provide a measure of the performance of an analytical procedure, expressed in terms of a common currency, namely the yield of "information". Widely different methodologies could thus be evaluated and compared by means of this single yardstick (2).

Below we discuss how information is defined and how the basic ideas of information theory can be applied to analytical chemistry, from spot tests and chromatography to the interpretation of large chemical data bases. For pedagogical reasons the examples are taken from qualitative analysis.
Outcome       0 (Fe is absent)    1 (Fe is present)
Probability   1/2                 1/2

This describes the most basic level of uncertainty that is possible: the choice between two equally likely cases. Such a situation is assigned a unit value of uncertainty. The value of the information contained in the outcome of this analysis is also equal to 1 if the uncertainty is completely removed. This unit of information is called the "bit", as used in binary-coded systems. (Clearly, the situation with only one possibility has no uncertainty, and thus can yield no information.)

Then only one test, which must have two distinct observable states, is needed to complete this analysis. For example, after oxidation, the addition of ferrocyanide reagent either will or will not produce the characteristic color of Prussian blue. Thus, an analytical task with 1 unit of uncertainty can be completely resolved by a test that can provide 1 unit of information.

Increasing the Possibilities

When up to two metals may be present in the sample solution (e.g., Fe or Ni or both) there are four possible outcomes, ranging from neither being present to both being present.

I = H = log2 n = -log2 p

where I is the information contained in the answer, given that there were n possibilities; H is the initial uncertainty resulting from the need to consider the n possibilities; and p is the probability of each outcome if all n possibilities are equally probable.

Nonequal Probabilities

The expression can be generalized to the situation in which the probability of each outcome is not the same. If we know from past experience that some elements are more likely to be present than others, eq 2 is adjusted so that the logarithms of the individual probabilities, suitably weighted, are summed.

H = -sum(p_i log2 p_i)

where p_i is the probability of the ith outcome.

Thus, we can consider again the original example, except that now past experience has shown that 90% of the samples contained no iron. This situation is summarized as follows.

Outcome       0 (no Fe)    1 (Fe)
Probability   0.9          0.1
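The two tables above can be checked numerically. The following sketch (using only the probabilities given above) computes the uncertainty H for both the equally likely case and the 90%/10% case:

```python
from math import log2

def uncertainty(probs):
    """Shannon uncertainty H = -sum(p * log2 p), in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Two equally likely outcomes (Fe absent / Fe present): 1 bit.
print(uncertainty([0.5, 0.5]))            # 1.0

# Prior knowledge: 90% of samples contain no iron.
print(round(uncertainty([0.9, 0.1]), 3))  # 0.469
```

The prior knowledge lowers the initial uncertainty from 1 bit to about 0.47 bit, so a test on such samples can deliver less information than in the equally likely case.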
generate information that reduces uncertainty about the composition of samples. A number of papers have described this comparison for specific cases (3-6). Below we use thin-layer chromatography (TLC) as a conceptually simple analytical situation that illustrates how the definition is used to do this.

Calculations for a TLC Test of Many Species

Consider the use of TLC to identify drugs that are used therapeutically or in abuse. Suppose the compound to be identified is known to be one of a library of n0 compounds in which the probability of occurrence is the same for each. Thus, the a priori probability for each is

p = 1/n0

This can be written as

I = -(p+ log2 p+) - (p- log2 p-)

where p+ is the probability of finding an Rf between 0.19 and 0.25; and p- is the probability of finding another value.

Consider not just two categories of results but n categories. For example, consider an Rf in the range (0-0.05) or (0.05-0.10), etc. Then

I = -sum(p_i log2 p_i)
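To give a feel for the magnitudes involved, the sketch below assumes a hypothetical library size (the text leaves n0 unspecified) and computes the information needed to single out one compound, together with the minimum number of ideal two-state tests that could supply it:

```python
from math import ceil, log2

# Hypothetical library size; n0 is not specified in the text.
n0 = 100

# With equal a priori probabilities p = 1/n0, the required
# information is H = log2(n0) bits.
H = log2(n0)
print(round(H, 2))  # 6.64

# Each ideal two-state test yields at most 1 bit, so at least
# ceil(log2(n0)) such tests are needed for certain identification.
print(ceil(H))      # 7
```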
Combining Solvents

In reality, this parity between sample uncertainty and system information does not happen very often. We are more likely to encounter the situation shown by solvent 2, which gives us some information but not enough to guarantee identification. It fails to distinguish between the pairs AB, CD, EF, and GH. It delivers log2 4 = 2 bits of information, and we need 3 bits.

Solvent 3 can only separate the substances into two groups (1 bit). However, when solvent 3 and solvent 2 are combined, all eight substances can be identified separately. Solvent 4 also gives only two spots, but combining solvents 4 and 2 will not yield identification of all substances.
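These bit counts can be reproduced with a short script. The Rf values below are illustrative, chosen so that solvent 2 merges the pairs AB, CD, EF, and GH while solvents 3 and 4 each split the eight substances into two groups, as described above:

```python
from collections import Counter
from math import log2

def information(signals):
    """Information (bits) delivered by a test that groups equally
    likely substances by their observed signal."""
    n = len(signals)
    return -sum((c / n) * log2(c / n) for c in Counter(signals).values())

# Illustrative Rf values for substances A..H in solvents 2, 3, and 4.
rf2 = [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8]
rf3 = [0.2, 0.4, 0.2, 0.4, 0.2, 0.4, 0.2, 0.4]
rf4 = [0.2, 0.2, 0.2, 0.2, 0.4, 0.4, 0.4, 0.4]

print(information(rf2))                  # 2.0 bits (four pairs)
print(information(rf3))                  # 1.0 bit (two groups)
print(information(list(zip(rf2, rf3))))  # 3.0 -- solvents 2+3 separate all eight
print(information(list(zip(rf2, rf4))))  # 2.0 -- solvents 2+4 still leave pairs
```

Combining the Rf values as (solvent 2, solvent 3) pairs makes every substance distinct, so the informations add; the (solvent 2, solvent 4) pairs are correlated in their groupings, so nothing is gained.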
[Figure: Uncorrelated and correlated Rf values and the effect on signal "space".]

Correlations
When two systems are combined, simple addition of the information is possible only when the systems are uncorrelated. That is, for systems A and B,

I(A+B) = I(A) + I(B)    (when rho_AB = 0)

where rho_AB is the correlation coefficient between A and B.

Correlation always reduces the signal "space" that is available to the substances present. Thus, in the above case, a combination of two solvents produces a two-dimensional space with a potential array of 8 x 8 = 64 distinct signals, which is equivalent to a maximum of 6 bits of information. Thus, the system can ideally recognize 64 different Rf combinations.

If correlation occurs, the actual combinations become restricted to a limited zone that becomes more narrow as rho_AB approaches 1. In other words, the chance that two substances have the same analytical "signal" increases as this zone narrows, thus reducing the information-generating capabilities of the combined system. The example described above is illustrated diagrammatically in the figure.

Classification Using Expert Systems

The application of information theory for so-called machine learning was proposed by Quinlan (8). Machines can "learn" by so-called inductive reasoning, and inductive expert systems are now available to "teach" them. Before explaining how information theory plays a role in such expert systems, it may be necessary to clarify the difference between deductive and inductive expert systems.
Rf Values

Substance   Solvent 1   Solvent 2   Solvent 3   Solvent 4
A           0.10        0.20        0.20        0.20
B           0.20        0.20        0.40        0.20
C           0.30        0.40        0.20        0.20
D           0.40        0.40        0.40        0.20
E           0.50        0.60        0.20        0.40
F           0.60        0.60        0.40        0.40
G           0.70        0.80        0.20        0.40
H           0.80        0.80        0.40        0.40
Information (bits)  3   2           1           1

The deductive expert systems are more usual and thus better-known. Their knowledge base consists of rules that have been entered by experts. The system uses these rules by chaining them to reach a conclusion. For example, we can reach the conclusion below by combining the following two rules.

If a substance contains more than 20 carbon atoms, then it should be considered apolar.
If a substance is apolar, then it should be soluble in methanol.

Thus, substance X, with 23 carbon atoms, should be soluble in methanol.
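A minimal sketch of such rule chaining, assuming a simple fact-dictionary representation (the rule encoding and helper names are our own illustration, not part of any expert-system product):

```python
# Each rule is (premise over the known facts, fact it asserts).
rules = [
    (lambda f: f.get("carbon_atoms", 0) > 20, ("apolar", True)),
    (lambda f: f.get("apolar"), ("soluble_in_methanol", True)),
]

def chain(facts):
    """Forward chaining: fire rules until no new fact is derived."""
    facts = dict(facts)
    changed = True
    while changed:
        changed = False
        for premise, (key, value) in rules:
            if premise(facts) and facts.get(key) != value:
                facts[key] = value
                changed = True
    return facts

# Substance X with 23 carbon atoms:
print(chain({"carbon_atoms": 23}))
# {'carbon_atoms': 23, 'apolar': True, 'soluble_in_methanol': True}
```

The two rules chain exactly as in the text: the first derives "apolar", which then satisfies the premise of the second.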
fatty acids. In deciding whether a sample is a W or an E, the required information (or the initial uncertainty before testing) will be the following.

Hb = -(p(W) log2 p(W)) - (p(E) log2 p(E))

If it is known that there are four unknowns from each category, then the a priori probability is 0.5 for each category. Using eq 1, we conclude that Hb = 1. For other a priori probabilities, other Hb values are obtained. Thus, we get the following table.

Number    Prob.                 Uncertainty
4 W 4 E   (0.5 W, 0.5 E)        Hb = 1 bit
3 W 5 E   (0.375 W, 0.625 E)    Hb = 0.954 bit
2 W 6 E   (0.25 W, 0.75 E)      Hb = 0.81 bit
0 W 8 E   (0 W, 1 E)            Hb = 0 bit

These numbers can be understood, for example, if we know that the situation 0 W 8 E does indeed require no information. We know that all samples are E. Thus, we do not require information to determine the origin of one of these samples. The situation 0.5 W 0.5 E is the most uncertain. We require more bits than in the other situations in which the a priori knowledge is greater.

The Effect of Test Results

Let us now go a step further and see how the required information (or initial uncertainty) is affected by a particular test result. For test 1, let us assume the worst situation, in which there is a 50:50 chance that the oil is from E or W. Thus,

p = 0.5 and H = 1

Because we know the distribution of oils (five samples above the threshold and three below), the residual uncertainty after each outcome can be calculated. Thus, the information generated by test 1 is the difference between the initial and final amounts of required information (or uncertainties). This, of course, is very little. A test with this threshold value is not very useful because it does not yield an appreciable amount of information.

An Improved Test

Then let us consider another test in which there are again five positive and three negative samples. This time all the negative samples are E, and the positive results are from 4 W and 1 E samples. The information still required after the test is shown below. If the test result is positive, then calculating as above,

H+ = 0.72 bit

If it is negative, we get

H- = 0 bit

because a negative test means that the sample can only be an E sample. Thus, the average residual uncertainty after performing test 2 is

H = (5/8)(0.72) + (3/8)(0) = 0.45 bit
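The arithmetic for test 2 can be verified with a short script, a sketch based only on the sample counts given above (4 W and 4 E initially; five positives of 4 W and 1 E, three negatives all E):

```python
from math import log2

def H(probs):
    """Uncertainty in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Eight oils, 4 W and 4 E: initial uncertainty is 1 bit.
H_before = H([4/8, 4/8])

# Test 2 outcomes: five positives (4 W, 1 E), three negatives (all E).
H_pos = H([4/5, 1/5])                     # ~0.72 bit
H_neg = H([0, 1])                         # 0 bit
H_after = (5/8) * H_pos + (3/8) * H_neg   # average residual uncertainty

print(round(H_before - H_after, 3))       # information gained: 0.549
```

Test 2 therefore generates about 0.55 bit of the 1 bit initially required, far more than test 1.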
IF test 2 = (-) THEN sample = E (3)
IF test 2 = (+) AND IF test 6 = blue THEN sample = E (1)
IF test 2 = (+) AND IF test 6 = red THEN sample = W (4)

accessible criterion of performance often leads to analytical "overkill" to cover any risk of error in identification. Information theory offers analysts a way of moving more easily within this qualitative dimension of their work. It gives