Linear Mixture Modelling Solution Methods For Satellite Remote Sensing
Linear Mixture Modelling Solution Methods For Satellite Remote Sensing
Linear Mixture Modelling Solution Methods For Satellite Remote Sensing
Stephen Hobbs
College of Aeronautics, Cranfield University
Cranfield, Bedford MK43 0AL, UK
[email protected]
27 March 1996, revised 4 Nov 2003
Abstract
This report documents the algorithms used in the program MIXMOD to analyse mixed pixel data (assuming linear mixing). The report describes the mathematical algorithms rather than acting as a users manual for MIXMOD. The
algorithms described obtain the desired solutions, quantify the quality of the solution, and estimate error bounds, using a variety of methods. A novel feature
is the ability to handle uncertainty in the assumed end-member spectra, which
in practice may be the dominant source of error. The report includes a brief
literature review to place the work in its broader context (with references to the
mathematics of linear systems and other applications of mixture modelling in
remote sensing).
Contents
1 Introduction
1.1 Linear mixture modelling . . . . . . . . . . . . . . . . . . . . . .
1.2 Related analysis techniques and applications . . . . . . . . . . . .
1.3 Structure of the report . . . . . . . . . . . . . . . . . . . . . . . .
1
1
2
2
Literature Overview
2.1 Mathematical Methods . . . . . . . . . . . . . . . . . . . . . . . .
2.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3 Comparative Studies . . . . . . . . . . . . . . . . . . . . . . . . .
5
5
5
6
6
7
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
7
7
7
8
8
8
8
9
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
11
12
12
13
13
13
14
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
15
15
15
17
17
18
.
.
.
.
.
.
.
19
8 Conclusions
23
Bibliography
25
ii
Chapter 1
Introduction
The principal aim of this report is to document the algorithms used in the
program MIXMOD for the analysis of mixed pixel data. Subsidiary aims are
to summarise the principles on which linear mixture modelling are based and
to provide a brief overview of the literature relevant to the subject.
Linear mixture modelling refers to cases in remote sensing in which the
signal measured is, for practical purposes, a linear mixture of several different
fundamental signals (usually called end-members in the literature). The
goal is often to estimate the relative contributions of the different end-member
components given only measurements of the composite (or mixed) signals for
a scene.
The report lists references to applications and discussions of the methods
specifically in the area of satellite remote sensing. References are also given to
the much larger body of literature concerned with general linear systems and to
some related applications of the techniques.
Program MIXMOD which is used at Cranfield University for the analysis
of mixed pixel data uses several different solution algorithms; it also evaluates
parameters to quantify the goodness-of-fit of the solution, and calculates appropriate error bounds. All the methods currently used for these functions are
documented in this report, which therefore serves as both a reference for MIXMOD and also as a (partial) review of the mathematical methods appropriate
to linear mixture modelling.
1.1
The general case being considered is that in which the outputs of some linear
combination process are measurable, but in which the parameters of interest are
actually the inputs. In satellite remote sensing, this occurs when the elemental
area of the Earths surface being imaged (a pixel) contains several different
fundamental cover components (e.g. water, bare soil, grass, forest) and so the
measured spectrum is a composite of the spectra from the different individual
components. If the sensors resolution limit is greater than the pixel size then
neighbouring pixels are not independent measurements of surface properties.
In many cases of interest, the composite spectrum is a linear combination of
the spectra of the individual components, and it is the areas covered by the
= Ax
(1.1)
In this case, the inputs are the class proportions and the outputs are the
signals in each band. In general, the number of bands does not equal the number
of cover classes, and the matrix equation cannot be inverted directly since A
is not square. There are usually more bands than classes, and a conventional
least squares matrix solution method is used (which may or may not account for
uncertainty in the measurements). An important feature of the various leastsquares methods is their ability to cope with a degree of noise in the system.
Conventionally, this noise is assumed to lie solely in the signal measurements
made in each spectral band. Methods reported in the literature do not yet
include the ability to deal with noise in the end-member reference spectra which
is likely to be the dominant source of noise in most applications. Appropriate
solution methods are presented in sections 5 and 6 of this report, and are also
discussed in [7].
Practical methods of estimating the pixel proportions take many forms, and
often make use of additional information to try to improve the solution estimate. The sophistication of solution methods also varies widely, from relatively
straightforward inversion methods for a fixed number of bands (e.g. [25]) to
much more general algorithms able to incorporate a wide range of additional
information (e.g. [24]).
1.2
Many fields of study use linear models to relate inputs and outputs, and the
mathematics of such systems is a highly developed subject. [15] discuss the
mathematics of solution methods for linear systems in good detail, and refer
to related topics such as fitting models to data, and regularised solutions (see
below).
In remote sensing, a mature field in terms of the algorithms used is atmospheric profiling. Rodgers [19, 9, 20] authoritatively reviews the inversion
methods used in this field, covering linear methods as well as more sophisticated techniques.
1.3
Chapter 2
Literature Overview
The references listed with this report are a sample of the literature relevant to
the subject of linear mixture modelling. The articles generally deal with either
the mathematical principles and relevant algorithms, specific applications, or
with comparative studies for several different approaches to the same problem.
The following notes are included to guide the reader to appropriate articles for
specific topics.
2.1
Mathematical Methods
[15] give an excellent guide to the use of mathematical methods for data analysis. Topics covered include linear systems (Chapter 2), fitting models to data
(Chapter 15), quantifying solution quality and errors (sections 15.1 and 15.6 of
Chapter 15), and regularising solutions (sections 18.4ff of Chapter 18). Papers
covering the application of these methods to remote sensing include [24] and the
work of Rodgers [19, 9, 20].
2.2
Applications
One of the most active groups using mixture modelling techniques is that based
at the University of Washington (Seattle, USA) [26, 1, 18, 27, 21, 25, 22, 28].
Their work, principally for land surfaces includes hyperspectral geological applications, and has spawned projects at related institutions [8, 14]. Monte Carlo
methods have been used to try to analyse the ability of measurements to discriminate surface types and to evaluate error bounds (e.g. [22]), but the approach
is largely empirical, without always justifying underlying assumptions (e.g. the
use of shade as an end-member, or the application of the method to water colour
due to sediment loading).
Perhaps the most comprehensive work, mathematically, (although it does
not deal with end-member uncertainty) is that of [24]. Topics such as the
regularisation of solutions are discussed. Non-mathematical readers may find
the work daunting.
Various case studies have been reported [3, 5, 16, 17, 2, 12, 30, 29]. The work
of [31] makes an interesting link between mixture modelling and data fusion.
Outside the area of surface remote sensing, Rodgers [19, 9, 20] gives authoritative reviews of atmospheric profiling, including the use of linear mixing
models.
2.3
Comparative Studies
Several studies have been carried out comparing mixture modelling with alternative techniques such as artificial neural networks and fuzzy logic [4, 6, 2].
Neural networks, when properly trained, perform well, but do not provide clear
measures of solution quality or likely error bounds.
Chapter 3
The linear mixing model is assumed in most work related to the problem of
estimating the proportions of different cover components occupying a single
pixel from the spectral signature of that pixel. This model is equivalent to
assuming that radiation scattered from the pixel is scattered by one component
only, i.e. photons only interact with one cover type. In general this is often a
good assumption and leads to a number of applications.
Note however, that several developments of this method need careful thought:
1. The signal must be linearly related to the component proportions. This
usually requires it to be linearly related to the power received by the
detector. Quantities such as radiance, DN (usually) and reflectance are
suitable parameters, but in general, quantities such as the radar backscatter intensity ( 0 ) in dB and the brightness temperature (TB ) are not. For
small dynamic ranges quantities which are strictly non-linear may be close
enough to linear to be dealt with by linear mixture modelling.
2. Signals produced by coherent scattering (e.g. radar, lidar) do not in general
satisfy the required linear dependence on component proportion.
3. Shade: shade is inherently a product of multiple scattering and is therefore
unlikely in general to be measurable by this technique.
4. Variable strength: the linear method can solve for either variable areas of
fixed spectra (the usual land-cover application), or variable strengths of
given spectra for fixed areas. The two effects cannot however be separated.
3.2
It should be noted that constraints can be applied in different ways. Constraints can be applied so that they are satisfied exactly. For example, the
equations can be rewritten to ensure that the sum of components is exactly
equal to 1 (Sections 4, 5; the number of classes to solve for is then reduced by
1). Alternatively, the constraint can be applied on the same basis as the agreement between the data and modelled signals is required, i.e. small departures
from exactly satisifying the constraint are allowed (an example of this is referred
to below as the Lagrange multiplier method).
3.2.1
There are two principal constraints which are applied to individual pixels:
1. No proportion value should be less than 0.
2. The sum of proportions should be equal to 1.
The first of these is usually applied iteratively (as is an option in MIXMOD)
while the second can be applied either exactly (by modifying the basic linear
mixing model equations) or approximately (treating the constraint on the same
basis as the data). A third constraint, that no proportion value should be greater
than 1, is implicit if both the above constraints are applied, and is rarely applied
explicitly.
3.2.2
Constraints which apply to groups of pixels, usually groups of contiguous pixels, are described by the general term regularisation, and are discussed by [24]
and [15]. Regularisation methods allow a priori information concerning spatial
properties of the solution (e.g. the size of inter-pixel variations, or the (presumably low) likelihood of large departures from an area mean) to be incorporated
into the solution.
3.3
Any serious attempt to solve for the pixel proportions must address the issue of
checking the quality of the solution obtained. This is usually done by comparing
the discrepancies between the modelled and measured signals in each band with
the level of discrepancy which can be explained using known uncertainties in the
measurement system. Several methods of making the comparisons are available.
Residuals (the actual differences between the measured and modelled signals)
are used by some authors (e.g. [22]), while others use quasi or fully quantitative
statistical measures based on the conventional 2 goodness-of-fit parameter (see
Section 6 of this report). However this task is done, it is a crucial step.
3.4
If a solution is of acceptable quality, the next step to make the solution estimate
usable is to estimate the level of uncertainty of the solution. If there is no
knowledge of the uncertainty of the solution then the solution is useless in
8
Constraint(s)
applied
None
P
xj = 1
No end-member
uncertainty
Pnc1
x
Pj 0 (a)
xj = 1 & xj 0
OK
OK
ok
OK
OK
End-member uncertainty
treatment method:
Approximate
Exact
OK
OK
OK
OK
ok
ok
OK
OK
OK
OK
Table 3.1: Different solution methods for linear mixture modelling. (OK indicates that this case has been implemented fully in program MIXMOD (current
version = 0.12), and ok that a partial implementation is available, i.e. excluding dedicated error analysis.) The various solution methods referred to are
explained in the text.
practice. The main quantitative approaches to estimating error bounds are (1)
a Monte Carlo method, and (2) an analytical method. The Monte Carlo method
simulates several different sets of measurements with different sets of random
noise added, and then looks at the spread of solutions obtained. The analytical
method uses the curvature of the cost function minimum at the solution to
estimate error bounds and covariances. Both methods assume that the noise
distributions are known. The Monte Carlo method can deal more easily with
general noise models while the analytical method is more effective at revealing
the factors which make the measurements susceptible to noise.
3.5
10
Chapter 4
G =
yi
nb
2
X
(yi yi )
i=1
nc
X
i2
aij xj
(4.1)
(4.2)
j=1
where aij is the array of end-member reference spectra, and xj is the proportion of component j contributing to the (modelled) signal.
Depending on the definition of i , different types of solution (including those
used in most published work) are obtained.
1. i is constant, independent of signal band (i.e. all bands are weighted
equally): the method repeats the standard least squares solution.
2. i varies with signal band, usually according to the level of noise present
with the signal, but only the relative weights are controlled: gives a
weighted least squares solution but the goodness-of-fit parameter is not
quantitative.
3. i is set equal to the expected noise standard deviation in each band,
and the noise is close to Gaussian: a weighted least-squares solution is
obtained together with a quantitative measure of the goodness-of-fit.
Note that in certain cases, e.g. noise level set correctly and the same in all
bands, the different cases may be equivalent.
11
4.1
nb
X
(yi yi ) yi
i2
i=1
xj
(4.3)
Pnc
nb
X
(yi k=1 aik xk )
aij
i2
i=1
(4.4)
nb
X
1
2 yi aij
i=1 i
nc X
nb
X
1
2 aij aik xk
i=1 i
(4.5)
Bj
k=1
nc
X
Cjk xk
(4.6)
k=1
where
Cjk
nb
X
1
aij aik
2
i=1 i
(4.7)
Bj
nb
X
1
yi aij
2
i=1 i
(4.8)
C 1 B
(4.9)
(4.10)
(4.11)
W is the matrix of weightings (a diagonal matrix if all the bands are independent, with wii = 12 , wij = 0 (i 6= j)), and A is the matrix of end-member
i
reference spectra. The solution is then
x =
4.2
(AT W A)1 AT W y
(4.12)
4.2.1
Pnc1
j=1
xj
The simplest way to apply this constraint is to express one component, typically
the last component, in terms of all the other components. A disadvantage is
that the method loses the explicit symmetry between all the components.
A solution may be obtained by requiring the first derivative of the goodnessof-fit parameter to be zero. This gives a matrix equation which can be solved by
matrix inversion methods (e.g. Singular Value Decomposition (SVD) or GaussJordan [15] - SVD methods are generally preferred because they can be made
more robust and also make the sensitivity of the solution to noise more explicit).
nc1
nb
XX
k=1 i=1
1
(aij ainc )(aik ainc )xk
i2
nc1
X
Cjk xk
nb
X
1
(4.13)
)
2 (aij ainc )(yi ainc
i=1 i
Bj
(4.14)
k=1
where
Cjk
nb
X
1
2 (aij ainc )(aik ainc )
i=1 i
Bj
nb
X
1
2 (aij ainc )(yi ainc )
i=1 i
(4.15)
(4.16)
4.2.2
C 1 B
(4.17)
Lagrange multiplier methods can incorporate the contraint on the sum of the
components on the same basis as requiring that the signal in a particular band
is well-modelled. An extra band row is provided for the end-member data
to generate the sum of the component proportions. The corresponding data
value is 1, added to the signal input. The band measurement noise chosen acts
as the Lagrange multiplier and sets the scale of permitted deviation of the sum
of the components from 1.
Lagrange multiplier methods appear to be appropriate to be based on methods with no other constraints applied to the sum of components (or else double
constraints are being applied for the same feature of the problem).
4.3
The current method is to check the first solution for any components which are
negative. Any that are found are set to zero, and the problem is re-run using
13
4.4
Multiple constraints
The constraints on the sum of components and their permitted range can be
applied independently.
The constraint on components not exceeding 1 could be applied independently, although once one component is set to 1 all the others must be zero (if
the constraint on the sum of components is also active).
14
Chapter 5
5.1
Approximate Methods
An iterative method based on the case with no end-member noise is possible, and
is identical except for the evaluation of the expected noise standard deviation
in each signal band. Iterations repeat until the solution values for successive
iterations differ by less than some preset threshold. See section 2.5 of [15] for
more discussion of iterative methods for improving solutions to sets of equations.
The sections above describe how the different constraints may be applied.
5.2
Exact Methods
G =
nb
2
X
(yi yi )
i=1
i2
(5.1)
nc
X
aij xj
(5.2)
j=1
15
i2
nc X
nc
X
xk xl aik ail + 2
k=1 l=1
nc
X
xk ni aik + ni ni
(5.3)
k=1
yi
ainc +
nc1
X
xj (aij ainc )
(5.4)
j=1
i2
n2i + 2
nc1
X
k=1
nc1
X nc1
X
(5.5)
k=1 l=1
= aik ajl
(5.6)
nb
nb
2
X
(yi yi ) yi X (yi yi ) i2
i2
xj
i4
xj
i=1
i=1
(5.7)
To evaluate the effect of measurement errors the second derivative may also
be required:
nb
nb
2
X
2 i2
1 yi yi X (yi yi )
2 i2 i2
2 xk xj
i4
xk xj
i2 xj xk
i=1 i
i=1
2
nb
X
(yi yi )
1 yi i2
1 i2 yi
yi
2
2
2
(5.8)
i2
xk xj
i xj xk
i xj xk
i=1
2
nb
nb
2
X
G
1 yi yi X (yi yi )
2 i2 i2
2 i2
E
' 2
2
(5.9)
xk xj
2 xk xj
i4
xk xj
i xj xk
i=1 i
i=1
2G
xk xj
= 2
This expression is approximately correct in the sense that terms in odd powers of the difference between the measured and modelled signals have been
ignored. (If the errors are random, which they should be for a good model,
then this is a good approximation for the purposes of evaluating typical error
bounds.)
16
The even powers of the difference between the measured and modelled signals
are related to the expected signal variance.
nb
!
X (yi yi )2
i=1
i2
(5.10)
= nb + n0 nc
(5.11)
(yi yi )2
i2
nb + n0 nc
nb
(5.12)
The expression for the expected value of the second derivative of G is therefore
2G
xk xj
= 2
nb
X
1 yi yi
2 x x
k
j
i=1 i
nb
2(nb + n0 nc) X 1 1 2 i2
1 i2 i2
(5.13)
nb
2 2 xk xj
i2 xj xk
i=1 i
These expressions apply whether or not constraints are applied to the sum
of the components. The constraints affect the expressions used to evaluate the
derivatives of yi and i2 .
5.2.1
5.2.2
yi
xj
aij
i2
xj
2 i2
xk xj
nc
X
(5.14)
xk eij,ik
(5.15)
k=1
2eik,ij
(5.16)
aij ainc
i2
xj
2 [eij,inc einc,inc ] + 2
2 i2
xk xj
(5.17)
nc1
X
k=1
(5.19)
5.2.3
18
Chapter 6
x [] x
(6.1)
=
=
1 2 2
2 xj xk
1 2G
2 xj xk
(6.2)
The matrix [] thus needs to be evaluated at the solution point to estimate the expected error bounds. Different definitions apply according to the
constraints applied.
[] needs to be inverted to obtain the covariance matrix, which is then projected onto the sub-space of interest for the error bounds before being reinverted
to obtain the coefficients describing the error bound ellipsoid.
The chosen confidence level and number of degrees of freedom for the solution
determine the actual value used for 2 (the permitted increase in 2 relative
to its value at the minimum corresponding to the solution point) to evaluate
error bounds in a particular case.
19
20
Chapter 7
Developments of the
Standard Methods
It is possible to obtain greater accuracy or a wider range of parameters if the
signals from several pixels are used. The penalty for using these methods is that
there is some loss of spatial resolution.
7.1
7.2
If the proportions are known then it is possible to use m pixels to estimate the
spectral signature in n bands, averaged over the m pixels, as long as m > n.
It is assumed that the m pixels are independent in the sense that they provide
independent items of information, i.e. the signal for any one pixel is not a linear
function of the signals of a (sub-)set of the other m 1 pixels.
The result applies to the area covered by the pixels, i.e. there is a loss of
spatial resolution.
21
22
Chapter 8
Conclusions
This report describes the solution algorithms used by the program MIXMOD to
analyse mixed pixel data, assuming the linear mixing model. A unique feature
of the program is its ability to handle uncertainty in the assumed end-member
spectra. MIXMOD includes routines to perform the three principal analysis
steps: (1) estimate a solution, (2) evaluate the quality of the solution, and (3)
provide a measure of the error for the solution.
The literature reviewed gives an overview of current applications of linear
mixture modelling, as well as some pointers to the wider literature concerned
with linear systems. Several studies comparing mixture modelling with alternative methods of analysis are also identified. The principal strength of mixture
modelling relative to the alternatives is that because it is based on an explicit
model, its limitations can be clearly identified, and within those limitations
it is possible to quantify the goodness-of-fit and to evaluate appropriate error
bounds.
Although a large number of mixture modelling studies have been presented,
relatively few deal with issues of quantifying the solution quality or of estimating
proper error bounds. This report presents methods of performing these tasks,
taking account of all the principal sources of uncertainty. As the field of mixture
modelling matures, these issues must be addressed.
23
24
Bibliography
[1] Adams, J.B., Smith, M.O., and Gillespie, A.R. (1989), Simple models for
complex natural surfaces: a strategy for the hyperspectral era of remote
sensing. Proceedings of the IGARSS 89, Vancouver, Canada: IGARSS 89,
Vol. I, pp. 16-21.
[2] Atkinson, P.M., Cutler, M.E.J., and Lewis, H. (1995), Mapping sub-pixel
variation in land cover in the UK from AVHRR imagery. Proc. 21st Annual Conference of the Remote Sensing Society, 11-14 September 1995,
Southampton, UK. Published by the Remote Sensing Society, pp. 58-65,
ISBN 0 946226 20 0.
[3] Campbell, J.G. (1981) The Use of Landsat MSS Data for Ecological Mapping, Proc. 9th Ann. Conf. Remote Sensing Society, Univ. London, Dec.
1981. pp 143-161.
[4] Campbell, J.G., and A.A. Hashim (1992), Fuzzy Sets, Pattern Recognition,
and Neural Networks ..., Proc. 18th Ann. Conf. Remote Sensing Socy.,
Univ. Dundee, Sept. 1992 (AP. Cracknell and RA. Vaughan, eds.) pp. 508
-517.
[5] Cross, A.M. (1990). AVHRR as a data source for a GIS: deforestation in
Amazonia. Proceedings of the International Geoscience and Remote Sensing
Symposium, 1990. (IGARSS 90), Washington, D.C. (New York: IEEE),
pp.223-226.
[6] Foody, G.M., and Cox, D.P. (1994), Sub-pixel land cover composition estimation using a linear mixture model and fuzzy membership functions. Int.
J. Remote Sensing, 15(3), 619-631.
[7] Hobbs, S.E., and Thomas, G. (1996), A goodness-of-fit method applied to
spectral mixture modelling. Proc. Image Processing: Mathematical methods and applications. Conference of The Institute of Mathematics and its
Applications at Cranfield University, September 1994. Published by Oxford
University Press (in press).
[8] Holben, B.N., and Shimabukuro, Y.E. (1993), Linear mixing model applied
to coarse spatial resolution data from multispectral satellite sensors. Int.
J. Remote Sensing, 14(11), 2231-2240.
[9] Houghton, J.T., Taylor, F.W., and Rodgers, C.D. (1984), Remote sounding
of atmospheres, Cambridge University Press, Cambridge (ISBN 0 521 31065
2).
25
26
[25] Shimabukuro, Y.E., and Smith, J.A. (1991), The least-squares mixing models to generate fraction images derived from remote sensing multispectral
data. IEEE Transactions on Geoscience and Remote Sensing, 29(1), 16-20,
January 1991.
[26] Smith, M.O., Johnson, P.E. and Adams, J.B. (1985). Quantitative determination of mineral types and abundances from reflectance spectra using
principal component analysis. Journal of Geophysical Research, 90:792-804.
[27] Smith, M.O., Adams, J.B., and Gillespie, A.R. (1990), Reference endmembers for spectral mixture analysis. Proc. 5th Australian Remote Sensing
Conference, 8-12 October 1990, Perth, Australia, Vol. I, 331-340.
[28] Smith, M.O., Adams, J.B., and Sabol, D.E. (1994), Spectral mixture analysis - new strategies for the analysis of multispectral data. Imaging Spectrometry - a Tool for Environmental Observations, Ed. J. Hill and J. Megier,
Kluwer Academic Publishers, Dordrecht, pp 125-143 (ISBN 0-7923-2965-1).
[29] Thomas, G, Hobbs, S E and Dufour, M J D (1996). Woodland area estimation by spectral mixing: applying a goodness-of-fit solution method.
International Journal of Remote Sensing, 17, pp 291-301.
[30] van Kootwijk, E.J., van der Voet, H., and Berdowski, J.J.M., (1995), Estimation of ground cover composition per pixel after matching image and
ground data with subpixel accuracy. International Journal of Remote Sensing, 16(1), 97-111.
[31] Zhukov, B., Berger, M., Lanzl, F., and Kaufmann, H., (1995), A new technique for merging multispectral and panchromatic images revealing subpixel spectral variation. Proceedings International Geoscience and Remote
Sensing Symposium 1995, IGARSS 95, Firenze, Italy, 10-14 July 1995,
Vol. III, IEEE.
27