Param Estim Tikhonov

ARTICLE IN PRESS
Computational Statistics and Data Analysis ( ) –
Contents lists available at ScienceDirect
Computational Statistics and Data Analysis

journal homepage: www.elsevier.com/locate/csda
Regularization parameter estimation for large-scale Tikhonov

regularization using a priori information
Rosemary A. Renaut ∗ , Iveta Hnětynková, Jodi Mead
Arizona State University, Department of Mathematics and Statistics, Tempe, AZ 85287-1804, United States
article info abstract

Article history: Solutions of numerically ill-posed least squares problems Ax ≈ b for A ∈ Rm×n by
Received 20 November 2008 Tikhonov regularization are considered. For D ∈ Rp×n , the Tikhonov regularized least
Received in revised form 28 May 2009 squares functional is given by J (σ ) = kAx − bk2W + 1/σ 2 kD(x − x0 )k22 where matrix W is
Accepted 29 May 2009
a weighting matrix and x0 is given. Given a priori estimates on the covariance structure
Available online xxxx
of errors in the measurement data b, the weighting matrix may be taken as W = Wb
which is the inverse covariance matrix of the mean 0 normally distributed measurement
errors e in b. If in addition x0 is an estimate of the mean value of x, and σ is a suitable
statistically-chosen value, J evaluated at its minimizer x(σ ) approximately follows a χ 2
distribution with m̃ = m + p − n degrees of freedom. Using the generalized singular value
1/2
decomposition of the matrix pair [Wb AD], σ can then be found such that the resulting
J follows this χ distribution. But the use of an algorithm which explicitly relies on the
2
direct solution of the problem obtained using the generalized singular value decomposition
is not practical for large-scale problems. Instead an approach using the Golub–Kahan
iterative bidiagonalization of the regularized problem is presented. The original algorithm
is extended for cases in which x0 is not available, but instead a set of measurement data
provides an estimate of the mean value of b. The sensitivity of the Newton algorithm
to the number of steps used in the Golub–Kahan iterative bidiagonalization, and the
relation between the size of the projected subproblem and σ are discussed. Experiments
presented contrast the efficiency and robustness with other standard methods for finding
the regularization parameter for a set of test problems and for the restoration of a relatively
large real seismic signal. An application for image deblurring also validates the approach for
large-scale problems. It is concluded that the presented approach is robust for both small
and large-scale discretely ill-posed least squares problems.
© 2009 Elsevier B.V. All rights reserved.
1. Introduction
We are concerned with the solution of large-scale linear discrete ill-posed problems such as arise in many physical
experiments associated, for example, with the discretization of integral equations (Vogel, 2002; Hansen, 1998; Hansen
et al., 2006a). Here we will illustrate the presented approach for the deblurring of a noisy image; a problem which can
be addressed in several different ways and from different perspectives, for example with regard to edge enhancement, as
well as deblurring when the underlying blur operator is unknown, i.e. (Benito and Pea, 2007; Obereder et al., 2007; Qiu,
2008). For our situation we suppose a model given by the ill-posed system of equations Ax = b, where matrix A ∈ Rm×n
results from the underlying model discretization and a solution x ∈ Rn is desired for measurements b ∈ Rm , which are
∗ Corresponding author. Tel.: +1 480 965 3795; fax: +1 480 965 4160.
E-mail addresses: [email protected] (R.A. Renaut), [email protected] (I. Hnětynková), [email protected] (J. Mead).
0167-9473/$ – see front matter © 2009 Elsevier B.V. All rights reserved.
doi:10.1016/j.csda.2009.05.026
Please cite this article in press as: Renaut, R.A., et al., Regularization parameter estimation for large-scale Tikhonov regularization using a priori
information. Computational Statistics and Data Analysis (2009), doi:10.1016/j.csda.2009.05.026
ARTICLE IN PRESS
2 R.A. Renaut et al. / Computational Statistics and Data Analysis ( ) –
often noise-contaminated. An approximate solution x̂ may be obtained by solving the weighted regularized least squares
problem, with matrix D ∈ Rp×n , p ≤ n, chosen dependent on anticipated smoothness properties of the solution x,
x̂ = argmin J (x) = argmin{kAx − bk2Wb + kD(x − x0 )k2Wx }, (1)
where we use the standard definition of the weighted norm for vector y with weight matrix W given by yT W y = kyk2W .
Here we consider the general case in which the fit to data term is measured in weighted norm with weight matrix Wb which
may be available experimentally. For example, when the measurement errors e in the measurement data b are assumed to
be the samples from a multivariate normal distribution, Nm (0, Cb ) (m variables, mean 0 and covariance Cb ), the weighting
is the inverse covariance Wb = Cb−1 . For the case of colored noise Cb is diagonal, Cb = diag(σ12 , σ22 , . . . , σm2 ), whereas for
white noise Cb = σ 2 Im .
As presented here the regularization term D(x − x0 ) is also calculated in a weighted norm. If Wx can be assumed to be
the inverse covariance matrix for normally distributed errors in the mapped model parameters Dx, then it can be shown
that the minimum of the functional J follows a χ 2 distribution with m − n + p degrees of freedom, Mead (2008); Mead and
Renaut (2009). This extends the standard result on the χ 2 distribution of the unregularized least squares functional (Rao,
1973). Also, if D = In and Wx = 1/σx2 In , where σx2 is the common white noise variance in model parameters x, then x̂ is
the standard maximum a posteriori (MAP) estimate of the solution, Vogel (2002). The advantage of assuming that weighting
matrices Wb and Wx are inverse covariance matrices is that it permits our focus on the use of the χ 2 property of the minimum
functional to efficiently determine the parameter σx when Wx = 1/σx2 In .
The determination of the optimal λ = 1/σx is a topic of much previous research, and includes methods, amongst others,
such as the L-curve (LC), generalized cross validation (GCV), the unbiased predictive risk estimate (UPRE) and the discrepancy
principle estimate of the distance between the exact and regularized solution (O’Leary, 2001), all of which are well described
in the literature, see e.g. Hansen (1998) and Vogel (2002) for comparisons of the criteria and more references. Other recent
approaches analyse the statistical properties of the residual of the least squares functional (Rust and O’Leary, 2008; Hansen
et al., 2006b). Some especially promising efforts for determining λ, particularly for large-scale problems, have placed
emphasis on regularization methods based on iterative Golub–Kahan bidiagonalization, e.g., LSQR (Paige and Saunders,
1982a,b) and hybrid methods (Hanke and Hansen, 1993; Fierro et al., 1997; Hansen, 1998; Kilmer and O’Leary, 2001; Kilmer
et al., 2007; Chung et al., 2008; O’Leary and Simmons, 1989). In standard hybrid methods the original problem is projected
onto a lower-dimensional subspace using the bidiagonalization algorithm, which by itself represents a form of regularization
by projection. The projected bidiagonal problem, however, inherits a part of the ill-posedness, and therefore some form
of inner regularization is applied to the projected small subproblem. The bidiagonalization may be stopped when the
regularized solution of the projected problem matches any of the previously mentioned stopping criteria. Hybrid methods
seek to combine the bidiagonalization procedure with determination of an appropriate regularization parameter for solving
the projected system, i.e. project then regularize. An alternative form of hybrid is described here, regularize while projecting.
Specifically, the hybrid method presented in this paper still utilizes the standard efficient iterative bidiagonalization of the
LSQR, but in contrast to the standard hybrid, this is combined with a parameter search algorithm derived from the χ 2
property of the regularized functional. Throughout, having noted this distinction between the standard hybrid and our new
hybrid LSQR, we still refer to the present method as a hybrid LSQR algorithm. The details are discussed in Section 3.2.
Incorporating the χ 2 property of the functional J to find λ is a recent innovation. Provided that the weighting matrix
Wx is the appropriate inverse covariance matrix for the regularization term and that the prior information x0 approximates
the mean of parameters x, the χ 2 property implies that J lies within an interval centered around its expected value, Mead
and Renaut (2009). Therefore a Newton root-finding algorithm can be designed for determining σx . This algorithm may
be implemented for small-scale problems using a direct solve employing the generalized singular value decomposition
1/2
(GSVD) of the matrix pair [Wb A|D], Mead and Renaut (2009). Here the GSVD direct solve is replaced by the iterative
bidiagonalization for the regularized problem. This algorithm has the benefit of reuse of the bidiagonal system for each step
of the Newton iteration, namely for each value of σ = 1/λ found in the Newton algorithm, and hence solves the regularized
problem for optimal λ at almost no overhead as compared to a single solve for one given λ. Note that a general operator D,
D 6= I, can also be considered, using the conversion to standard form regularization (Hansen, 1998; Eldén, 1982). Another
possibility would be to use an algorithm for simultaneous bidiagonalization of A and D as presented in Kilmer et al. (2007).
We now give a brief outline of the remainder of the paper. In Section 2 we present extensions of the theory, relevant
for solving more general problems. For example, when x0 is not the expected value of the model data we show that the
minimum functional follows a non-central χ 2 distribution, with centrality parameter related to the choice of x0 in relation
to the actual mean x. Moreover, the use of a truncated (filtered) expansion of the GSVD for providing a filtered direct GSVD
solution also modifies the theory, yielding a functional which has fewer degrees of freedom, dependent on the number of
terms used in the filtered expansion. The original GSVD based implementation of the Newton algorithm can be extended
for both cases and implementation details are provided in Section 3. For large-scale problems the same Newton algorithm is
employed but solutions are obtained iteratively using the hybrid algorithm described in Section 3.2. Numerical experiments
detailed in Section 4 contrast the performance of the iterative and direct solve algorithms for small-scale problems, for
both central and non-central distributions of the underlying functional. These experiments validate both the algorithm for
non-central functionals and for the large-scale implementation. The dependence of the regularization parameter obtained
in relation to the size of the projected problem is also discussed, Section 4.3. As the subproblem size increases, the solution
ARTICLE IN PRESS
R.A. Renaut et al. / Computational Statistics and Data Analysis ( ) – 3
admits higher frequency components and more regularization is needed. Finally, in Sections 4.4 and 4.5 the hybrid algorithm
is shown to yield efficient and robust results for the deblurring of a relatively large real seismic signal, and the deblurring of
a large-scale image for which the true image is available. Future work and conclusions are discussed in Section 5.
2. Theoretical development
2.1. χ 2 distribution of the regularized functional
The solution of (1) with Wx = λ2 In , assuming invertibility

N (A) ∩ N (D) = ∅, (2)
is given by
x̂(λ) = (AT Wb A + λ2 DT D)−1 AT Wb r + x0 = x(λ) + x0 , (3)

where the residual r is given by r = b − Ax0 . This is more compactly written using the resolution matrix R(λ) =
1/2 1/2
(AT Wb A + λ2 DT D)−1 AT Wb , or, more generally, R(WD ) = (AT Wb A + WD )−1 AT Wb , where WD = DT Wx D, yielding
1/2 1/2
x̂(WD ) = R(WD )Wb r + x0 . With this notation, and introducing the influence matrix A(WD ) = Wb AR(WD ), (1) is written
as a quadratic form,
1/2 1/2
J (x̂(WD )) = rT Wb (Im − A(WD ))Wb r. (4)
To obtain the result on the χ distribution of this functional we employ the GSVD, using the notation as given in Mead and
2
Renaut (2009), to reexpress this quadratic form.
Lemma 1 (Golub and van Loan (1996)). Assume the invertibility condition (2) and m ≥ n ≥ p. There exist unitary matrices
U ∈ Rm×m , V ∈ Rp×p , and a nonsingular matrix X ∈ Rn×n such that
A = U Υ̃ X T , D = V M̃X T , (5)
where
Υ 0
" #
Υ̃ = 0 In − p , Υ = diag(υ1 , . . . , υp ) ∈ Rp×p ,
0 0
M̃ = M , 0p×(n−p) , M = diag(µ1 , . . . , µp ) ∈ Rp×p ,

and such that

0 ≤ υ1 ≤ · · · ≤ υp ≤ 1, 1 ≥ µ1 ≥ · · · ≥ µp > 0 ,
(6)
υi2 + µ2i = 1, i = 1, . . . , p.
1/2 1/2
Now, using the GSVD of the matrix pair [Wb A|WD ],
1/2 1/2
J (x̂(WD )) = rT Wb U (Im − Υ̃ Υ̃ T )U T Wb r, (7)
p m
1/2
X X
= s2i µ2i + s2i , s = U T W b r, (8)
i =1 i=n+1
n
1/2
X
= kkk2 − k2i , k = QU T Wb r, (9)
i=p+1
Q = diag(µ1 , . . . , µp , In−p , Im−n ). (10)
This is the starting point for showing that J follows a central χ distribution with m + n − p degrees of freedom as detailed
2
in Theorem 3.1 Mead and Renaut (2009), provided that x0 is the expected value, denoted by x, of x. Typically, x is unknown
and x0 = 0 is chosen. For x0 6= x the following non-central generalization is obtained.
Theorem 1 (Non-Central χ 2 Distribution of the Regularized Functional). Suppose

• Cb = Wb−1 is the symmetric positive definite (SPD) covariance matrix on the mean zero normally distributed data error, ei ,
• CD is the, possibly rank deficient, symmetric positive (semi-)definite covariance matrix for the mean zero normally distributed
model errors ζi = (D(x̂(WD )i − x0 ))i , with the conditional SPD inverse WD which satisfies the two Moore–Penrose conditions
WD CD WD = WD and CD WD CD = CD ,
ARTICLE IN PRESS
• that the invertibility condition (2) holds, and

• that the expected value of x is x.
Then in the limit m − n + p sufficiently large, the minimum value of the functional J is a random variable which follows a non-
1/2
central χ 2 distribution with m − n + p degrees of freedom, and non-centrality parameter c = kc̃k22 = kQ̃ U T Wb A(x − x0 )k22 ,
where
Q̃ = diag(µ1 , . . . , µp , 0n−p , Im−n ). (11)
Equivalently, J ∼ χ (m − n + p, c ) has expected value m − n + p + c and variance 2(m − n + p) + 4c.

2
Proof. We use the Fisher–Cochran theorem for quadratic forms (Rao, 1973), to show that in the limit as the number of
terms increases (7) follows a χ 2 distribution. Because (9) expresses the quadratic form in terms of the two norms of the
vector k, excepting the components p + 1 : n, it is sufficient to consider the distributions of the relevant components ki of
k. The argument follows as in Mead and Renaut (2009) but the statistical argument is modified when x0 is not the mean.
In particular, because the data and model errors are mean zero and normally distributed, the expected value of the random
1/2
variable b is b = Ax, so that r = A(x − x0 ), and k has mean c = QU T Wb A(x − x0 ). Thus b ∼ Nm (Ax1 , Cb + ACD AT )
1/2 1/2 1/2 1/2
and Wb r ∼ Nm (Wb A(x − x0 ), Im + Wb ACD AT Wb ). But, using the GSVD, CD = (X T )−1 diag(M −2 , 0n−p )X −1 , and with
1/2
Ã = Wb A, Im + ÃCD ÃT = UQ −2 U T . Therefore,
1/2
k ∼ Nm (QU T Wb A(x − x0 ), Im ). (12)
Now, by (9), the centrality parameter is calculated excluding the component means of vector k for components p + 1 : n
and the result follows, Rao (1973).
1/2
Remark 1. Theoretically, for b ∈ range(A), r ∈ range(A) and the last n + 1 : m components of s = U T Wb r are identically
Pm This2 would imply that J has p degrees of freedom instead of m − n + p. Practically, b is error-contaminated and
zero.
i=n+1 si in (8) is positive and constant with respect to σ . On the other hand, given precise values for x and x0 , the last
1/2
m − n components of c = Q q = QU T Wb A(x − x0 ) = Q Υ̃ X T (x − x0 ) are identically zero.
Theorem 1 suggests that an approach for finding a suitable regularization parameter in the single variable case Wx = λ2 In ,
given sufficient statistical information on the measured data b, is to find a λ = 1/σx so that as closely as possible J
follows a χ 2pdistribution, J (σx ) ∼ χ 2 (m − n + p, c (σx )). Equivalently, introducing the notation m̃ = m − n + p, and
δ(σx ) = zα/2 2m̃ + 4c (σx ), we want to determine σx such that
m̃ + c (σx ) − δ(σx ) ≤ J (σx ) ≤ m̃ + c (σx ) + δ(σx ), (13)
where zα/2 is the relevant z-value for a standard normal distribution, and α defines the (1 − α) confidence interval that
J ∼ χ 2 (m − n + p, c ). Taking c = 0 this is equivalent to the condition used in Mead and Renaut (2009) for the case when
x0 ≈ x. Here, because c depends on σ , the design of an algorithm to find σx satisfying (13) becomes more challenging.
The algorithm design is discussed in Section 3, but first we address a modification of the result when the numerical rank is
reduced.
2.2. The truncated GSVD
We are also interested in the case when the numerical rank of the resolution matrix is reduced, in which case the number
of degrees of freedom of the random variable J is also reduced. We illustrate this observation through the use of a truncated
GSVD expansion for the solution (3).
Suppose that the regularized solution x̂(λ) is written in terms of the GSVD expansion.
p
X υi Xn
x(λ) = s i x̃i + si x̃i ,
i=1
υi2 + λ2 µ2i i=p+1
p
X γi2 Xn
= s i x̃i + si x̃i ,
i=1
υi (γi2 + λ2 ) i=p+1
p
X fi Xn
γi2
= si x̃i + si x̃i , fi = , (14)
i=1
υi i=p+1
γi2 + λ2
using the notation fi given in Hansen (1989) and where x̃i , i = 1 . . . n are the columns of matrix (X T )−1 . It is well known
that the stable numerical calculation of the GSVD relies on kDk ≈ kÃk, Hansen (1998, 1989). If Cb is not well conditioned,
this property carries through to Ã, ill-conditioning will be reflected in υi and x̃i for small i, and the full expansion in (14) will
ARTICLE IN PRESS
lead to a solution that is noise-contaminated. Thus, as with the use of the truncated singular value decomposition for ill-
conditioned problems, a truncated GSVD expansion, Hansen (1989), of the solution has previously been suggested. Setting
fi = 0 for small components υi , i ≤ p − r and 1 otherwise, yields a truncated expansion solution
p n
X 1 X
xTGSVD = si x̃i + si x̃i . (15)
i=p+1−r
υi i=p+1
More generally, we may consider the filtered solution, following the suggestion in Marquardt (1970) for the regularized
TSVD, in which the filter function is defined by fi = 0 for υi < τ for some tolerance τ , but for the other terms we maintain
the dependence on the regularization parameter λ
p p
X γi2 Xn X fi Xn
xFILT (λ) = s i x̃i + s i x̃ i = s i x̃i + si x̃i . (16)
i=p+1−r
υi (γi2 + λ2 ) i=p+1 i=1
υi i=p+1
This amounts to setting υi = 0, µi = 1, so that γi = 0, i ≤ p − r. Consequently, we obtain a new expression for the quadratic
form (8)
p−r p
X m
X X λ2 2
J (xFILT (λ)) = s2i + s2i + fi s . (17)
i=1 i=n+1
γ2 i
i=p−r +1 i
Theorem 1 is modified appropriately and the distribution applies not for the original J (λ) but for a new functional J̃ (λ) with
the constant terms in (17) removed
p−r
X
J̃ (λ) = J (λ) − s2i . (18)
i=1
Theorem 2. Under the same conditions as Theorem 1, but with the solution given by (16), such that fi = 0, i = 1 : p − r,
fi = γi2 /(γi2 + λ2 ), i = p − r + 1 : p, then the function J̃ is a χ 2 random variable with m − n + r degrees of freedom, and
centrality parameter
1/2
c = kc̃k22 = kQ̃ U T Wb A(x − x0 )k22 , (19)
where, generalizing (11),
Q̃ = diag(0p−r , µp−r +1 , . . . , µp , 0n−p , Im−n ). (20)
1/2
Proof. The proof proceeds as before for Theorem 1 but replacing k in (9) with k̃ = Q̃ U T Wb r with the new definition (20)
1/2
for Q̃ . The distribution for Wb r is unchanged, but
1/2
k̃ ∼ Nm (Q̃ U T Wb A(x − x0 ), diag(0p−r , Im−(p−r ) )).
Whereas the mean and variance of the components of k̃i are unchanged from those for ki for i > p − r, the first p − r
components of k̃ are constant, mean and variance 0. Therefore
p−r
X
J̃ = J − s2i (21)
i=1
p
X m
X
= k̃2i + k̃2i , (22)
i=p−r +1 i=n+1
is a sum of m − n + r normally distributed random variables each with variance one and respective mean ci .
Inequality (13) applies as before but with r replacing p in the definition of m̃ and c calculated over the relevant
components p − r + 1 : p. Notice that the distribution of the functional arises only from the components in the expansion
for the solution xFILT which are filtered, the other components are constants independent of the regularization parameter λ.
Equivalently the unregularized components do not contribute to the statistical properties of the functional. It appears that
the degrees of freedom of the functional is determined by overall numerical rank of the subspace that defines the solution.
This observation is a topic for future research.
ARTICLE IN PRESS
3. Implementation
3.1. The Newton algorithm
In Mead and Renaut (2009) a Newton algorithm to find σ using the original formulation of the theory, without the
centrality parameter, was presented. There it was based on the use of the GSVD to find the root of F (σ ) = J (σ ) − m̃ = 0.
The basic Newton iteration, with line search parameter α (k) , may be written generally as
2 !
σ (k)

(k) 1
σ (k+1) = σ (k) 1 + α (J (σ (k) ) − m̃) . (23)
2 kDx(σ (k) )k
The derivative is given by
2
J 0 (σ ) = − kDx(σ )k2 . (24)
σ3
This can be determined by implicit differentiation of J (σ ) in (4), but also follows more easily from the expression for J in
terms of the GSVD for the pair [Ã|D],
p
X s2i
m
X υi 1
J (σ ) = kk(σ )k2 = + s2i , γi = ,σ = , (25)
i=1
(σ γ + 1)
2
i
2
i=n+1
µi λ
combined with using the expansion (14) for the solution x(λ). Observe, as mentioned in Remark 1, that the term i=n+1 s2i
Pm
Pm and so practically in the GSVD implementation this term is calculated only once. In this case we adjust m̃ =
is constant
m̃ − i=n+1 s2i and only calculate the update of J for the other relevant components.
While, the algorithm for the truncated functional given by (18) is designed similarly with function F defined by
p m
X s2i X
F (σ ) = + s2i − (m − n + r ) ≈ 0. (26)
i=p−r +1
(σ 2 γi2 + 1) i=n+1
it is of greater interest to consider an approach to make the calculation efficient for large-scale problems. In the following
we introduce a method to obtain x(σ ) and hence J (σ ) iteratively.
3.2. The hybrid LSQR algorithm
The algorithm is based on Golub–Kahan iterative bidiagonalization (Paige and Saunders, 1982a,b). Here, we only describe
the essential components of the algorithm, and refer to references for more details. Moreover, for the ease of presentation,
we give the algorithm for the case with Wb = I and x0 = 0, but note that the implementation with general Wb and x0
1/2
proceeds similarly, replacing A and b throughout by Ã and Wb (b − Ãx0 ), respectively, and finds x(λ) as given in (3).
Given initial vectors g0 ≡ 0, h1 ≡ b/β1 , where β1 ≡ kbk 6= 0, the Golub–Kahan iterative bidiagonalization computes,
using two-term recurrences requiring only matrix–vector multiplications with the matrices A and AT , orthonormal vectors
gi , hi , i = 1, 2, . . .
αi gi = AT hi − βi gi−1 , kgi k = 1, (27)

βi+1 hi+1 = Agi − αi hi , khi+1 k = 1. (28)
Let Hj ≡ [ h1 , . . . , hj ] ∈ R m×j
, Gj ≡ [g1 , . . . , gj ] ∈ R n×j
and
α1
 
β2 α2
Lj
, , Lj+ ∈ R(j+1)×j .

Lj ≡  .. .. Lj + ≡
βj+1 eTj
 . . 
βj αj
Then the recurrences (27)– (28) can be written in the matrix notation
AT Hj = Gj LTj , A Gj = [ Hj , hj+1 ] Lj+ . (29)
With the observation [Hj , hj+1 ]T b = β e1 , the j steps of the bidiagonalization yield a subproblem
Lj+ yj ≈ e1 β1 . (30)
ARTICLE IN PRESS
– 22
6.8551e–05– 22.6994
–23
–24
–25
FC (σ)
–26
–27
–28
–29
–30
–6 –5 –4 –3
10 10 10 10
log10 (σ)
Fig. 1. Example of the non-monotonicity of function FC in (33). FC (σ ) is plotted against log10 (σ ). The circles denote the values of the pairs
(log10 (σ (k) ), FC (σ (k) )) during the iteration to find the optimal choice of σ , found at σ = 6.9e − 05 with value FC (σ ) = −22.7.
Then, when D = I, the least squares solution yj (σ ) of the augmented system

Lj +
yj (σ ) ≈ e1 β1 , (31)
1/σ Ij
transformed to the original variables through
xj (σ ) ≡ Gj yj (σ ). (32)
represents an LSQR approximation to the solution of the original problem (1) for one particular σ , see Paige and Saunders
(1982a,b). Note that kxj (σ )k22 = kGj yj (σ )k2 = kyj (σ )k22 . For the general case when D 6= I, problem (1) can be transformed
to the so-called standard form, and then the LSQR algorithm in the basic form can be applied, details of this transformation
are provided for example in Hansen (1998).
In order to use the root-finding algorithm described in Section 3.1 to determine optimal σ , it is important to note that
the updated values for x(σ (k) ) and J (σ (k) ) can be obtained efficiently for each iteration. The solution x(σ (k) ) for each σ (k)
is computed by solving (31) with the appropriate number of bidiagonalization steps j = j(σ (k) ), and takes advantage of
the fact that the matrices Lj+ , Gj and the right-hand side e1 β1 do not depend on σ (k) . If j(σ (k) ) is smaller than j(σ (i) ) for
some 1 ≤ i < k, i.e. if the convergence criteria are satisfied for some i < k, the corresponding subblock of Lj+ can be used.
Otherwise the matrices Lj+ and Gj are augmented by computing additional steps in (27)–(28). Note that construction of the
matrix Lj+ is the most expensive part of the LSQR algorithm. Thus reusing Lj+ makes the computation of x(σ (k) ) efficient. The
calculation of J (σ (k) ) is clear from the update x(σ (k) ): it is approximated by the augmented residual for the approximate
solution xj (σ (k) ). Indeed, although Gj is immediately generated by the LSQR iteration, there is no need to explicitly form
x(σ (k) ) using the transformation (32) until the converged value of σ (k) . Of course, this does mean that the matrix Gj will need
to be accumulated and stored for use at the final stage of the algorithm, however, this is also the case with any algorithm
based on regularizing the LSQR iteration.
3.3. The algorithm with centrality parameter
To account for the centrality parameter c the algorithm needs modification. For the ease of explanation we explicitly
write functional J as a function of both b and σ , J = J (b, σ ). We have shown that J (b, σ ) = kk(b, σ )k2 , and that when σ
is chosen appropriately, J (b, σ ) will follow a χ 2 distribution with expected value that depends on its expected mean value.
But by Theorem 1 this mean value is explicitly given by c = kck2 = kk(Ax, σ )k2 . We thus need to solve
FC (σ ) := J (b, σ ) − J (Ax, σ ) − m̃ ≈ 0. (33)
Apparently we need to know x. But if x were known, we would actually solve using x0 = x in (1) and then J (Ax, σ ) = 0. On
the other hand, suppose that indeed x is not known, but we can find an estimate for b from the set of repeated measurements
of the experiment which are used to provide the covariance matrix Cb . Then we can replace Ax in (33) by b and seek to solve
FC (σ ) ≈ 0. However, from (24)
∂ FC 2
= − 3 (kDx(b, σ )k2 − kDx(b, σ )k2 ), (34)
∂σ σ
and the functional FC need not be monotonic. Indeed, the root of FC (σ ) = 0 need not exist. An example illustrating this is
shown in Fig. 1.
The potential for the nondefinite behavior of FC0 , when q 6= 0, illustrated in Fig. 1 is most apparent if we look at the GSVD
formulation. In particular, the equivalent equations for the GSVD implementation are, for (33) and (34), resp.
p m
X s2i − q2i X 1/2
FC (σ ) = + (s2i − q2i ) − m̃ = 0, q = U T Wb A(x − x0 ), and (35)
i =1
(σ 2 γi2 + 1) i=n+1
p
∂ FC X zi γi2
= −2σ , where zi = s2i − q2i . (36)
∂σ i=1
(σ 2 γi2 + 1)2
ARTICLE IN PRESS
3.3.1. Newton with bisection to minimize FC

We seek σ such that FC (σ ) is close to zero, equivalently, such that FC2 (σ ) is minimum. If there is a root such that FC (σ ) = 0,
it will solve the minimization, otherwise we find σ such that FC0 (σ ) = 0, and FC2 (σ ) is minimum. As for the original Newton
algorithm to solve F (σ ) = 0, the algorithm is made more robust by some basic observations. Both J (b, σ ) and J (b, σ ) which
occur in FC (σ ) are positive. Indeed,
FC (σ ) = kk(b, σ )k2 − kk(b, σ )k2 − m̃,
and both norms are monotonically decreasing with σ . Therefore,
kk(b, σ )k2 − kk(b, 0)k2 − m̃ ≤ FC (σ ) ≤ kk(b, σ )k2 − m̃.
Because kk(b, σ )k2 is itself monotonically decreasing with σ , for any σ > σmax , where kk(b, σmax )k2 = m̃, necessarily
FC (σ ) < 0, and FC0 (σmax ) < 0. Likewise, kk(b, σmin )k2 = m̃ + kk(b, 0)k2 , provides a lower bound for σ . The algorithm to
minimize FC2 (σ ) therefore first solves for both σmin and σmax , so as to find a bracket on the optimal σ . Because these two values
are found using only the functional J (b, σ ), the original fast convergent Newton algorithm can be used, Mead and Renaut
(2009). Indeed, determination of σmax solves the original central distribution problem under the assumption that the given
x0 = x, and can itself be used to give an estimate of the solution, x(b, σmax ). In the rare case that the vector z, defined from
its components zi , is itself definite, these bounds on σ effectively bracket the root of a monotonically increasing function,
and the original Newton algorithm can now be applied to function FC to find its root. Otherwise we use simple bisection to
find the minimum of FC0 (σ ) within the determined bracket.
Remark 2. It might appear that another approach to solve in the case that x0 6= x but b is available, would be to use b
to find an estimate of x, and then to solve again using the obtained value for the estimate of the expected value x as x0
in J (b, σ ). However, in this case we would need to find the estimate for x without regularization. We know that for ill-
conditioned problems, even without noise, the estimate of the solution x is unlikely to be useful. While such an approach
would therefore avoid the problem of finding the minimum for FC , its success would be limited to problems which are well
conditioned. Experiments, not reported here, confirm this observation.
4. Numerical experiments
The major goal of the presented numerical experiments is that they validate the hypothesis that the hybrid LSQR
algorithm can be used to efficiently obtain the regularization parameter using the χ 2 properties of the regularized functional
for large-scale problems. Several experiments are presented. First, in Section 4.1, we contrast the basic Newton algorithm
implemented using direct GSVD solves and iterative hybrid LSQR solutions for a set of small-scale test problems. The
algorithm is implemented as described in Section 3.1 and is based on the original work in Mead and Renaut (2009). In all
cases it is implemented in exactly the same way for both direct GSVD and iterative hybrid LSQR versions, namely line search
and bracketing are performed equivalently. Because the hybrid LSQR is iterative, its performance depends on the number
of steps of the bidiagonalization algorithm (27)–(28). As proposed in the original papers (Paige and Saunders, 1982a,b), two
stopping criteria for the generation of the bidiagonal system are used
kA T r j k
krj k ≤ btol kbk + atol kAk kxj k and ≤ atol. (37)
kAk krj k
The quantities krj k, kxj k, kAk, and kAT rj k can be estimated at minimal cost in the LSQR iterations. The quantities atol, btol
are user specified and should reflect the expected accuracy of the data, see Paige and Saunders (1982a) for more details.
Although extensive comparisons between the new regularization method using prior information x0 and other standard
regularization parameter estimation methods, described for example in Hansen (1998); Vogel (2002), were provided
in Mead and Renaut (2009), in Section 4.2 we contrast the performance of the algorithm using prior information given
through b with the LC and UPRE methods. Having validated the hybrid algorithm, we then turn in Section 4.3 to an
examination of its performance with respect to the size of the subproblem at each Newton iteration and the tolerances
atol and btol. Again the results are also compared with those obtained using LC and UPRE regularization strategies. Finally,
we demonstrate that the new approach can be used for the solution of large-scale problems. Two examples are presented,
one the deconvolution of real one-dimensional seismic signals, Section 4.4 for which the solution is also contrasted with
that obtained using the UPRE, and the second a standard large-scale image deblurring test problem, Section 4.5.
4.1. Comparing hybrid LSQR and GSVD
The purpose of the experiments presented in this section is the assessment of the consistency of the σ update strategy
when implemented for the hybrid LSQR algorithm as compared to solutions obtained using the GSVD. In our early
experiments, not reported here, we assumed that it would be sufficient to find the initial subproblem size given by j(σ (0) )
and use that value for the subsequent outer iterations on σ , or to allow j to increase as we changed σ . But recall, as noted in
the Introduction, that the basic LSQR algorithm introduces its own regularization on the least squares problem through the
ARTICLE IN PRESS
Table 1
The last two columns are the p-values for paired t-tests between GSVD and LSQR results, for both the obtained σ and the number of σ updates. The first
column is the noise level used in generating the noisy data. These results are for problem size 64, over 500 tests of each problem, and use x0 = 0 in the
definition of F (σ ).
shaw
Noise Average Iterations K p-value
Level j(σ (k) ) GSVD LSQR K σ
1e−03 13.1 7.9 7.9 1.0 1.0
5e−03 11.5 7.7 7.6 0.6 0.3
1e−02 10.3 7.0 7.0 1.0 1.0
5e−02 7.2 7.1 7.1 1.0 1.0
1e−01 6.2 7.0 7.0 0.9 0.8
ilaplace
1e−03 16.0 7.3 7.3 1.0 1.0
5e−03 12.2 6.3 6.3 1.0 1.0
1e−02 10.7 6.4 6.4 0.9 1.0
5e−02 8.0 6.5 6.5 0.9 1.0
1e−01 7.4 6.5 6.5 0.9 0.5
heat
1e−03 63.8 5.9 5.8 0.9 0.0
5e−03 63.9 5.5 5.5 1.0 1.0
1e−02 47.6 5.7 5.7 1.0 1.0
5e−02 29.4 5.8 5.8 1.0 1.0
1e−01 22.4 6.0 6.0 1.0 1.0
phillips
1e−03 37.4 10.7 10.7 1.0 1.0
5e−03 28.6 7.4 7.4 1.0 1.0
1e−02 25.0 6.8 6.8 1.0 1.0
5e−02 15.4 6.1 6.1 1.0 1.0
1e−01 12.8 6.1 6.1 1.0 1.0
projection of the original problem onto a lower-dimensional subspace. Indeed the size of the subproblem, j, can be thought
of as a regularization parameter and the hybrid LSQR combines use of two regularization parameters, j and σ . Consequently,
we expect that the actual required subproblem size will depend on the current value of σ . Specifically, when σ is very small,
λ is large and the LSQR solution of the augmented system contains less noise than if σ were larger. Thus, for a given problem,
the augmented solution by LSQR for smaller σ permits a solution on a larger subspace. Equivalently, the LSQR does not need
to introduce as much regularization in this case. Indeed, we found that the subproblem size does change dynamically with
σ increasing or decreasing. Although we could set an initial requirement of using a specific subproblem size for all σ , this
might lead to some additional unnecessary calculation. Therefore we permit the algorithm to choose j as a function of σ
each outer iteration, and present averages for the actual adopted size of the projected problem.
Benchmark problems, shaw, ilaplace, heat and phillips, from the Regularization Tool Box (Hansen, 1994) are
implemented for different noise levels. The parameter σ obtained using the iterative hybrid LSQR solution is contrasted with
that obtained by the direct GSVD solution. In examples using the toolbox the true solutions xtrue , and matrices A defining the
models, were obtained using the relevant function calls to the tool box and true measurements found from btrue = Axtrue .
To obtain noisy data sets the Matlab r
function randn (Marquardt, 0000), which yields standard normal data, was used
to obtain a perturbation matrix E of size m × 500, for b of length m. Each column el , l = 1:500, of E was normalized to
two norm length kbk by el = el kbk/kel k and the 500 noisy right-hand side vectors bl obtained as bl = b + el . Data sets
were generated for noise levels = .001, .005, .01, .05 and .1. Given the 500 samples we can then directly calculate the
covariance Cb of the measurements. This method generates a covariance matrix which is nearly diagonal, hence in this first
1/2
set of experiments we used the diagonal weighting matrix Cb = diag(σ1 , σ2 , . . . , σm ). Also, all experiments, other than
those with the problem shaw which uses D = I, used D to approximate the first derivative operator.
In this first set of experiments we use very small problems, problem size n = 64, and impose a very tight confidence
interval (13) using α = .9999, for tolerance .0014 on convergence of the Newton iteration. For the LSQR iterations we use
atol = 100 ∗ btol and btol = 10−9 , and allow the subproblem size to reach full size 64. At each outer Newton step the LSQR
algorithm adjusts the size of the projected problem, j(σ ) such that the LSQR algorithm meets the convergence criteria (37),
which may be different for each chosen σ (k) . Therefore, for each noise level we track the average of j(σ (k) ) over outer steps
k = 1, 2, . . . , k ≤ 15. For both direct and iterative solves we report the total number of calculations K of σ , equivalently
of problem solves, where the count includes the bracketing and subsequent Newton iterations. P-values for a paired t-test
between GSVD and hybrid LSQR results for both K and for σ are also given.
The results in Table 1, which are for the solution of F (σ ) = 0, using the unrealistic estimate x = x0 = 0, demonstrate
generally a very high correlation, p-value near 1, between GSVD and hybrid LSQR algorithms for both the average number
ARTICLE IN PRESS
Table 2
Comparison of the results using the hybrid LSQR and the GSVD based algorithm with centrality parameter obtained using the average of the right-hand
side vectors. Columns five and six are the p-values for paired t-tests between GSVD and LSQR results, for both the obtained σ and the number of σ updates
K . The first column is the noise level used in generating the noisy data. These results are for problem size 64, over 500 tests of each problem. The mean
relative errors are measured in the least squares norm as compared to the known exact solutions and, where given in parentheses, the failure count is the
number of problems which did not achieve relative error less than 50%.
shaw
Noise Average Iterations K p-value Relative error
Level j(σ (k) ) GSVD LSQR K σ GSVD LSQR
1e−03 10.5 31.1 31.2 0.8 1.0 0.067 0.067

5e−03 9.7 28.4 28.1 0.9 1.0 0.110 0.110
1e−02 8.7 29.4 29.2 0.7 1.0 0.135 0.135
5e−02 7.4 26.7 26.5 0.9 1.0 0.189 (2) 0.189 (2)
1e−01 6.9 26.5 25.8 0.8 1.0 0.215 0.215
ilaplace
1e−03 8.5 28.2 31.2 1.0 1.0 0.023 0.023
5e−03 6.5 26.8 30.6 1.0 1.0 0.053 (1) 0.053 (1)
1e−02 5.9 26.5 30.6 0.0 0.4 0.084 (1) 0.075 (1)
5e−02 4.7 25.6 30.8 0.9 1.0 0.161 (16) 0.162 (16)
1e−01 4.3 24.1 30.9 0.7 1.0 0.211 (48) 0.211 (47)
heat
1e−03 34.1 27.9 29.6 0.2 0.8 0.073 0.073 (1)
5e−03 28.8 27.3 29.6 0.9 1.0 0.133 (1) 0.133 (1)
1e−02 23.6 26.6 30.1 1.0 1.0 0.179 (1) 0.179 (1)
5e−02 14.8 26.1 30.3 1.0 1.0 0.304 (48) 0.304 (49)
1e−01 11.1 24.5 30.3 0.8 1.0 0.375 (158) 0.375 (158)
phillips
1e−03 22.0 33.4 34.5 1.0 1.0 0.017 0.017
5e−03 14.7 28.3 31.6 0.6 1.0 0.029 0.029
1e−02 12.1 27.5 30.9 1.0 1.0 0.039 0.039
5e−02 8.0 24.6 30.2 0.9 1.0 0.079 0.079
1e−01 7.0 25.4 30.5 0.9 1.0 0.112 0.112
of solution solves K and for σ . Overall the average size of the projected problem is found to be much smaller than the actual
problem size, and decreases with increasing noise level as anticipated. The exceptions, those where the correlation is not
high, namely low noise for the problem heat and high noise for ilaplace, can be explained by the regularizing properties
of the hybrid LSQR algorithm. Specifically, for the former case with high noise, the LSQR generates a small bidiagonal system,
hence introducing significant regularization itself by excluding noisy components of the solution. Consequently, it is likely
that the obtained optimal regularization parameter imposes less regularization than that which is needed for the GSVD
algorithm. On the other hand, in the latter case, the LSQR algorithm iterates almost to the actual full problem size of n = 64
and more outer regularization is needed.
Table 2 provides equivalent experiments to those in Table 1, but for the minimization of the discrepancy FC2 (σ ). Therefore,
the algorithm first finds the minimum and maximum values for σ as described in Section 3.3.1, and then carries out bisection
to find the minimal discrepancy. The iteration count includes all three stages. To simulate the use of average measurement
data, we form the average of the bl for the given noise level and use this as b. Here the average relative errors in the solutions
are also calculated, and the failure count is given, where a failure is indicated by a relative error greater than 50%. The relative
errors and failures increase with the noise level, but in general the results indicate that the algorithm is robust at finding
good regularization parameters. Also these results demonstrate that a high correlation for the obtained σ is not necessary
for achieving low relative error, see for example noise level .01 for problem ilaplace.
The results of this section reinforce the conclusions about the robustness of the χ 2 method for small-scale problems
when used with prior information b instead of x as presented in Mead and Renaut (2009).
4.2. Other regularization techniques
Extensive results contrasting the χ 2 method with a priori information x0 = x for the LC, GCV and UPRE techniques for
finding the regularization parameter were presented in Mead and Renaut (2009). There it was shown that the GCV was
not competitive in terms of robustness. Here therefore we only consider the LC and UPRE techniques and we present just
one set of experiments. The results given in Table 3 demonstrate that the new algorithm with the a priori information b is
competitive.
The problem size is 128 but all other settings are the same as for the experiments reviewed in Table 2, except that the
size of the projected problem is limited to 128, and the total sample size is 250 instead of 500. The failure counts and average
relative least squared errors are given. The entries in bold face in each row indicate the result with minimum average relative
ARTICLE IN PRESS
Table 3
Results are reported for problem size 128 over data sets each of size 250 for each noise level. The errors are relative least squares errors with respect to
the exact solution, and the failure counts are the number of solutions in each case which did not achieve a relative error less than 50%. The parameter
settings are set as in Table 2, except that the problem size of the projected problem is allowed to reach size 128, consistent with the given problem size.
NaN indicates that the relative error could not be calculated because the test failed for all samples. Minimum values are in bold face for each problem set.
shaw
Least squares error Failure count
GSVD LSQR LC UPRE GSVD LSQR LC UPRE
1e−03 0.098 0.098 0.076 0.099 1 1 0 33

5e−03 0.120 0.120 0.097 0.121 0 0 0 44
1e−02 0.199 0.180 0.183 0.216 0 0 0 44
5e−02 0.229 0.206 0.200 0.239 1 1 0 63
1e−01 0.276 0.236 0.241 0.259 5 4 0 71
ilaplace
1e−03 0.064 0.052 0.065 0.069 0 1 0 38
5e−03 0.091 0.075 0.077 0.088 0 0 0 34
1e−02 0.146 0.151 0.125 0.160 3 6 0 42
5e−02 0.200 0.204 0.166 0.195 22 18 0 71
1e−01 0.247 0.230 0.231 0.230 67 53 63 112
heat
1e−03 0.118 0.118 0.080 0.093 0 0 128 0
5e−03 0.152 0.152 0.114 0.127 0 0 157 0
1e−02 0.272 0.272 0.245 0.242 16 16 245 15
5e−02 0.327 0.324 NaN 0.305 46 40 250 21
1e−01 0.408 0.404 NaN 0.396 119 109 250 61
phillips
1e−03 0.025 0.025 0.025 0.040 0 0 0 4
5e−03 0.033 0.033 0.028 0.044 0 0 0 4
1e−02 0.065 0.065 0.047 0.095 0 0 0 18
5e−02 0.100 0.099 0.080 0.109 0 0 0 34
1e−01 0.137 0.136 0.123 0.139 2 2 0 28
error. Each method is competitive for some problem set, but less competitive for another problem set. It can also now be seen,
compare with Table 2, that the hybrid LSQR algorithm has the potential to outperform the direct solve using the GSVD, even
for this relatively small problem size of 128. We anticipate the GSVD to become less reliable as the problem size increases.
Moreover, the results for the hybrid LSQR are not tuned in any regard to its convergence parameters. This tuning is the topic
of the next set of results.
4.3. Characteristics of the hybrid LSQR algorithm
The experiments presented in this section further examine the convergence properties of the hybrid LSQR algorithm.
Problems shaw and phillips are solved for problem size n = 512 with an error level of 10%, the constraint j(σ ) ≤ 40
and different choices for btol. All other parameters are the same as in the previous section. The legends in Fig. 2 indicate
the values of btol and resulting least squares errors in the solutions. Plotted are the subproblem size j(σ (k) ) against σ (k) , for
increasing values of btol. In all cases the projected problem size is far smaller than the maximum limit of 40, which supports
the hypothesis that the hybrid LSQR will be cost effective for larger problems, and that the solution is not prejudiced by
the upper limit of 40 on the subproblem size. The actual values of σ obtained in each case for increasing btol are 3.1756,
3.1756, 3.2540, 28.5962, 750 000 and are .0016, .0016, .0207, 250 000, 750 000, for shaw and phillips, resp. Large values
of σ (λ is very small), indicate that for the given tolerances on the LSQR solution no additional regularization through the
regularization term is needed. The specific large value of σ is determined by the upper maximum on σ imposed by the
initial bracketing algorithm on σ . When btol is large the convergence criteria are met for a small projected problem size
j and the LSQR has already overregularized the solution. Therefore very little additional, or no, regularization is required.
But the obtained solution is actually less accurate because insufficient information was included in the projected problem.
The iteration has stopped too soon, and btol is too large. For smaller btol, the projected problem increases in size and the
converged value of σ is also smaller, i.e. more regularization is required.
The optimal solutions in each case, for btol = 10−4 and 10−5 for shaw and phillips, resp. are compared with solutions
using the LC and UPRE in Fig. 3, for which the errors are .197 and 1.198, and .076 and .501 for each problem resp., compare
the errors given in the figures for the hybrid LSQR, .174 and .043. Good solutions are obtained by the hybrid LSQR algorithm.
4.4. Seismic signal restoration
Here we present the result of deblurring real seismic data. A real data set of 48 seismic signals of length 3000 is considered.
The signal variance pointwise can therefore be calculated over all 48 signals. For this data the underlying point spread
ARTICLE IN PRESS
Problem shaw n = 512 Problem phillips n = 512

a 15 b 30
Size of projected subproblem 1e–07 0.110
Size of projected subproblem

25
1e– 07 0.247
1e–06 0.110
1e–06 0.246
1e– 05 0.043
10 20
1e–05 0.240
1e–04 0.119
1e– 04 0.174
1e– 03 0.124
1e–03 0.255 15
5 10
0 –4 –2 0 2
0 –4 –2 0 2
10 10 10 10 10 10 10 10
log10 (σ(j)) log10 (σ(j))
Fig. 2. Illustrating the dependence of σ on j(σ ) for problem with noise level 10% for increasing btol. The values in the legend are the noise level and
resulting relative least squares error for the solution. Subfigure (a) for problem shaw and (b) for phillips. Problem size 512.
Problem shaw n = 512 Problem phillips n = 512

a 3 b LSQR
LSQR 0.3
Lcurve
2.5 Lcurve
0.25 UPRE
UPRE
Exact
2 Exact 0.2
0.15
1.5
0.1
1
0.05
0.5
0
0 –0.05
–0.5 –0.1
0 100 200 300 400 500 0 100 200 300 400 500
Fig. 3. Illustrating the solution for the different methods, for the optimal choice of btol from the data given in Fig. 2 compared with LC and UPRE solutions.
Subfigure (a) for problem shaw and (b) for phillips. Problem size 512. Solution x is plotted against its index, at differing intervals, so as to distinguish
the solutions which are given by a solid line, circles, diamonds and x symbols, for the hybrid LSQR, L-curve, UPRE and exact solutions, respectively.
function, i.e. the signal blur described by matrix A, has been estimated numerically by an approach described in Stefan
(2008), and is not the focus of the analysis here. Instead our purpose is to show that the algorithms can be successfully used
on real data. Because there is no known true solution we estimate the reliability of the deblurring result by downsampling
the signals and restoring the signals at different resolutions, 2:1, 5:1, 10:1, 20:1, and 100:1, corresponding to using m = 1500,
600, 300, 150, and 30, resp. The signals are restored using UPRE and both versions of the χ 2 algorithm, i.e. with x0 = 0 and
root finding for F (σ ), as well as using the average b in the minimization of FC2 (σ ). The LC solution was not at all successful,
insufficient regularization was introduced, and the results are not reported. The average signal b is formed from the average
over all 48 signals. White noise weighting is used, calculated from the average covariance pointwise of all 48 signals.
The results which are illustrated in Fig. 4 indicate, that except for very low resolution, the results of all implementations
are consistent across resolutions. Indeed, the calculated values for σ are also consistent across resolutions for all three
algorithms. It is apparent that the χ 2 methods are more successful at removing noise from the underlying signal. For this
application it is important that spurious oscillations are removed but that the signals are not oversmoothed. The intent is
to identify major signal arrival times, and to distinguish the arrivals of different features in the signal, which correspond to
different geophysical reflections at the level of the core–mantle boundary in the Earth, and hence assist with interpretation of
the structure of the core–mantle boundary. With this in mind, we deduce that the solutions obtained through minimization
of the discrepancy FC2 (σ ) oversmooth the solution. Possibly this is due to the use of the overly smoothed prior information
b. Moreover, this suggests that it may indeed be preferable to in general use the simpler algorithm which seeks to solve
F (σ ) = 0 with prior information x0 = 0. The image deblurring example in the next section therefore uses only the root-
finding algorithm.
4.5. Image deblurring
We now turn to our ultimate goal of demonstrating the robustness of the hybrid LSQR method for solving large-scale ill-
posed problems. To implement the code for large-scale problems we take notice that calculations in the bidiagonalization
ARTICLE IN PRESS
UPRE Algorithm; White Noise LSQR Algorithm: White Noise

a 1 b 1.2
1500 points 1500 points
Normalized Maximum Amplitude 1

1
30 points 30 points
0.8
0.5
0.6
0.4
0
0.2
–0.5 –0.2
0 500 1000 1500 2000 2500 3000 0 500 1000 1500 2000 2500 3000
Index of Original Signal of Length 3000 Index of Original Signal of Length 3000
Hybrid Central Algorithm: White Noise

c 1
1500 points
0.9 600 points
300 points
0.8 150 points
30 points
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0 500 1000 1500 2000 2500 3000
Index of Original Signal of Length 3000
Fig. 4. (a) The UPRE solution, (b) the solution using root of F (σ ) (c) the solution using the minimum of FC2 (σ ). Regularization parameters are consistent
across resolutions: σ = 0.01005, σ = 0.00058 and σ = 0.000001 for each method, resp. Each solution is normalized to maximum height 1 and shifted
to align at the position of the maximum, as is standard in the seismic literature. The solutions are plotted against the index of the original signal of length
3000. One can immediately observe that the solutions lie virtually on top of each other as the resolution is refined, except for the coarsest sampling using
only 30 points.
require only matrix–vector multiplications. To accomplish this we use the object-oriented approach for evaluating
operations with matrices describing point spread functions provided in the RESTORE TOOLS (Nagy et al., 2004). We use
problem Grain which is provided as one of the examples, with blurring matrix A of effective size 2562 × 2562 , for a stacked
2D image of size 256 × 256. A noisy version of Grain is obtained by adding noise to btrue in the same way as for the earlier
test problems in Section 4.1. Because no prior information is actually available, and would not likely be available for large-
scale deblurring, we run the basic Newton algorithm for function F (σ ) with x0 = 0. In Fig. 5 we illustrate the original image,
its blurred noisy version and the best solution obtained with the basic LSQR algorithm and with the hybrid LSQR algorithm,
i.e. with additional regularization. This is for a case with 15% noise, normalized with k(btrue − b)k/kbk = .15. Because of
the problem size we use btol = 10−6 and atol = 10−4 . The best solution was in both cases obtained using the projected
problem of the size 15.
The results are comparable but the signal-to-noise ratios calculated for solving with increasing projected problem size
show that the regularization improves the basic LSQR solution, see Fig. 6. Indeed, these results confirm the semi-convergence
behavior of the LSQR iteration, and that the regularization stabilizes the process when the LSQR iteration converges to a
solution for which regularization is still required. This is illustrated clearly with the relative error plot in Fig. 6 and supports
the similar observation in Chung et al. (2008) where the weighted GCV regularizer was applied to the LSQR solution. Fig. 6(c)
illustrates dependence of j on the amount of regularization in the augmented problem; subproblem size j increases as σ
decreases, equivalently as λ increases and more smoothing is introduced. The two regularization parameters trade off, small
j corresponds to more regularization introduced by the LSQR step, while small σ corresponds to more regularization intrinsic
in the augmented problem. The results presented for this one problem are illustrative of experiments with different noise
levels, and additional problems also taken from the RESTORE TOOLS set. Finally, the presented results for smaller problems
justify our focus here on comparing our solution for the nonhybrid LSQR and the hybrid LSQR solutions, rather than with LC
and UPRE regularizers, for which the comparison would also be prohibitive in cost.
In these experiments with large-scale problems we found that the estimates of F (σ ) obtained from the projected problem
were not close enough for use in the χ 2 parameter choice algorithm, which is predicated on accurate estimates of this
ARTICLE IN PRESS
Original Grain Blurred Grain 15% Noise

a b
Grain 15%, j=15, Unregularized Grain 15% Noise, j(σ)=15,σ=1.98

c d
Fig. 5. Problem Grain with noise of 15% added. (a) the true solution, (b) the noisy blurred right-hand side, (c) the basic LSQR solution (d) the hybrid LSQR
solution.
underlying functional. Therefore we actually updated F (σ ) directly through the use of the updated solution x(σ ) rather than
using the estimate based on the residual of the projected problem. The cost of this additional step is minimal in relation to
the overall bidiagonalization step.
5. Conclusions and future work
The χ 2 approach for estimation of the regularization parameter arising in the solution of numerically ill-posed problems
by generalized Tikhonov regularization has been extended for large-scale problems through its combination with an
iterative LSQR algorithm. Numerical results validate the method as compared to the direct GSVD algorithm for a series
of test problems available in the literature. The utilization of the theory does rely on the use of some prior information, such
as an estimate of the expected value of the model parameters, x and an estimate of the covariance Cb for the measurement
variables b. On the other hand, the image deblurring example presented in Section 4.5 used only an estimate of the signal
covariance and no estimate of either x or b. This indicates that the technique can be useful even without this additional prior
information.
For the situation in which x0 6= x, for example x0 = 0, the theory was extended. In particular, modifying the basic
theory presented in Mead and Renaut (2009) yields the general non-central χ 2 distribution of the underlying functional.
A new algorithm combining Newton with bisection search for obtaining the regularization parameter in this case was
also developed and validated. While the numerical results with simulated data support the use of this more complicated
algorithm, the results for the seismic signal restoration and the image deblurring suggest that the algorithm could actually
be detrimental and lead to oversmoothing.
The theory has been modified when the underlying resolution matrix is ill-conditioned so that the resulting functional
is still a χ 2 random variable at optimum, but with reduced degrees of freedom. Utilization of this result for severely ill-
conditioned problems, and its possible extension to explain results in a basis other than the basis given by the GSVD
expansion, is a topic for future work.
There is considerable work in the statistics literature on the estimate of variance in measurement data without repeat
measurements (Thompson et al., 1991). A topic of future study is thus to utilize this theory so as to make the χ 2 approach
useful for data with limited experimental data. Additional study of the stabilizing effect of the regularization combined
with the LSQR solution is also needed. The hybrid LSQR approach presented here does stabilize the LSQR solution, but
as can be seen from Fig. 6, the stabilization for large-scale problems is limited. In this paper it is explicitly argued that
regularization while projecting is successful. While an implementation which uses the standard project then regularize hybrid
ARTICLE IN PRESS
SNR against Projected Problem Size Relative Error Against Projected Problem Size
a 4
Regularized
b 1.6
Unregularized
Regularized
3 1.4 Unregularized
Signal to Noise Ratio
2 1.2
Relative error
1 1 Grain Problem
15% Noise
0 0.8
Grain Problem
15% Noise
–1 0.6
–2 0.4
15 20 25 30 35 40 15 20 25 30 35 40
Problem size j(σ) Problem size j(σ)
c 5
4.5 Regularization Increases with

Projected Problem Size
4 Problem Grain 15%
3.5
3
σ
2.5
1.5
1
15 20 25 30 35 40
Problem size j(σ)
Fig. 6. (a) The signal-to-noise ratio for problem Grain with noise of 15% added for increasing subproblem size. Signal-to-noise ratio is calculated as
10 log10 (1/e), where e is the relative error kxtrue − x(σ )k/kxtrue k illustrated in (b). (c) shows σ in each case. (Recall the regularization parameter λ = 1/σ .)
is potentially more dependent on the stopping criteria of the projection than the approach here, it may still be interesting to
consider the robustness of the χ 2 -estimate of the regularization parameter in this case. Further modification of the method
may be related to the number of degrees of freedom in the subproblem and the choice of the stopping criteria for the
bidiagonalization process, and in particular on the loss of orthogonality occurring in the LSQR projection made very evident
in recent work described in Hnětynková et al. (2009). Future work will also consider the impact of preconditioning for
improving the algorithm.
Acknowledgements
The first author completed portions of this work while visiting the Institute of Computer Science at the Academy of
Sciences of the Czech Republic and the Department of Informatics and Mathematical Modeling at the Technical University
of Denmark. This work is supported by grants to the authors, for the first author NSF grants DMS 0513214 and DMS 0652833,
the second author by research project MSM0021620839 financed by MSMT, the Charles University in Prague and for the third
author by NSF grant EPS 0447689. We also thank the two referees whose comments helped with improving the clarity of the
paper, and particularly with respect to explaining the difference between the presented and standard hybrid LSQR methods.
References
Benito, M., Pea, D., 2007. Detecting defects with image data. Comput. Statist. Data Anal. 51 (12), 6395–6403. doi:10.1016/j.csda.2007.02.016.
Chung, J., Nagy, J., O’Leary, D.P., 2008. A weighted GCV method for Lanczos hybrid regularization. ETNA 28, 149–167.
Eldén, L., 1982. A weighted pseudoinverse, generalized singular values, and constrained least squares problems. BIT 22, 487–502.
Fierro, R.D., Golub, G.H., Hansen, P.C., O’Leary, D.P., 1997. Regularization by truncated total least squares. SIAM J. Sci. Stat. Comput. 18, 1225–1241.
Golub, G.H., van Loan, C., 1996. Matrix Computations, 3rd ed. John Hopkins Press, Baltimore.
Hanke, M, Hansen, P.C., 1993. Regularization methods for large-scale problems. Surv. Math. Indust. 3, 253–315.
ARTICLE IN PRESS
Hansen, P.C., 1989. Regularization, GSVD and truncated GSVD. BIT 6, 491–504.
Hansen, P.C., 1994. Regularization tools: A Matlab package for analysis and solution of discrete ill-posed problems. Numer. Algorithms 6, 1–35.
Hansen, P.C., 1998. Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion. In: SIAM Monographs on Mathematical Modeling
and Computation, vol. 4.
Hansen, P.C., Nagy, J.G., O’Leary, D.P., 2006a. Deblurring Images, Matrices, Spectra and Filtering. In: SIAM Fundamentals of Algorithms.
Hansen, P.C., Kilmer, M.E., Kjeldsen, R.H., 2006b. Exploiting residual information in the parameter choice for discrete ill-posed problems. BIT 46, 41–59.
Hnětynková, I., Plešinger, M., Strakoš, Z., 2009. Golub–Kahan iterative bidiagonalization and determining the size of the noise in the data.
http://www.cs.cas.cz/strakos/download/2009_HnPlSt.pdf.
Kilmer, M.E., O’Leary, D.P., 2001. Choosing regularization parameters in iterative methods for ill-posed problems. SIAM J. Numer. Anal. Appl. 22, 1204–1221.
Kilmer, M.E., Hansen, P.C., Español, M.I., 2007. A projection-based approach to general-form Tikhonov regularization. SIAM J. Sci. Comput. 29 (1), 315–330.
Marquardt, D.W., 1970. Generalized inverses, ridge regression, biased linear estimation, and nonlinear estimation. Technometrics 12 (3), 591–612.
MATLAB is a registered mark of MathWorks, Inc., MathWorks Web Site. http://.mathworks.com.
Mead, J., 2008. Parameter estimation: A new approach to weighting a priori information. J. Inv. Ill-posed Problems 16 (2), 175–194.
Mead, J., Renaut, R.A., 2009. A Newton root-finding algorithm for estimating the regularization parameter for solving ill-conditioned least squares problems.
Inverse Problems 25, 025002. doi:10.1088/0266-5611/25/2/025002.
Nagy, J.G., Palmer, K, Perrone, L., 2004. Iterative methods for image deblurring: A Matlab object-oriented approach. Numer. Algorithms 36, 73–93.
Obereder, A., Scherzer, O., Kovac, A., 2007. Bivariate density estimation using BV regularisation. Comput. Statist. Data Anal. 51 (12), 5622–5634.
doi:10.1016/j.csda.2007.04.019.
O’Leary, D.P., 2001. Near-optimal parameters for Tikhonov and other regularization methods. SIAM J. on Sci. Comput. 23, 1161–1171.
O’Leary, D.P., Simmons, J.A., 1989. A bidiagonalization-regularization procedure for large scale discretizations of ill-posed problems. SIAM J. Sci. Stat.
Comput. 2 (4), 474–489.
Paige, C.C., Saunders, M.A., 1982a. LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software 8, 43–71.
Paige, C.C., Saunders, M.A., 1982b. ALGORITHM 583 LSQR: Sparse linear equations and least squares problems. ACM Trans. Math. Software 8, 195–209.
Qiu, P., 2008. A nonparametric procedure for blind image deblurring. Comput. Statist. Data Anal. 52 (10), 4828–4841. doi:10.1016/j.csda.2008.03.027.
Rao, C.R., 1973. Linear Statistical Inference and its Applications. Wiley, New York.
Rust, B.W., O’Leary, D.P., 2008. Residual periodograms for choosing regularization parameters for ill-posed problems. Inverse Problems 24, 034005.
Stefan, W., 2008. Total variation regularization for linear ill-posed inverse problems: Extensions and applications, Ph.D. Dissertation, Department of
Mathematics and Statistics, Arizona State University. http://mathpost.la.asu.edu/~stefan.
Thompson, A.M., Kay, J.W., Titterington, D.M., 1991. Noise estimation in signal restoration using regularization. Biometrika 78 (3), 475–488.
Vogel, C.R., 2002. Computational Methods for Inverse Problems. In: SIAM Frontiers in Applied Mathematics.

Param Estim Tikhonov

Uploaded by

Copyright:

Available Formats

Param Estim Tikhonov

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Param Estim Tikhonov

Uploaded by

Copyright:

Available Formats

ARTICLE IN PRESS

Computational Statistics and Data Analysis ( ) –

Contents lists available at ScienceDirect

Computational Statistics and Data Analysis

Regularization parameter estimation for large-scale Tikhonov

article info abstract

x̂ = argmin J (x) = argmin{kAx − bk2Wb + kD(x − x0 )k2Wx }, (1)

2.1. χ 2 distribution of the regularized functional

The solution of (1) with Wx = λ2 In , assuming invertibility

x̂(λ) = (AT Wb A + λ2 DT D)−1 AT Wb r + x0 = x(λ) + x0 , (3)

Renaut (2009), to reexpress this quadratic form.

and such that

Q = diag(µ1 , . . . , µp , In−p , Im−n ). (10)

Theorem 1 (Non-Central χ 2 Distribution of the Regularized Functional). Suppose

• that the invertibility condition (2) holds, and

Q̃ = diag(µ1 , . . . , µp , 0n−p , Im−n ). (11)

Equivalently, J ∼ χ (m − n + p, c ) has expected value m − n + p + c and variance 2(m − n + p) + 4c.

2.2. The truncated GSVD

where, generalizing (11),

Q̃ = diag(0p−r , µp−r +1 , . . . , µp , 0n−p , Im−n ). (20)

3.1. The Newton algorithm

The derivative is given by

3.2. The hybrid LSQR algorithm

αi gi = AT hi − βi gi−1 , kgi k = 1, (27)

AT Hj = Gj LTj , A Gj = [ Hj , hj+1 ] Lj+ . (29)

Then, when D = I, the least squares solution yj (σ ) of the augmented system

3.3. The algorithm with centrality parameter

3.3.1. Newton with bisection to minimize FC

4.1. Comparing hybrid LSQR and GSVD

1e−03 10.5 31.1 31.2 0.8 1.0 0.067 0.067

4.2. Other regularization techniques

1e−03 0.098 0.098 0.076 0.099 1 1 0 33

4.3. Characteristics of the hybrid LSQR algorithm

4.4. Seismic signal restoration

Problem shaw n = 512 Problem phillips n = 512

Size of projected subproblem

Problem shaw n = 512 Problem phillips n = 512

4.5. Image deblurring

UPRE Algorithm; White Noise LSQR Algorithm: White Noise

Normalized Maximum Amplitude 1

Hybrid Central Algorithm: White Noise

Original Grain Blurred Grain 15% Noise

Grain 15%, j=15, Unregularized Grain 15% Noise, j(σ)=15,σ=1.98

5. Conclusions and future work

4.5 Regularization Increases with

You might also like