combined hsi n msi

Regularizing Hyperspectral and Multispectral Image Fusion by CNN Denoiser
Article in IEEE Transactions on Neural Networks and Learning Systems · April 2020
DOI: 10.1109/TNNLS.2020.2980398
All content following this page was uploaded by Renwei Dian on 14 September 2021.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 1
Regularizing Hyperspectral and Multispectral Image Fusion by CNN Denoiser
Renwei Dian, Student Member, IEEE, Shutao Li, Fellow, IEEE, and Xudong Kang, Senior Member, IEEE
Abstract—Hyperspectral and multispectral image fusion, which fuses a low-resolution hyperspectral image (LR-HSI) with a higher-resolution multispectral image (MSI), has become a common scheme to obtain a high-resolution HSI (HR-HSI). This paper presents a novel hyperspectral and multispectral image fusion method (called CNN-Fus), which is based on the subspace representation and a convolutional neural network (CNN) denoiser, i.e., a well-trained CNN for gray image denoising. Our method only needs to train the CNN on the more accessible gray images and can be directly used for any HSI and MSI datasets without re-training. Firstly, to exploit the high correlations among the spectral bands, we approximate the desired HR-HSI with the low-dimensional subspace multiplied by the coefficients, which can not only speed up the algorithm but also lead to more accurate recovery. Since the spectral information mainly exists in the LR-HSI, we learn the subspace from it via singular value decomposition. Due to the powerful learning performance and high speed of CNNs, we use the well-trained CNN for gray image denoising to regularize the estimation of coefficients. Specifically, we plug the CNN denoiser into the alternating direction method of multipliers algorithm to estimate the coefficients. Experiments demonstrate that our method has superior performance over the state-of-the-art fusion methods.

Index Terms—fusion, super-resolution, hyperspectral imaging, convolutional neural network.

This paper is supported by the Major Program of the National Natural Science Foundation of China (No. 61890962), the National Natural Science Foundation of China (No. 61601179 and No. 6187119), the National Natural Science Fund of China for International Cooperation and Exchanges (No. 61520106001), the Fund of Hunan Province for Science and Technology Plan Project under Grant (No. 2017RS3024), the Fund of Key Laboratory of Visual Perception and Artificial Intelligence of Hunan Province (No. 2018TP1013), the Natural Science Foundation of Hunan Province (No. 2019JJ50036), the Portuguese Science and Technology Foundation under Projects UID/EEA/50008/2019, the Hunan Provincial Innovation Foundation for Postgraduate, and the China Scholarship Council. (Corresponding author: Xudong Kang.)
R. Dian is with College of Electrical and Information Engineering, and with the Key Laboratory of Visual Perception and Artificial Intelligence of Hunan Province, Hunan University, Changsha, 410082, China, and also with Instituto de Telecomunicações, Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal (e-mail: [email protected]).
S. Li and X. Kang are with College of Electrical and Information Engineering, and with the Key Laboratory of Visual Perception and Artificial Intelligence of Hunan Province, Hunan University, Changsha, 410082, China (e-mail: shutao [email protected]; xudong [email protected]).

I. INTRODUCTION

Hyperspectral imaging can simultaneously acquire images of many different spectral bands. The high spectral resolution of hyperspectral images (HSIs) provides faithful knowledge of the scene and enhances the performance of many applications, such as object classification [1]–[6], anomaly detection [7], and disease diagnosis [8]. However, owing to the limited sun irradiance, there is an inevitable trade-off among the spatial resolution, the spectral resolution, and the signal-to-noise ratio (SNR). In other words, an HSI with high spectral resolution suffers from low spatial resolution in order to guarantee a high SNR [9]. Therefore, spatial resolution enhancement is a basic problem for HSI imaging. Compared with HSIs, multispectral images (MSIs) with lower spectral resolution can be obtained with much higher spatial resolution and SNR [9]. Hence, a possible way to solve the problem is image fusion, which combines a high-spatial-resolution MSI (HR-MSI) with a low-spatial-resolution HSI (LR-HSI) [10], [11]. The fusion technique has been widely used for HSI imaging due to its improved performance in many applications including object classification [12], anomaly detection [13], and change detection [14]. Experiments performed in reference [13] demonstrate that the fused HR-HSI can improve the detection accuracy.

Recently, convolutional neural networks (CNNs) have been successfully applied to the denoising problem owing to their high speed and good learning performance. In particular, Zhang et al. [15] propose a flexible and fast CNN for denoising, which can adaptively deal with noisy images of different noise levels. Inspired by this work, we propose to use a well-trained CNN for gray image denoising to regularize the HSI-MSI fusion problem. In general, the proposed method has two steps: subspace estimation and coefficients estimation. To exploit the high correlations and redundancies in the spectral mode, the HR-HSI is decomposed into a low-dimensional spectral subspace and the corresponding coefficients. Since most of the spectral information exists in the LR-HSI, we learn the spectral subspace from it via the SVD. With the spectral subspace known, we propose to plug the CNN denoiser into the alternating direction method of multipliers (ADMM) algorithm to estimate the coefficients. In the ADMM iteration, we mainly need to solve two subproblems. One subproblem is a quadratic problem, which can be solved analytically and efficiently. The other subproblem can be seen as a denoising problem, and we solve it efficiently by applying a CNN denoiser to each row of the coefficients. Finally, we acquire the desired HR-HSI from the estimated coefficients and the spectral subspace. The proposed fusion method mainly has the following advantages:

(i) The proposed CNN denoiser method has good generalizability, and can flexibly deal with different HSI datasets without re-training.
(ii) Even though our target is HSI-MSI fusion, we do not need to train the CNN on any HSI and MSI datasets. In
fact, the CNN denoiser is trained on the more readily available gray images, and using a CNN trained on gray images for HSI recovery is a key idea of our method.
(iii) In comparison to the state-of-the-art fusion methods, our method has superior performance.

We arrange the remainder of this paper as follows. We review recent works about HSI-MSI fusion in Section II. Our method for fusion of LR-HSI and HR-MSI is proposed in Section III. In Section IV, we present experiments and the corresponding discussions. The conclusion is drawn in Section V.

II. RELATED WORK

HSI and MSI fusion, which combines an LR-HSI with an HR-MSI, has become an effective way to obtain an HR-HSI. Reference [16] gives a review of recent state-of-the-art fusion methods. The fusion approaches can be classified into four types: sparse representation based approaches, low-rank representation based approaches, tensor factorization based approaches, and deep learning based approaches.

Sparse representation based HSI-MSI approaches assume that each spectral pixel can be sparsely represented by the learned spectral dictionary. Reference [17] proposes to estimate the over-complete spectral dictionary from the LR-HSI, and then obtains the coefficients by conducting sparse coding on the HR-MSI. Akhtar et al. [18] first learn the non-negative dictionary, and then estimate the coefficients for each patch simultaneously via the simultaneous greedy pursuit approach. Based on the learned non-local similarities in the HR-MSI, Dong et al. [19] propose a structured sparse coding approach to estimate the coefficients. Han et al. [20] propose a sparse representation method to learn the local self-similarities in each super-pixel and the non-local self-similarities among patches.

Low-rank representation based methods exploit the low-rank structure of the HR-HSI, and decompose it into the coefficients and a low-dimensional spectral basis, which exploits the high correlations among the spectral bands. In this way, the target of fusion is transformed into the estimation of the coefficients and the spectral basis. Based on the linear spectral unmixing model, the HSI can be decomposed into endmembers and abundances, which has been used for single HSI super-resolution [21] and fusion [22]–[24]. The unmixing based fusion methods [22]–[24] treat the spectral basis as endmembers, and alternately update the endmembers and coefficients from the HR-MSI and LR-HSI by using the priors of spectral unmixing. The subspace representation model is different from the spectral unmixing model: the subspace and coefficients are not necessarily non-negative, and the coefficients do not need to satisfy the sum-to-one constraint. Besides, the subspace is often semi-unitary. Works [25]–[27] obtain the spectral basis by conducting the singular value decomposition (SVD) or vertex component analysis [28] on the LR-HSI, and they use different regularizers to estimate the coefficients based on maximum a posteriori estimation [29]. For example, Simões et al. [25] make use of total variation regularization to improve spatial piecewise smoothness. Wei et al. [27] use a sparse representation based regularizer to promote the self-similarities of image patches. Zhou et al. [30] and Veganzones et al. [31] emphasize that the HR-HSI is locally low-rank, and estimate the subspace and coefficients for each local region separately.

Tensor factorization based fusion methods treat the three-dimensional HSI as a tensor. Work [32] first proposes a non-local sparse tensor factorization for HSI-MSI fusion, which approximates the HR-HSI by dictionaries of three modes and a sparse core tensor based on Tucker decomposition [33]. To exploit the non-local self-similarities, the authors assume that similar cubes can be sparsely represented by the same dictionaries. Furthermore, Li et al. [34] solve the fusion problem by simultaneously conducting sparse Tucker decomposition on the HR-MSI and LR-HSI, where the core tensor and three dictionaries are alternately updated until convergence. Kanatsoulis et al. [35] propose a canonical polyadic decomposition based framework for the fusion problem, where the identifiability of the model is guaranteed under mild and realistic conditions. A weighted low-rank tensor recovery model is proposed by Chang et al. [36] to solve the fusion problem, which assigns weights to the elements of the core tensor.

Deep learning techniques, especially CNNs with deep structure, have been successfully applied to many image processing tasks. In the field of HR-MSI and LR-HSI fusion, Dian et al. [37] first use the imaging models to initialize the HR-HSI, and then map the initialized HR-HSI to the ground truth by a well-trained CNN. In addition, Yang et al. [38] propose a two-branch CNN for the fusion, which extracts the spectral features of each pixel from the LR-HSI and its spatial features from the HR-MSI. These methods mainly have two disadvantages. Firstly, they need additional HR-MSI, LR-HSI, and HR-HSI datasets for pre-training, and these data are often not available. What is more, they do not have good generalizability, that is, the CNN trained on one dataset cannot directly be applied to another dataset of different spectral size. Besides, Han et al. [39] propose multi-branch BP neural networks to fuse the LR-HSI and HR-MSI, which cluster the spectral bands into several groups for fusion.

III. PROPOSED CNN-FUS APPROACH

In this section, our CNN-Fus approach for the fusion of HR-MSI and LR-HSI is presented. The proposed CNN-Fus approach has two steps: estimation of the subspace and estimation of the coefficients.

A. Observation model

The HR-HSI, HR-MSI, and LR-HSI can all be naturally represented as three-dimensional data. Here, we unfold the three-dimensional data along the spectral mode and denote them as matrices. The HR-HSI is denoted as $\mathbf{X} \in \mathbb{R}^{S \times N}$, where $S$ and $N = W \times H$ are the number of bands and pixels, respectively. $W$ and $H$ are the lengths of the two spatial modes.

The acquired LR-HSI is denoted by $\mathbf{Y} \in \mathbb{R}^{S \times n}$, where $n$ is the number of spectral pixels. Compared with $\mathbf{X}$, $\mathbf{Y}$ is
spatially downsampled, and the relationship between them can be modelled as

$$\mathbf{Y} = \mathbf{X}\mathbf{B}\mathbf{D} + \mathbf{N}_h, \tag{1}$$

where $\mathbf{N}_h$ represents the additive Gaussian noise. $\mathbf{B} \in \mathbb{R}^{N \times N}$ models the blurring operation of the hyperspectral camera, and it is block-circulant. Therefore, we can decompose $\mathbf{B}$ as

$$\mathbf{B} = \mathbf{F}\mathbf{K}\mathbf{F}^H, \tag{2}$$

where $\mathbf{F}$ stands for the discrete Fourier transform (DFT) matrix. The diagonal matrix $\mathbf{K}$ holds the eigenvalues of $\mathbf{B}$ on its diagonal. $\mathbf{D} \in \mathbb{R}^{N \times n}$ is the spatial subsampling matrix.

$\mathbf{Z} \in \mathbb{R}^{s \times N}$ stands for the HR-MSI. Compared with $\mathbf{X}$, $\mathbf{Z}$ is spectrally downsampled, and the relationship between them is written as

$$\mathbf{Z} = \mathbf{R}\mathbf{X} + \mathbf{N}_m, \tag{3}$$

where $\mathbf{R} \in \mathbb{R}^{s \times S}$ denotes the spectral downsampling matrix of the MSI sensor, and $\mathbf{N}_m$ represents the additive Gaussian noise.

B. Subspace Estimation

Hyperspectral data normally have a low-rank structure and therefore live in a low-dimensional subspace [25], [40]. As shown in Fig. 1, the HR-HSI can be written as

$$\mathbf{X} = \mathbf{S}\mathbf{A} + \mathbf{N}, \tag{4}$$

where $\mathbf{S} \in \mathbb{R}^{S \times L}$ and $\mathbf{A} \in \mathbb{R}^{L \times N}$ are the subspace and coefficients, respectively, and $\mathbf{N}$ denotes additive noise. The subspace representation model mainly has three advantages:
a) it fully exploits the high correlations among the spectral bands;
b) a small value of $L$ ($L < S$) reduces the size of the spectral mode and therefore makes the fusion computationally efficient;
c) the subspace is semi-unitary ($\mathbf{S}^T\mathbf{S} = \mathbf{I}$), and therefore we have $\mathbf{A} = \mathbf{S}^T\mathbf{X}$. In this way, each row of $\mathbf{A}$ can be linearly expressed by the rows (bands) of $\mathbf{X}$, and the rows of $\mathbf{A}$ preserve the spatial structures of $\mathbf{X}$.

Fig. 1. Subspace representation of the HSI (X, subspace S, coefficients A).

Based on the subspace representation, the target of the fusion is transformed into estimating the subspace $\mathbf{S}$ and the coefficients $\mathbf{A}$. Since the spectral information mainly exists in the LR-HSI, we assume that the HR-HSI and LR-HSI share the same spectral subspace, and therefore the spectral subspace can be estimated from the LR-HSI. We first conduct singular value decomposition (SVD) on the LR-HSI, that is,

$$\mathbf{Y} = \mathbf{U}_1 \boldsymbol{\Sigma}_1 \mathbf{V}_1^T. \tag{5}$$

Here, $\mathbf{U}_1$ and $\mathbf{V}_1$ are semi-unitary, and the diagonal matrix $\boldsymbol{\Sigma}_1$ contains the singular values, which are arranged in non-increasing order. By retaining only the $L$ largest singular values, we give a low-dimensional approximation of $\mathbf{Y}$:

$$\hat{\mathbf{Y}} = \hat{\mathbf{U}}_1 \hat{\boldsymbol{\Sigma}}_1 \hat{\mathbf{V}}_1^T, \tag{6}$$

where $\hat{\mathbf{U}}_1 = \mathbf{U}_1(:, 1:L)$ and $\hat{\mathbf{V}}_1 = \mathbf{V}_1(:, 1:L)$. The subspace $\mathbf{S}$ is equal to

$$\mathbf{S} = \hat{\mathbf{U}}_1 = \mathbf{U}_1(:, 1:L). \tag{7}$$

C. Estimation of Coefficients

With the subspace $\mathbf{S}$ known, we calculate the coefficients via maximum a posteriori (MAP) estimation. By combining equations (1), (3), and (4), we obtain the following problem:

$$\underset{\mathbf{A}}{\operatorname{argmin}}\ \|\mathbf{Y} - \mathbf{S}\mathbf{A}\mathbf{B}\mathbf{D}\|_F^2 + \alpha\|\mathbf{Z} - \mathbf{R}\mathbf{S}\mathbf{A}\|_F^2 + \lambda\phi(\mathbf{A}), \tag{8}$$

where $\|\cdot\|_F$ denotes the Frobenius norm. In equation (8), $\|\mathbf{Y} - \mathbf{S}\mathbf{A}\mathbf{B}\mathbf{D}\|_F^2 + \alpha\|\mathbf{Z} - \mathbf{R}\mathbf{S}\mathbf{A}\|_F^2$ is the log-likelihood term, and $\lambda\phi(\mathbf{A})$ represents the prior on the coefficients $\mathbf{A}$, where $\lambda$ is the regularization parameter. Formulation (8) is based on the assumption that the noise in the HSI and in the MSI is i.i.d.; the parameter $\alpha$ models the different noise variances in the two images. Since the fusion problem is ill-posed, some prior information is needed to regularize the estimation of $\mathbf{A}$. The coefficients mainly preserve the spatial structures of the HR-HSI. Many handcrafted priors have been used for the estimation of the coefficients, including priors of spectral unmixing [22], [23], [41], sparse priors [17], [27], [32], low-rank priors [42], non-local spatial similarities [19], and spatial smoothness [25].

Image denoising is an active research topic in image processing. Many state-of-the-art denoising algorithms have been proposed, such as BM3D [43], weighted nuclear norm [44], and CNN denoisers [15], [45]. The recently proposed plug-and-play framework [46] makes it possible to exploit state-of-the-art denoising algorithms to solve other image restoration problems. The plug-and-play framework plugs a denoiser into an iterative algorithm, where the denoiser is treated as the proximity operator. Instead of using handcrafted priors, we use a prior learned from images for the estimation of the coefficients. Specifically, inspired by the spirit of plug-and-play, we use a well-trained CNN denoiser to regularize the estimation of the coefficients. The CNN denoiser is trained on the more readily available gray images, and it can effectively capture the intrinsic spatial structures of images and remove the noise.

The problem (8) is hard to solve directly. The ADMM algorithm can solve (8) by decomposing it into several tractable subproblems. By introducing the variable $\mathbf{V}$, we acquire the augmented Lagrangian function:

$$L(\mathbf{A}, \mathbf{V}, \mathbf{G}) = \|\mathbf{Y} - \mathbf{S}\mathbf{A}\mathbf{B}\mathbf{D}\|_F^2 + \alpha\|\mathbf{Z} - \mathbf{R}\mathbf{S}\mathbf{A}\|_F^2 + \mu\left\|\mathbf{V} - \mathbf{A} + \frac{\mathbf{G}}{2\mu}\right\|_F^2 + \lambda\phi(\mathbf{V}), \tag{9}$$

where $\mu$ is the penalty parameter, and $\mathbf{G}$ is the Lagrangian multiplier. Based on the ADMM algorithm, the problem (8) can be transformed as minimizing the augmented Lagrangian
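function (9). As shown in Fig. 2, the coefficients are estimated by iteratively updating $\mathbf{A}$, $\mathbf{V}$, $\mathbf{G}$, and $\mu$.

Fig. 2. The flowchart of the coefficients estimation (blocks: solve Sylvester equation, CNN denoiser, update parameters; repeat until t == T).

1) Updating of $\mathbf{A}$: In each iteration, $\mathbf{A}$ is updated by minimizing $L(\mathbf{A}, \mathbf{V}, \mathbf{G})$ with regard to it, i.e.,

$$\mathbf{A} \in \underset{\mathbf{A}}{\operatorname{argmin}}\ L(\mathbf{A}, \mathbf{V}, \mathbf{G}) = \underset{\mathbf{A}}{\operatorname{argmin}}\ \|\mathbf{Y} - \mathbf{S}\mathbf{A}\mathbf{B}\mathbf{D}\|_F^2 + \alpha\|\mathbf{Z} - \mathbf{R}\mathbf{S}\mathbf{A}\|_F^2 + \mu\left\|\mathbf{V} - \mathbf{A} + \frac{\mathbf{G}}{2\mu}\right\|_F^2. \tag{10}$$

Problem (10) is strongly convex. Therefore, we set the derivative of (10) with respect to $\mathbf{A}$ to zero, and acquire the Sylvester equation

$$\mathbf{H}_1\mathbf{A} + \mathbf{A}\mathbf{H}_2 = \mathbf{H}_3. \tag{11}$$

Since the spectral basis $\mathbf{S}$ is acquired by the SVD, it satisfies $\mathbf{S}^T\mathbf{S} = \mathbf{I}_L$, where $\mathbf{I}_L \in \mathbb{R}^{L \times L}$ is the identity matrix. Therefore, we can obtain

$$\begin{aligned}
\mathbf{H}_1 &= \alpha(\mathbf{R}\mathbf{S})^T\mathbf{R}\mathbf{S} + \mu\mathbf{I}_L, \\
\mathbf{H}_2 &= (\mathbf{B}\mathbf{D})(\mathbf{B}\mathbf{D})^T, \\
\mathbf{H}_3 &= \alpha(\mathbf{R}\mathbf{S})^T\mathbf{Z} + \mathbf{S}^T\mathbf{Y}(\mathbf{B}\mathbf{D})^T + \mu\left(\mathbf{V} + \frac{\mathbf{G}}{2\mu}\right).
\end{aligned} \tag{12}$$

Since $\mathbf{H}_1$ is positive definite and $\mathbf{H}_2$ is positive semidefinite, the system matrix of (11) is positive definite, and therefore the conjugate gradient method could be applied to solve it. Here, we instead use a fast method that solves equation (11) analytically and efficiently [47], [48]. The matrix $\mathbf{H}_1$ is symmetric and positive definite, and therefore it can be diagonalized by eigendecomposition, i.e.,

$$\mathbf{H}_1 = \mathbf{Q}_1\boldsymbol{\Lambda}\mathbf{Q}_1^{-1}, \tag{13}$$

where the diagonal matrix $\boldsymbol{\Lambda}$ is written as

$$\boldsymbol{\Lambda} = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_L \end{bmatrix}, \tag{14}$$

and the matrix $\mathbf{Q}_1$ is invertible. Left multiplying (11) by $\mathbf{Q}_1^{-1}$, we obtain

$$\boldsymbol{\Lambda}\mathbf{Q}_1^{-1}\mathbf{A} + \mathbf{Q}_1^{-1}\mathbf{A}\mathbf{H}_2 = \mathbf{Q}_1^{-1}\mathbf{H}_3. \tag{15}$$

Right multiplying (15) by the DFT matrix $\mathbf{F}$ and combining with equation (2), the following equation is acquired:

$$\boldsymbol{\Lambda}\mathbf{Q}_1^{-1}\mathbf{A}\mathbf{F} + \mathbf{Q}_1^{-1}\mathbf{A}\mathbf{F}\mathbf{K}\mathbf{F}^H\mathbf{D}\mathbf{D}^H\mathbf{F}\mathbf{K}^H = \mathbf{Q}_1^{-1}\mathbf{H}_3\mathbf{F}. \tag{16}$$

Lemma 1 (Wei et al. [47]): The following equation is satisfied:

$$\mathbf{F}^H\mathbf{D}\mathbf{D}^H\mathbf{F} = \frac{1}{d}(\mathbf{1}_d \otimes \mathbf{I}_n)(\mathbf{1}_d^T \otimes \mathbf{I}_n), \tag{17}$$

where $\mathbf{1}_d \in \mathbb{R}^d$ is a vector of ones and $\mathbf{I}_n \in \mathbb{R}^{n \times n}$ is the identity matrix. Here, $n$ is the number of pixels of the LR-HSI, and $d = N/n$.

By combining (16) and (17), we acquire the following equation:

$$\boldsymbol{\Lambda}\bar{\mathbf{A}} + \bar{\mathbf{A}}\mathbf{M} = \mathbf{C}, \tag{18}$$

where $\bar{\mathbf{A}} = \mathbf{Q}_1^{-1}\mathbf{A}\mathbf{F}$, $\mathbf{M} = \frac{1}{d}\bar{\mathbf{K}}\bar{\mathbf{K}}^H$, $\bar{\mathbf{K}} = \mathbf{K}(\mathbf{1}_d \otimes \mathbf{I}_n)$, and $\mathbf{C} = \mathbf{Q}_1^{-1}\mathbf{H}_3\mathbf{F}$. Equation (18) is a Sylvester equation with regard to $\bar{\mathbf{A}}$. We rewrite the diagonal matrix $\mathbf{K}$ as

$$\mathbf{K} = \begin{bmatrix} \mathbf{K}_1 & 0 & \cdots & 0 \\ 0 & \mathbf{K}_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mathbf{K}_d \end{bmatrix}, \tag{19}$$

where $\mathbf{K}_i \in \mathbb{C}^{n \times n}$. Because $\bar{\mathbf{K}}$ stacks the blocks $\mathbf{K}_1, \ldots, \mathbf{K}_d$ vertically, we have $\bar{\mathbf{K}}^H\bar{\mathbf{K}} = \sum_{t=1}^{d}\mathbf{K}_t^2$. Equation (18) can be solved in a row-by-row manner. We first rewrite $\bar{\mathbf{A}}$ and $\mathbf{C}$ as $\bar{\mathbf{A}} = [\bar{\mathbf{a}}_1, \ldots, \bar{\mathbf{a}}_L]^T$ and $\mathbf{C} = [\mathbf{c}_1, \ldots, \mathbf{c}_L]^T$, respectively, where $\bar{\mathbf{a}}_i$ and $\mathbf{c}_i$ represent the $i$-th rows of $\bar{\mathbf{A}}$ and $\mathbf{C}$. In this way, $\bar{\mathbf{A}}$ can be estimated row by row, i.e.,

$$\lambda_i\bar{\mathbf{a}}_i + \bar{\mathbf{a}}_i\mathbf{M} = \mathbf{c}_i, \quad \text{for } i = 1, \ldots, L. \tag{20}$$

We can obtain

$$\bar{\mathbf{a}}_i = \mathbf{c}_i(\lambda_i\mathbf{I}_N + \mathbf{M})^{-1}, \quad \text{for } i = 1, \ldots, L. \tag{21}$$

By using $\bar{\mathbf{K}}^H\bar{\mathbf{K}} = \sum_{t=1}^{d}\mathbf{K}_t^2$ and the matrix inversion lemma, $(\lambda_i\mathbf{I}_N + \mathbf{M})^{-1}$ can be computed as

$$(\lambda_i\mathbf{I}_N + \mathbf{M})^{-1} = \lambda_i^{-1}\mathbf{I}_N - \lambda_i^{-1}\bar{\mathbf{K}}\left(\lambda_i d\,\mathbf{I}_n + \sum_{t=1}^{d}\mathbf{K}_t^2\right)^{-1}\bar{\mathbf{K}}^H. \tag{22}$$

Hence, $\bar{\mathbf{a}}_i$ is equal to

$$\bar{\mathbf{a}}_i = \lambda_i^{-1}\mathbf{c}_i - \lambda_i^{-1}\mathbf{c}_i\bar{\mathbf{K}}\left(\lambda_i d\,\mathbf{I}_n + \sum_{t=1}^{d}\mathbf{K}_t^2\right)^{-1}\bar{\mathbf{K}}^H. \tag{23}$$

After obtaining $\bar{\mathbf{A}}$, the coefficients $\mathbf{A}$ are estimated as

$$\mathbf{A} = \mathbf{Q}_1\bar{\mathbf{A}}\mathbf{F}^H. \tag{24}$$

Algorithm 1 outlines the method for solving equation (11).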
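The row-by-row strategy behind (13), (15), (20), and (21) can be checked numerically on a small dense problem. The sketch below is an illustration, not the authors' implementation: it diagonalizes H1 and solves one linear system per row, but it assumes a generic dense, symmetric positive semidefinite H2 and therefore skips the DFT and Lemma 1 speed-ups. The function name `solve_sylvester_rowwise` and the random test matrices are hypothetical.

```python
import numpy as np


def solve_sylvester_rowwise(H1, H2, H3):
    """Solve H1 @ A + A @ H2 = H3 for A (L x N), one row at a time.

    Assumes H1 (L x L) is symmetric positive definite and H2 (N x N) is
    symmetric positive semidefinite, so every row system is invertible.
    """
    # Eq. (13): H1 = Q diag(lam) Q^{-1}; Q is orthogonal because H1 is symmetric.
    lam, Q = np.linalg.eigh(H1)
    # Left-multiply by Q^{-1} = Q^T, as in Eq. (15).
    C = Q.T @ H3
    N = H2.shape[0]
    A_bar = np.empty_like(C)
    for i in range(lam.size):
        # Row i obeys a_i (lam_i I + H2) = c_i, the dense analogue of Eq. (20).
        # Transposing gives a standard linear system for the column a_i^T.
        A_bar[i] = np.linalg.solve((lam[i] * np.eye(N) + H2).T, C[i])
    # Undo the change of basis.
    return Q @ A_bar


# Hypothetical toy problem: L = 3 subspace dimensions, N = 6 pixels.
rng = np.random.default_rng(0)
M1 = rng.standard_normal((3, 3))
M2 = rng.standard_normal((6, 4))
H1 = M1 @ M1.T + np.eye(3)   # symmetric positive definite
H2 = M2 @ M2.T               # symmetric positive semidefinite
H3 = rng.standard_normal((3, 6))
A = solve_sylvester_rowwise(H1, H2, H3)
residual = np.linalg.norm(H1 @ A + A @ H2 - H3)
```

Each row solve here costs a dense N x N factorization, which is only practical for toy sizes; the closed form in (23) avoids this by exploiting the block structure of K.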
Algorithm 1 A solution for (11) w.r.t. A
Input: $\mathbf{H}_1$, $\mathbf{H}_2$, $\mathbf{H}_3$, $\mathbf{S}$, $\mathbf{B}$
1: $\mathbf{B} = \mathbf{F}\mathbf{K}\mathbf{F}^H$;
2: $\bar{\mathbf{K}} = \mathbf{K}(\mathbf{1}_d \otimes \mathbf{I}_n)$;
3: $\mathbf{H}_1 = \mathbf{Q}_1\boldsymbol{\Lambda}\mathbf{Q}_1^{-1}$;
4: $\mathbf{C} = \mathbf{Q}_1^{-1}\mathbf{H}_3\mathbf{F}$;
5: for $i = 1$ to $L$ do
6:   $\bar{\mathbf{a}}_i = \lambda_i^{-1}\mathbf{c}_i - \lambda_i^{-1}\mathbf{c}_i\bar{\mathbf{K}}\left(\lambda_i d\,\mathbf{I}_n + \sum_{t=1}^{d}\mathbf{K}_t^2\right)^{-1}\bar{\mathbf{K}}^H$;
7: end for
8: Set $\mathbf{A} = \mathbf{Q}_1\bar{\mathbf{A}}\mathbf{F}^H$;
Output: $\mathbf{A}$.

2) Updating of $\mathbf{V}$: In each iteration, $\mathbf{V}$ is updated by minimizing the Lagrangian function with regard to it, leading to

$$\mathbf{V} \in \underset{\mathbf{V}}{\operatorname{argmin}}\ L(\mathbf{A}, \mathbf{V}, \mathbf{G}) = \underset{\mathbf{V}}{\operatorname{argmin}}\ \mu\left\|\mathbf{V} - \mathbf{A} + \frac{\mathbf{G}}{2\mu}\right\|_F^2 + \lambda\phi(\mathbf{V}). \tag{25}$$

Inspired by the spirit of plug-and-play, the optimization problem (25) can be seen as the denoising of $\mathbf{A} - \frac{\mathbf{G}}{2\mu}$ corrupted by white additive Gaussian noise of variance $\sigma^2 = \frac{\lambda}{2\mu}$. Many state-of-the-art algorithms have been proposed to solve the denoising problem, such as Block-Matching and 3D filtering [43], the K-SVD denoising method [49], the weighted nuclear norm method [50], and deep CNN based denoising methods [15], [51]. Due to the high speed and powerful learning performance of CNNs, we take the recently proposed FFDNet [15], a flexible and fast CNN based method, as the denoising engine. The FFDNet consists of three kinds of operations: 3×3 convolution (Conv), rectified linear units (ReLU) [52], and batch normalization (BN) [53]. Specifically, the first layer is "Conv+ReLU", the middle layers are "Conv+BN+ReLU", and the last layer is "Conv", where BN is exploited to speed up the training procedure, and ReLU, $\max(0, x)$, is the activation function. Using FFDNet as the denoising engine mainly has three advantages. Firstly, FFDNet uses a tunable noise level map as input, and therefore it can flexibly deal with images of different noise levels without re-training. Besides, the FFDNet decomposes the noisy image into four sub-images and uses these sub-images as the input, which can reduce the number of layers and make the algorithm much faster. The denoised sub-images are then aggregated to acquire the final denoised image. What is more, we do not need to train the FFDNet on any HSI or MSI datasets; it is trained on the more readily available gray images, and the trained network is then reused for HSI recovery.

Since the subspace $\mathbf{S}$ is semi-unitary, we have $\mathbf{A} = \mathbf{S}^T\mathbf{X}$, and each row of $\mathbf{A}$ can be linearly expressed by rows of $\mathbf{X}$. Therefore, each row of $\mathbf{A}$ preserves the spatial structures of the HR-HSI. Besides, even though the bands of the HR-HSI are highly correlated, the rows of the coefficients are much less correlated due to the subspace representation. Based on this conclusion, we apply the well-trained FFDNet to each row of $\mathbf{A} - \frac{\mathbf{G}}{2\mu}$ to solve the problem (25). In other words, we estimate $\mathbf{V}$ in (25) in a row-by-row manner via the learned map $\mathcal{F}$, i.e.,

$$\mathbf{V}(i, :) = \mathcal{F}\left(\mathbf{H}(i, :), \frac{\lambda}{2\mu}; \Theta\right), \quad \text{for } i = 1, 2, \ldots, L, \tag{26}$$

where $\Theta$ denotes the parameters of the FFDNet, and the variable $\mathbf{H}$ satisfies $\mathbf{H} = \mathbf{A} - \frac{\mathbf{G}}{2\mu}$. Since each row of $\mathbf{H}$ is not an image, we first need to scale each row to $[0, 1]$ and then reshape it as the matrix $\hat{\mathbf{H}}_i$ of size $W \times H$, where $W$ and $H$ are the dimensions of the two spatial modes. The above operation is represented by $\hat{\mathbf{H}}_i = \operatorname{vec}^{-1}(c_i\mathbf{H}(i, :) + b_i)$. Here, $\operatorname{vec}^{-1}(\cdot)$ denotes the inverse operation of vectorization, which transforms a vector $\mathbf{a} \in \mathbb{R}^{WH}$ into a matrix $\mathbf{B} \in \mathbb{R}^{W \times H}$, where $\mathbf{B} = \operatorname{vec}^{-1}(\mathbf{a})$ is equivalent to $\mathbf{B}(i, j) = \mathbf{a}((j - 1)W + i)$. In this way, the noise level is changed to $c_i\sigma^2$ correspondingly. Finally, we need to scale the denoising results back. Therefore, the estimation of $\mathbf{V}$ in equation (26) is transformed as

$$\begin{aligned}
\hat{\mathbf{H}}_i &= \operatorname{reshape}(c_i\mathbf{H}(i, :) + b_i, W, H), \\
\hat{\mathbf{V}}(i, :) &= \operatorname{reshape}\left(\mathcal{F}\left(\hat{\mathbf{H}}_i, \frac{c_i\lambda}{2\mu}; \Theta\right), 1, N\right), \\
\mathbf{V}(i, :) &= \frac{\hat{\mathbf{V}}(i, :) - b_i}{c_i},
\end{aligned} \quad \text{for } i = 1, 2, \ldots, L, \tag{27}$$

where $c_i = \frac{1}{\max(\mathbf{H}(i, :)) - \min(\mathbf{H}(i, :))}$ and $b_i = -c_i \min(\mathbf{H}(i, :))$.

3) Updating of Lagrangian multiplier $\mathbf{G}$: The Lagrangian multiplier $\mathbf{G}$ is updated via

$$\mathbf{G} = \mathbf{G} + 2\mu(\mathbf{V} - \mathbf{A}). \tag{28}$$

4) Updating of penalty parameter $\mu$: The penalty parameter $\mu$ has an important effect on the FFDNet denoising process, since the input variance is $\sigma^2 = \frac{\lambda}{2\mu}$. As the algorithm iterates, the estimate gets closer to the clean image, and therefore the input noise level $\sigma^2$ should also be turned down. In this way, we need to increase the value of $\mu$ during the iterations. Another advantage of increasing $\mu$ is that it helps the algorithm converge. The penalty is updated by

$$\mu = \gamma\mu, \tag{29}$$

where $\gamma$ is a constant satisfying $\gamma > 1$.

We summarize our method for HSI-MSI fusion in Algorithm 2. The algorithm is stopped when the number of iterations reaches a preset value $T$, and we set $T = 12$ in the experiments.

Algorithm 2 CNN-Fus based HSI-MSI fusion
Input: $\mathbf{Y}$, $\mathbf{Z}$, $\lambda$
1: Estimate the subspace $\mathbf{S}$ via (7);
2: while not converged do
3:   Update $\mathbf{A}$ via Algorithm 1;
4:   Update $\mathbf{V}$ via equation (27);
5:   Update $\mathbf{G}$ via equation (28);
6:   Update $\mu$ via equation (29);
7: end while
8: $\mathbf{X} = \mathbf{S}\mathbf{A}$;
Output: $\mathbf{X}$.
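The scaling, reshaping, and rescaling in (27), together with the updates (28) and (29), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: a 3×3 mean filter (`mean_filter`) stands in for FFDNet, which is not reproduced here, and the function names, array sizes, and the value `gamma=1.05` are hypothetical. The noise level passed to the denoiser follows (27) as written.

```python
import numpy as np


def mean_filter(img, noise_level):
    """Hypothetical stand-in for FFDNet: a 3x3 mean filter (ignores noise_level)."""
    padded = np.pad(img, 1, mode="edge")
    rows, cols = img.shape
    out = np.zeros((rows, cols))
    for di in range(3):
        for dj in range(3):
            out += padded[di:di + rows, dj:dj + cols]
    return out / 9.0


def update_V(A, G, mu, lam, W, H, denoiser=mean_filter):
    """Eq. (27): scale each row to [0, 1], denoise it as a W x H image, scale back."""
    Hmat = A - G / (2.0 * mu)
    V = np.empty(Hmat.shape)
    for i, row in enumerate(Hmat):
        span = row.max() - row.min()
        if span == 0.0:
            V[i] = row  # constant row: nothing to denoise
            continue
        c = 1.0 / span
        b = -c * row.min()
        # vec^{-1} is column-major: B(i, j) = a((j - 1) W + i).
        img = (c * row + b).reshape((W, H), order="F")
        den = denoiser(img, c * lam / (2.0 * mu))
        V[i] = (den.reshape(-1, order="F") - b) / c
    return V


def update_G_mu(A, V, G, mu, gamma):
    """Eqs. (28)-(29): multiplier update followed by the penalty increase."""
    return G + 2.0 * mu * (V - A), gamma * mu


# Hypothetical sizes: 4 x 5 spatial grid, L = 3 coefficient rows.
rng = np.random.default_rng(0)
W, H, L = 4, 5, 3
A = rng.standard_normal((L, W * H))
G = rng.standard_normal((L, W * H))
V_identity = update_V(A, G, mu=1.0, lam=0.1, W=W, H=H, denoiser=lambda img, s: img)
V = update_V(A, G, mu=1.0, lam=0.1, W=W, H=H)
G_new, mu_new = update_G_mu(A, V, G, mu=1.0, gamma=1.05)
```

With an identity denoiser the update returns A − G/(2µ) unchanged, which is a convenient check that the scaling and the column-major reshape round-trip correctly before a real denoiser is plugged in.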
Fig. 3. The effect of parameter T on our method (PSNR versus T for Pavia University and Cuprite Mine).
Fig. 4. The effect of parameter L on our method (PSNR versus L for Pavia University and Cuprite Mine).

IV. EXPERIMENTS

In this section, experiments of HSI-MSI fusion are conducted on two simulated datasets and one real dataset to evaluate the effectiveness of our method. The source code will be available at https://sites.google.com/view/renweidian.

A. Experimental Datasets

1) Pavia University: This HSI is acquired over the urban area of Pavia University [54]. It has a size of 610×340×115, where the number of bands is 115. Since some bands have low SNR, these bands are removed and 93 bands are preserved. The HSI is used as the reference image. The LR-HSI is simulated by applying a 7 × 7 Gaussian filter with standard deviation 2 and then subsampling every 5 pixels in the two spatial modes. We simulate the HR-MSI $\mathbf{Z}$ by filtering $\mathbf{X}$ with the IKONOS-like reflectance spectral response. I.i.d. Gaussian noise is added to the HR-MSI (35 dB) and the LR-HSI (30 dB).

2) Cuprite Mine: The Cuprite Mine dataset is acquired by AVIRIS [55] in Nevada. The HSI has a size of 512×512×224 and covers the wavelength range 400 nm to 2500 nm at 10 nm intervals. Bands 1-2, 105-115, 150-170, and 223-224 are affected by water absorption and low SNR, and they are removed. The LR-HSI is simulated by applying a 7 × 7 Gaussian filter with standard deviation 2 and then subsampling every 4 pixels. Six bands at wavelengths 480, 560, 660, 830, 1650, and 2220 nm are directly selected as the HR-MSI, since they correspond to the visible and mid-infrared spectral bands of the USGS/NASA Landsat7 satellite. I.i.d. Gaussian noise is added to the HR-MSI (35 dB) and the LR-HSI (30 dB).

Fig. 5. The effect of parameter λ on our method (PSNR versus λ for Pavia University and Cuprite Mine).

B. Compared Methods

Five recent state-of-the-art fusion approaches are used for comparison: the non-negative structured sparse representation (NSSR) [19]1, coupled spectral unmixing (CSU) [23]2, coupled sparse tensor factorization (CSTF) [34]3, coupled non-negative matrix factorization (CNMF) [22], and the Bayesian sparse fusion method (Fuse-S) [27]. NSSR, CSU, and CSTF belong to the sparse representation based methods, low-rank representation based methods, and tensor factorization based methods, respectively. We tune the parameters of the compared methods for the best performance. For NSSR, the number of atoms in the spectral dictionary $K$, the non-local similarities regularization parameter $\eta_1$, and the sparsity regularization parameter $\eta_2$ are set as $K = 75$, $\eta_1 = 10^{-4}$, and $\eta_2 = 10^{-4}$, respectively. For CSU, the number of iterations is set as 1500. For CSTF, the parameters are set as $n_w = 500$, $n_h = 500$ and $n_w = 500$, $n_h = 300$ for Cuprite Mine and Pavia University, respectively. The other parameters of CSTF are set as $n_s = 15$, $\beta = 0.01$, and $\lambda = 10^{-5}$. All codes of the compared methods are publicly available.

1 http://see.xidian.edu.cn/faculty/wsdong
2 https://github.com/lanha/SupResPALM
3 https://sites.google.com/view/renweidian

C. Quantitative Metrics

We use four quantitative metrics to measure the quality of the recovered HR-HSIs.

1) PSNR: The first quantitative metric is the peak signal-to-noise ratio (PSNR). The PSNR is extended to HSIs by computing the average PSNR over all bands.

2) SAM: The spectral angle mapper (SAM) [56] is the average angle between the estimated and reference spectral pixels.
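The two metrics defined above can be sketched as follows. This is a minimal illustration under assumptions the text does not state: images are stored as (bands, pixels) matrices, the peak value is 1.0 (data scaled to [0, 1]), and SAM is reported in degrees. The function names `mean_psnr` and `sam_degrees` are hypothetical.

```python
import numpy as np


def mean_psnr(ref, est, peak=1.0):
    """Average per-band PSNR in dB; ref and est are (bands, pixels) arrays."""
    mse = np.mean((ref - est) ** 2, axis=1)
    return float(np.mean(10.0 * np.log10(peak ** 2 / mse)))


def sam_degrees(ref, est, eps=1e-12):
    """Mean spectral angle in degrees between corresponding pixel spectra (columns)."""
    dot = np.sum(ref * est, axis=0)
    norms = np.linalg.norm(ref, axis=0) * np.linalg.norm(est, axis=0)
    # eps guards against division by zero for all-zero spectra.
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)
    return float(np.mean(np.degrees(np.arccos(cos))))


# Toy data: 2 bands, 3 pixels.
ref = np.array([[0.2, 0.4, 0.6], [0.1, 0.3, 0.5]])
psnr = mean_psnr(ref, ref + 0.1)            # constant error of 0.1 in every band
angle_same = sam_degrees(ref, 2.0 * ref)    # scaling a spectrum leaves its angle unchanged
angle_orth = sam_degrees(np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]]))
```

SAM is insensitive to a per-pixel scaling of the spectrum while PSNR is not, which is why the two are reported together.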
3) UIQI: The Universal Image Quality Index (UIQI) [57] for an HSI is calculated on a sliding window, and then averaged over all bands and all windows. For two windows $a$ and $b$, the UIQI is defined as

$$\mathrm{UIQI}(a, b) = \frac{4\mu_a\mu_b\sigma_{a,b}^2}{(\mu_a^2 + \mu_b^2)(\sigma_a^2 + \sigma_b^2)}, \tag{30}$$

where $\mu_a$ and $\sigma_a$ denote the mean value and standard deviation of $a$, respectively, and $\sigma_{a,b}^2$ stands for the covariance between $a$ and $b$.

4) SSIM: The structural similarity index (SSIM) [58] is used to measure the structural similarity of gray images. The SSIM is extended to evaluate the quality of an HSI by averaging over all spectral bands.

D. Parameters Selection

The proposed method has three important parameters, i.e., the maximum number of iterations $T$, the dimension of the spectral subspace $L$, and the regularization parameter $\lambda$.

The estimation of the coefficients $\mathbf{A}$ is an iterative process. The maximum number of iterations $T$ is the stopping criterion of our method. In the $T$-th iteration, we obtain the coefficients $\mathbf{A}^{(T)}$ and the estimated HR-HSI $\mathbf{X}^{(T)} = \mathbf{S}\mathbf{A}^{(T)}$. To show the results of the intermediate steps, Fig. 3 shows how the PSNR of the estimated HR-HSI varies with $T$. We can see from Fig. 3 that the PSNR for Pavia University rises sharply when $T$ varies from 1 to 7, and then it stays relatively stable. The PSNR for Cuprite Mine rises as $T$ changes from 1 to 10, and then shows no obvious change. Therefore, we set the maximum number of iterations to 12 for our method.

The parameter $L$ controls the dimension of the spectral basis, which can highly influence the final result. To test the effect of $L$, we plot the PSNR as a function of $L$ in Fig. 4. It can be seen from Fig. 4 that the PSNR for Pavia University increases sharply as $L$ varies from 4 to 6, and then stays relatively stable. The PSNR for Cuprite Mine increases when $L$ ranges from 4 to 8, and then decreases when $L$ is bigger than 8. Therefore, we set the dimension of the spectral basis to $L = 8$ for the best performance, which indicates that only 8 atoms are enough to represent the spectral information, and that the spectral vectors really live in a low-dimensional subspace.

Since the noise level is $\sigma^2 = \frac{\lambda}{2\mu}$ in each iteration, the parameter $\lambda$ highly influences the performance of the CNN denoiser. To discuss the influence of $\lambda$, we plot the PSNR for Pavia University and Cuprite Mine as a function of $\lambda$ in Fig. 5. The parameter $\lambda = 0$ means that the CNN denoiser is not used for the estimation of the coefficients. We can see from Fig. 5 that the PSNR for both Pavia University and Cuprite Mine increases markedly as $\lambda$ grows from 0 to $8 \times 10^{-4}$, which indicates that our method highly relies on the estimation of the coefficients, and the CNN denoiser really helps the estimation

TABLE I
QUANTITATIVE METRICS OF THE COMPARED APPROACHES ON THE PAVIA UNIVERSITY [54].

Method         PSNR     SAM     UIQI    SSIM
Best Values    +∞       0       1       1
NSSR [19]      39.455   3.520   0.979   0.968
CSU [23]       40.607   2.671   0.986   0.980
CSTF [34]      41.468   2.554   0.988   0.979
CNMF [22]      42.554   2.379   0.990   0.984
Fuse-S [27]    42.994   2.284   0.991   0.986
CNN-Fus        43.017   2.235   0.992   0.987

TABLE II
QUANTITATIVE METRICS OF THE COMPARED APPROACHES ON THE CUPRITE MINE [55].

Method         PSNR     SAM     UIQI    SSIM
Best Values    +∞       0       1       1
NSSR [19]      38.912   1.730   0.866   0.929
CSU [23]       42.279   1.283   0.929   0.960
CSTF [34]      41.038   1.262   0.915   0.950
CNMF [22]      42.200   1.279   0.924   0.959
Fuse-S [27]    43.694   1.153   0.942   0.968
CNN-Fus        44.008   1.144   0.944   0.970

approaches on Pavia University are reported in Table I. We highlight the best results in bold for clarity. From Table I, we observe that the proposed method and Fuse-S consistently outperform the other testing approaches in terms of the quality metrics. The superiority of our method mainly comes from the low-dimensional subspace representation and the CNN denoiser: the low-dimensional subspace representation can effectively model the correlations among the spectral bands, and the CNN denoiser can well depict the spatial prior of the HSI. The 20th and 50th bands of Pavia University reconstructed by CSTF, CNMF, Fuse-S, and CNN-Fus, together with the corresponding error images, are shown in Fig. 6. The images reconstructed by NSSR and CSU are not shown, since they perform relatively worse on this dataset. From the recovered HR-HSIs, all testing methods perform well in recovering the details compared with the observed LR-HSI, which indicates the effectiveness of these methods. The error images reflect the differences between the estimated images and the ground truths. From the error images, we can see that the HR-HSIs recovered by our method have fewer errors and are closer to the ground truths.

The quality metrics of the HR-HSIs reconstructed by the compared approaches on Cuprite Mine are shown in Table II. Our method still delivers the best results among the testing approaches, and Fuse-S takes the second place. In Fig.
of the coefficients. The PSNR for Pavia University and Cuprite 7, the fusion results and corresponding error images of CSU,
decreases as λ is bigger than 1.2 × 10−3 . Therefore, we set CNMF, Fuse-S, and CNN-Fus for Cuprite Mine at 50-th and
λ = 1 × 10−3 for both Pavia University and Cuprite Mine. 90-th bands are shown. The images reconstructed by NSSR
and CSTF are not shown, since they perform relatively worse
E. Experimental Results in this dataset. As we can observe from Fig. 7, the HR-HSIs
1) Experimental Results on Simulated Data Fsuion: The reconstructed by our method and Fuse-S still has fewer errors
quality matrices of the HR-HSIs reconstructed by compared and higher PSNR.
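As a rough illustration of how band-wise quality metrics of the kind reported in the tables can be computed, the following Python sketch evaluates a mean PSNR and a mean SAM for a pair of image cubes. It is not the authors' evaluation code: the function names, the `(bands, height, width)` layout, the peak value of 255, and the choice to average the PSNR over bands (by analogy with the band-averaged SSIM) are all assumptions made for this example.

```python
import numpy as np

def mean_psnr(ref, est, peak=255.0):
    """Mean PSNR in dB over bands; cubes are (bands, height, width).

    `peak` is an assumed dynamic range, not a value taken from the paper.
    """
    mse = np.mean((ref - est) ** 2, axis=(1, 2))   # one MSE per band
    return float(np.mean(10.0 * np.log10(peak ** 2 / mse)))

def mean_sam_degrees(ref, est, eps=1e-12):
    """Mean spectral angle in degrees between per-pixel spectra."""
    r = ref.reshape(ref.shape[0], -1)              # (bands, pixels)
    e = est.reshape(est.shape[0], -1)
    norms = np.linalg.norm(r, axis=0) * np.linalg.norm(e, axis=0)
    cos = np.sum(r * e, axis=0) / np.maximum(norms, eps)
    angles = np.arccos(np.clip(cos, -1.0, 1.0))    # radians, per pixel
    return float(np.degrees(np.mean(angles)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = 50.0 + 100.0 * rng.random((8, 16, 16))
    estimate = truth + rng.normal(scale=1.0, size=truth.shape)
    print("PSNR (dB):", mean_psnr(truth, estimate))
    print("SAM (deg):", mean_sam_degrees(truth, estimate))
```

Higher PSNR and lower SAM indicate a better reconstruction, which matches the "Best Values" row of the tables.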
(a) LR-HSI (b) CSTF [34] (c) CNMF [22] (d) Fuse-S [27] (e) CNN-Fus (f) Ground truth. Color bar range: 0 to 10.
Fig. 6. The first and second rows show the fusion results and the corresponding error images of CSTF (42.525 dB), CNMF (43.308 dB), Fuse-S (44.694 dB), and CNN-Fus (44.806 dB) at the 20-th band of Pavia University, respectively. The third and fourth rows show the fusion results and the corresponding error images of CSTF (41.361 dB), CNMF (43.355 dB), Fuse-S (42.945 dB), and CNN-Fus (44.192 dB) at the 50-th band of Pavia University, respectively.
(a) LR-HSI (b) CSU [23] (c) CNMF [22] (d) Fuse-S [27] (e) CNN-Fus (f) Ground truth. Color bar range: 0 to 10.
Fig. 7. The first and second rows show the fusion results and the corresponding error images of CSU (44.933 dB), CNMF (43.887 dB), Fuse-S (46.395 dB), and CNN-Fus (46.942 dB) at the 50-th band of Cuprite Mine, respectively. The third and fourth rows show the fusion results and the corresponding error images of CSU (40.904 dB), CNMF (42.043 dB), Fuse-S (43.208 dB), and CNN-Fus (43.147 dB) at the 90-th band of Cuprite Mine, respectively.
2) Experimental Results on Real Data Fusion: To further assess the performance of our method, we test the compared methods on the fusion of a real LR-HSI and HR-MSI. The LR-HSI is captured by the Hyperion sensor on board the Earth Observing-1 satellite. It has a spatial resolution of 30 m and 220 spectral bands covering the spectral range of 400-2500 nm. After removing the bands with low SNR, 89 bands are retained. An area of spatial size 100 × 100 is used for the experiments. The HR-MSI is taken by the Sentinel-2A satellite. It has 13 spectral bands, and we use the four bands with 10 m spatial resolution for the fusion. The central wavelengths of the four bands are 490 nm, 560 nm, 665 nm, and 842 nm. The spatial size of the HR-MSI is 300 × 300. We use the method proposed by Simões et al. [25] to estimate B and R simultaneously. From the observation models $X_{(3)} = Z_{(3)}BS$ and $Y_{(3)} = RZ_{(3)}$, we get the equation $RX_{(3)} = Y_{(3)}BS$. Therefore, B and R are estimated by solving the following problem:

$$\min_{B,R} \|RX_{(3)} - Y_{(3)}BS\|_F^2 + \lambda_b \phi_b(B) + \lambda_r \phi_r(R),$$

where $\lambda_b \phi_b(B)$ and $\lambda_r \phi_r(R)$ are the regularization terms on B and R, respectively. To smooth the blur matrix B, $\phi_b(B)$ is set as $\phi_b(B) = \|D_h B\|_F^2 + \|D_v B\|_F^2$, where $D_h$ and $D_v$ calculate the horizontal and vertical differences of B. Since we only need to smooth the spectral response matrix along the vertical mode (row), $\phi_r(R)$ is set as $\phi_r(R) = \|D_v R\|_F^2$. We first estimate R with the strong spatial blur, and then estimate B with known R. Fig. 8 shows the false-color images consisting of the 16th, 5th, and 2nd bands of the recovered HR-HSIs. The recovered images by CSTF and CNMF are not shown, since they perform worse on this dataset. As shown in the figure, all testing approaches clearly improve the spatial resolution of the observed LR-HSI, and the fusion results of CNN-Fus have far fewer flaws. CSTF is applicable only when the blur is separable along the two spatial dimensions, and the estimated blur does not satisfy this condition. Hence, the fusion results of CSTF have obvious artifacts.
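The regularized least-squares objective used in this subsection to estimate B and R can be evaluated as in the following Python sketch. It is only an illustration under stated assumptions, not the procedure of [25]: the blur B is represented by a small 2-D kernel applied with circular boundary conditions, the downsampling S keeps every `ratio`-th pixel, the smoothness term on R takes differences along each row (our reading of "vertical mode (row)"), and all function and parameter names are hypothetical.

```python
import numpy as np

def blur_downsample(cube, kernel, ratio):
    """Apply B then S: circular blur of each band, then keep every ratio-th pixel.

    cube has shape (bands, H, W); kernel is a small 2-D array (assumed form of B).
    """
    _, height, width = cube.shape
    k_pad = np.zeros((height, width))
    k_pad[:kernel.shape[0], :kernel.shape[1]] = kernel
    spectrum = np.fft.fft2(cube, axes=(-2, -1)) * np.fft.fft2(k_pad)
    blurred = np.real(np.fft.ifft2(spectrum, axes=(-2, -1)))
    return blurred[:, ::ratio, ::ratio]

def objective(R, kernel, x_lr, y_hr, ratio, lam_b, lam_r):
    """||R X - Y B S||_F^2 + lam_b * phi_b(B) + lam_r * phi_r(R)."""
    bands_h, h, w = x_lr.shape
    rx = R @ x_lr.reshape(bands_h, h * w)                     # R X_(3)
    ybs = blur_downsample(y_hr, kernel, ratio).reshape(y_hr.shape[0], h * w)
    data_term = np.sum((rx - ybs) ** 2)
    # phi_b: squared horizontal and vertical first differences of the kernel
    phi_b = np.sum(np.diff(kernel, axis=1) ** 2) + np.sum(np.diff(kernel, axis=0) ** 2)
    # phi_r: squared first differences along each row of R (assumed direction)
    phi_r = np.sum(np.diff(R, axis=1) ** 2)
    return float(data_term + lam_b * phi_b + lam_r * phi_r)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.random((6, 12, 12))                 # synthetic HR-HSI
    R_true = rng.random((3, 6))                 # synthetic spectral response
    kernel = np.full((3, 3), 1.0 / 9.0)         # synthetic blur kernel
    x = blur_downsample(z, kernel, 3)           # LR-HSI: X = Z B S
    y = (R_true @ z.reshape(6, -1)).reshape(3, 12, 12)   # HR-MSI: Y = R Z
    print(objective(R_true, kernel, x, y, 3, 0.0, 0.0))  # data term is near zero
```

With noise-free synthetic data generated from the two observation models, the data term vanishes at the true B and R, which is the consistency that the equation $RX_{(3)} = Y_{(3)}BS$ expresses.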
(a) LR-HSI (b) HR-MSI (c) NSSR [19] (d) CSU [23] (e) Fuse-S [27] (f) CNN-Fus
Fig. 8. The figure shows the Hyperion LR-HSI, Sentinel-2A HR-MSI, and false-color images consisting of 16th, 5th, and 2nd bands of the recovered
HR-HSIs. (a) Hyperion LR-HSI. (b) Sentinel-2A HR-MSI. (c) NSSR [19]. (d) CSU [23]. (e) CSTF [34]. (f) CNN-Fus.
TABLE III
THE RUNNING TIME IN SECONDS OF THE TESTING METHODS.

Method         Pavia University   Cuprite Mine   Hyperion
NSSR [19]      217                481            101
CSU [23]       455                505            10
CSTF [34]      419                759            219
CNMF [22]      115                207            45
Fuse-S [27]    662                770            319
CNN-Fus        110                518            42

F. Computational Efficiency

To compare the computational efficiency of the testing methods, their running times on the Pavia University, Cuprite Mine, and Hyperion datasets are reported in Table III. All experiments are conducted in Matlab R2018b on a computer equipped with 8 GB of random access memory and a 2.4-GHz Intel Core i5-9300H CPU. As can be seen from the table, CNN-Fus is the fastest method on Pavia University (110 s), narrowly ahead of CNMF (115 s). The speed advantage of our method mainly comes from the subspace representation, which largely reduces the size of the HSI data. CNMF is the fastest method on Cuprite Mine.

V. CONCLUSIONS

We propose a new HSI-MSI fusion method based on the subspace representation and a CNN denoiser. Firstly, to exploit the high correlations among the spectral bands, we approximate the desired HR-HSI by a low-dimensional subspace multiplied by the coefficients, which not only speeds up the algorithm but also yields more accurate recovery. Since the LR-HSI preserves most of the spectral information, the subspace is learned from it via singular value decomposition. Owing to the powerful learning performance and high speed of CNNs, we use a well-trained CNN to regularize the estimation of the coefficients. Specifically, we plug the CNN denoiser into the ADMM iterations to estimate the coefficients. Experiments on both simulated and real data fusion demonstrate the superiority of the proposed approach over existing state-of-the-art HSI-MSI fusion approaches.

The proposed framework, based on the subspace representation and a CNN denoiser, can be easily applied to other HSI restoration tasks, such as HSI denoising and compressed sensing, and is expected to obtain good performance.

ACKNOWLEDGEMENT

We would like to thank Professor Jose M. Bioucas-Dias for providing constructive suggestions for this work. We would also like to thank the editors and reviewers for their insightful comments and suggestions.

REFERENCES

[1] N. Akhtar and A. Mian, “Nonparametric coupled bayesian dictionary and classifier learning for hyperspectral classification,” IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 9, pp. 4038–4050, Sep. 2018.
[2] Q. Wang, J. Lin, and Y. Yuan, “Salient band selection for hyperspectral image classification via manifold ranking,” IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 6, pp. 1279–1289, Jun. 2016.
[3] P. Zhong and R. Wang, “Jointly learning the hybrid CRF and MLR model for simultaneous denoising and classification of hyperspectral imagery,” IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 7, pp. 1319–1334, Jul. 2014.
[4] J. Peng, L. Li, and Y. Y. Tang, “Maximum likelihood estimation-based joint sparse representation for the classification of hyperspectral remote sensing images,” IEEE Trans. Neural Netw. Learn. Syst., 2018, doi:10.1109/TNNLS.2018.2874432.
[5] L. Zhang, L. Zhang, and B. Du, “Deep learning for remote sensing data: A technical tutorial on the state of the art,” IEEE Geosci. Remote Sens. Mag., vol. 4, no. 2, pp. 22–40, Jun. 2016.
[6] L. Zhang, L. Zhang, B. Du, J. You, and D. Tao, “Hyperspectral image unsupervised classification by robust manifold matrix factorization,” Inf. Sci., vol. 485, pp. 154–169, Jun. 2019.
[7] X. Kang, X. Zhang, S. Li, K. Li, J. Li, and J. A. Benediktsson, “Hyperspectral anomaly detection with attribute and edge-preserving filters,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 10, pp. 5600–5611, Oct. 2017.
[8] H. Akbari, Y. Kosugi, K. Kojima, and N. Tanaka, “Detection and analysis of the intestinal ischemia using visible and invisible hyperspectral imaging,” IEEE Trans. Biomed. Eng., vol. 57, no. 8, pp. 2011–2017, Aug. 2010.
[9] N. Yokoya, C. Grohnfeldt, and J. Chanussot, “Hyperspectral and multispectral data fusion: A comparative review of the recent literature,” IEEE Geosci. Remote Sens. Mag., vol. 5, no. 2, pp. 29–56, Jun. 2017.
[10] C. Kwan, J. H. Choi, S. H. Chan, J. Zhou, and B. Budavari, “A super-resolution and fusion approach to enhancing hyperspectral images,” Remote Sens., vol. 10, no. 9, p. 1416, 2018.
[11] J. Zhou, C. Kwan, and B. Budavari, “Hyperspectral image super-resolution: A hybrid color mapping approach,” J. Appl. Remote Sens., vol. 10, no. 3, p. 035024, 2016.
[12] L. Gómez-Chova, D. Tuia, G. Moser, and G. Camps-Valls, “Multimodal classification of remote sensing images: A review and future directions,” Proc. IEEE, vol. 103, no. 9, pp. 1560–1584, Sep. 2015.
[13] Y. Qu, H. Qi, B. Ayhan, C. Kwan, and R. Kidd, “Does multispectral/hyperspectral pansharpening improve the performance of anomaly detection?” in IEEE Int. Geosci. Remote Sens. Symp., Jul. 2017, pp. 6130–6133.
[14] V. Ferraris, N. Dobigeon, Q. Wei, and M. Chabert, “Robust fusion of multiband images with different spatial and spectral resolutions for change detection,” IEEE Trans. Comput. Imag., vol. 3, no. 2, pp. 175–186, Jun. 2017.
[15] K. Zhang, W. Zuo, and L. Zhang, “FFDNet: Toward a fast and flexible solution for CNN-based image denoising,” IEEE Trans. Image Process., vol. 27, no. 9, pp. 4608–4622, Sep. 2018.
[16] L. Loncan, L. B. De Almeida, J. M. Bioucas-Dias, X. Briottet, J. Chanussot, N. Dobigeon, S. Fabre, W. Liao, G. A. Licciardi, M. Simoes et al., “Hyperspectral pansharpening: A review,” IEEE Geosci. Remote Sens. Mag., vol. 3, no. 3, pp. 27–46, 2015.
[17] R. Kawakami, J. Wright, Y. W. Tai, Y. Matsushita, M. Ben-Ezra, and K. Ikeuchi, “High-resolution hyperspectral imaging via matrix factorization,” in IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2011, pp. 2329–2336.
[18] N. Akhtar, F. Shafait, and A. Mian, “Sparse spatio-spectral representation for hyperspectral image super-resolution,” in Euro. Conf. Comput. Vis., Sep. 2014, pp. 63–78.
[19] W. Dong, F. Fu, G. Shi, X. Cao, J. Wu, G. Li, and X. Li, “Hyperspectral image super-resolution via non-negative structured sparse representation,” IEEE Trans. Image Process., vol. 25, no. 5, pp. 2337–2352, May 2016.
[20] X. Han, B. Shi, and Y. Zheng, “Self-similarity constrained sparse representation for hyperspectral image super-resolution,” IEEE Trans. Image Process., vol. 27, no. 11, pp. 5625–5637, Nov. 2018.
[21] H. Irmak, G. B. Akar, and S. E. Yuksel, “A map-based approach for hyperspectral imagery super-resolution,” IEEE Trans. Image Process., vol. 27, no. 6, pp. 2942–2951, Jun. 2018.
[22] N. Yokoya, T. Yairi, and A. Iwasaki, “Coupled non-negative matrix factorization unmixing for hyperspectral and multispectral data fusion,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 2, pp. 528–537, Feb. 2012.
[23] C. Lanaras, E. Baltsavias, and K. Schindler, “Hyperspectral super-resolution by coupled spectral unmixing,” in IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 3586–3594.
[24] M. A. Bendoumi, M. He, and S. Mei, “Hyperspectral image resolution enhancement using high-resolution multispectral image based on spectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 10, pp. 6574–6583, Oct. 2014.
[25] M. Simoes, J. Bioucas-Dias, L. Almeida, and J. Chanussot, “A convex formulation for hyperspectral image superresolution via subspace-based regularization,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 6, pp. 3373–3388, Jun. 2015.
[26] Q. Wei, N. Dobigeon, and J. Tourneret, “Fast fusion of multi-band images based on solving a Sylvester equation,” IEEE Trans. Image Process., vol. 24, no. 11, pp. 4109–4121, Nov. 2015.
[27] Q. Wei, J. Bioucas-Dias, N. Dobigeon, and J. Y. Tourneret, “Hyperspectral and multispectral image fusion based on a sparse representation,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 7, pp. 3658–3668, Jul. 2015.
[28] J. M. Nascimento and J. M. Dias, “Vertex component analysis: A fast algorithm to unmix hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 4, pp. 898–910, 2005.
[29] R. C. Hardie, M. T. Eismann, and G. L. Wilson, “Map estimation for hyperspectral image resolution enhancement using an auxiliary sensor,” IEEE Trans. Image Process., vol. 13, no. 9, pp. 1174–1184, 2004.
[30] Y. Zhou, L. Feng, C. Hou, and S. Kung, “Hyperspectral and multispectral image fusion based on local low rank and coupled spectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 10, pp. 5997–6009, Oct. 2017.
[31] M. Veganzones, M. Simões, G. Licciardi, N. Yokoya, J. Bioucas-Dias, and J. Chanussot, “Hyperspectral super-resolution of locally low rank images from complementary multisource data,” IEEE Trans. Geosci. Remote Sens., vol. 25, no. 1, pp. 274–288, Jan. 2016.
[32] R. Dian, L. Fang, and S. Li, “Hyperspectral image super-resolution via non-local sparse tensor factorization,” in IEEE Conf. Comput. Vis. Pattern Recog., Jul. 2017, pp. 3862–3871.
[33] L. R. Tucker, “Some mathematical notes on three-mode factor analysis,” Psychometrika, vol. 31, no. 3, pp. 279–311, Sep. 1966.
[34] S. Li, R. Dian, L. Fang, and J. M. Bioucas-Dias, “Fusing hyperspectral and multispectral images via coupled sparse tensor factorization,” IEEE Trans. Image Process., vol. 27, no. 8, pp. 4118–4130, Aug. 2018.
[35] C. Kanatsoulis, X. Fu, N. Sidiropoulos, and W. Ma, “Hyperspectral super-resolution: A coupled tensor factorization approach,” IEEE Trans. Signal Process., vol. 66, no. 24, pp. 6503–6517, Dec. 2018.
[36] Y. Chang, L. Yan, H. Fang, S. Zhong, and Z. Zhang, “Weighted low-rank tensor recovery for hyperspectral image restoration,” CoRR, vol. abs/1709.00192, 2017. [Online]. Available: http://arxiv.org/abs/1709.00192
[37] R. Dian, S. Li, A. Guo, and L. Fang, “Deep hyperspectral image sharpening,” IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 11, pp. 5345–5355, 2018.
[38] J. Yang, Y. Zhao, and J. Chan, “Hyperspectral and multispectral image fusion via deep two-branches convolutional neural network,” Remote Sens., vol. 10, no. 5, p. 800, 2018.
[39] X. Han, J. Yu, J. Luo, and W. Sun, “Hyperspectral and multispectral image fusion using cluster-based multi-branch bp neural networks,” Remote Sens., vol. 11, no. 10, p. 1173, 2019.
[40] L. Zhuang and J. M. Bioucas-Dias, “Fast hyperspectral image denoising and inpainting based on low-rank and sparse representations,” IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens., vol. 11, no. 3, pp. 730–742, Mar. 2018.
[41] Q. Wei, J. Bioucas-Dias, N. Dobigeon, J.-Y. Tourneret, M. Chen, and S. Godsill, “Multi-band image fusion based on spectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 12, pp. 7236–7249, Dec. 2016.
[42] K. Zhang, M. Wang, and S. Yang, “Multispectral and hyperspectral image fusion based on group spectral embedding and low-rank factorization,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 3, pp. 1363–1371, Mar. 2017.
[43] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080–2095, Aug. 2007.
[44] S. Gu, Q. Xie, D. Meng, W. Zuo, X. Feng, and L. Zhang, “Weighted nuclear norm minimization and its applications to low level vision,” Int. J. Comput. Vis., vol. 121, no. 2, pp. 183–208, 2017.
[45] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, Jul. 2017.
[46] S. Venkatakrishnan, C. Bouman, and B. Wohlberg, “Plug-and-Play priors for model based reconstruction,” in IEEE Global Conf. Signal Inf. Process., Dec. 2013, pp. 945–948.
[47] Q. Wei, N. Dobigeon, J. Tourneret, J. Bioucas-Dias, and S. Godsill, “R-fuse: Robust fast fusion of multiband images based on solving a Sylvester equation,” IEEE Signal Process. Lett., vol. 23, no. 11, pp. 1632–1636, Nov. 2016.
[48] Q. Wei, N. Dobigeon, and J.-Y. Tourneret, “Fast fusion of multi-band images based on solving a Sylvester equation,” IEEE Trans. Image Process., vol. 24, no. 11, pp. 4109–4121, Nov. 2015.
[49] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans. Image Process., vol. 15, no. 12, pp. 3736–3745, Dec. 2006.
[50] S. Gu, L. Zhang, W. Zuo, and X. Feng, “Weighted nuclear norm minimization with application to image denoising,” in IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2014, pp. 2862–2869.
[51] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, Jul. 2017.
[52] V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” in Int. Conf. Mach. Learn., Jun. 2010, pp. 807–814.
[53] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Int. Conf. Mach. Learn., Jul. 2015, pp. 448–456.
[54] F. Dell’Acqua, P. Gamba, A. Ferrari, J. Palmason, J. Benediktsson, and K. Arnason, “Exploiting spectral and spatial information in hyperspectral urban data with high resolution,” IEEE Geosci. Remote Sens. Lett., vol. 1, no. 4, pp. 322–326, Oct. 2004.
[55] R. Green, M. Eastwood, C. Sarture, T. Chrien, M. Aronsson, B. Chippendale, J. Faust, B. Pavri, C. Chovit, and M. Solis, “Imaging spectroscopy and the airborne visible/infrared imaging spectrometer (AVIRIS),” Remote Sens. Environ., vol. 65, no. 3, pp. 227–248, Sep. 1998.
[56] R. H. Yuhas, A. F. Goetz, and J. W. Boardman, “Discrimination among semi-arid landscape endmembers using the spectral angle mapper (SAM) algorithm,” JPL Airborne Geosci. Workshop, vol. 1, pp. 147–149, 1992.
[57] Z. Wang and A. Bovik, “A universal image quality index,” IEEE Signal Process. Lett., vol. 9, no. 3, pp. 81–84, Mar. 2002.
[58] Z. Wang, A. Bovik, and H. Sheikh, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
Renwei Dian (S’16) received the B.S. degree from Wuhan University of Science and Technology, Wuhan, China, in 2015. He is currently working toward the Ph.D. degree in the Laboratory of Vision and Image Processing, Hunan University, Changsha, China.
From November 2017 to November 2018, he was a visiting Ph.D. student with the University of Lisbon, Lisbon, Portugal, supported by the China Scholarship Council. His research interests include hyperspectral image super-resolution, image fusion, tensor decomposition, and deep learning. More information can be found on his homepage https://sites.google.com/view/renweidian/.
Shutao Li (M’07-SM’15-F’19) received the B.S., M.S., and Ph.D. degrees from Hunan University, Changsha, China, in 1995, 1997, and 2001, respectively. In 2001, he joined the College of Electrical and Information Engineering, Hunan University. From May 2001 to October 2001, he was a Research Associate with the Department of Computer Science, Hong Kong University of Science and Technology. From November 2002 to November 2003, he was a Postdoctoral Fellow with the Royal Holloway College, University of London. From April 2005 to June 2005, he was a Visiting Professor with the Department of Computer Science, Hong Kong University of Science and Technology. He is currently a Full Professor with the College of Electrical and Information Engineering, Hunan University. He has authored or co-authored over 200 refereed papers. He received two 2nd-Grade State Scientific and Technological Progress Awards of China in 2004 and 2006. His current research interests include image processing, pattern recognition, and artificial intelligence.
He is an Associate Editor of the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING and the IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT. He is a member of the Editorial Boards of Information Fusion and Sensing and Imaging.
Xudong Kang (S’13-M’15-SM’17) received the B.Sc. degree from Northeast University, Shenyang, China, in 2007, and the Ph.D. degree from Hunan University, Changsha, China, in 2015. In 2015, he joined the College of Electrical Engineering, Hunan University. His research interests include hyperspectral feature extraction, image classification, image fusion, and anomaly detection.
Dr. Kang was an Associate Editor of IEEE Transactions on Geoscience and Remote Sensing (2018-2019) and of IEEE Geoscience and Remote Sensing Letters (2018-now). He received the National Natural Science Award of China (Second Class, ranked third) and the Second Prize in the Student Paper Competition at the International Geoscience and Remote Sensing Symposium (IGARSS) 2014. He was also selected as a Best Reviewer for IEEE Geoscience and Remote Sensing Letters and IEEE Transactions on Geoscience and Remote Sensing.