Hyperspectral remote sensing images have high spectral resolution and provide rich
information on the types of features, but their high data dimensions and large data volume pose
challenges in data processing. In addition, it is difficult to obtain ground truths of hyperspectral
images (HSIs). Owing to the small number of training samples, the supervised classification
of HSIs is particularly challenging and time-consuming. As deep learning techniques continue
to evolve, an increasing number of models have emerged for HSI classification. In this paper, we
propose a classification algorithm for HSIs called the residual generative adversarial network
(ResGAN), which automatically extracts spectral and spatial features for HSI classification.
When unlabeled HSI data are used to train ResGAN, the generator generates fake HSI samples
with a distribution similar to that of the real data, and the discriminator learns features that are
suitable for classification when only a small number of labeled real samples are available. The
main innovations of this method are
twofold. First, the generative adversarial network (GAN) is based on a dense residual network,
which fully learns the higher-level features of HSIs. Second, the loss function is modified using
the Wasserstein distance with a gradient penalty, and the discriminant model of the network is
changed to enhance the training stability. Using image data obtained from an airborne
visible/infrared imaging spectrometer sensor, the performance of ResGAN was compared with
that of two HSI classification methods. The proposed network obtains excellent classification
results after only marking a small number of samples. From both subjective and objective
viewpoints, ResGAN is an excellent alternative to the standard GAN for HSI classification.
1. Introduction
Hyperspectral remote sensing images are used in many types of analysis, including feature
extraction, multivariate data analysis, land cover classification, vegetation refinement
classification, and animal monitoring.
According to different designated reference standards, various HSI classification methods
can be divided into supervised, semi-supervised, and unsupervised methods.(1,2) Unsupervised
classification requires no pre-existing knowledge and relies only on the differences among the
target hyperspectral data. Hyperspectral data are higher-dimensional and larger in size than
ordinary optical image data, so they require special calculations.(3–5) The lack of a priori
information makes the results of unsupervised classification difficult to interpret, as shown in
Refs. 6–9. Supervised classification is based on prior knowledge
of the target and reference criteria that determine the categories of the non-sampled data.
Owing to its high precision, supervised classification is often preferred in HSI classification.
Traditional supervised classification methods, such as polynomial logistic regression (PLR) and
support vector machine (SVM), are widely used in HSI classification because they handle large
input spaces. However, the performance of these methods is usually degraded by a limited
number of training samples, and a large number of samples are required to achieve high
prediction accuracy. Therefore, classifying hyperspectral data by supervised classification is a
challenging task. Considering the limited availability of training samples for HSI classification,
researchers have proposed semi-supervised and active learning algorithms.
With the rapid development of pattern recognition, deep learning algorithms have been
widely used in HSI classification. Powerful deep learning models can effectively combine spatial
and spectral information, and avoid the complicated and manual feature engineering process by
automatically extracting an effective feature representation of the problem domain, namely, HSI
classification.(10) Graham extracted HSI features for image classification using an
autoencoder.(10) Lin et al. proposed a new method using convolutional neural networks (CNNs)
for HSI classification.(11) Similarly, distributed CNN technologies using the A3pviGrid
architecture were proposed and run by Amaldas and coworkers.(12–14) Goodfellow et al. were the
first to be inspired by zero-sum game theory to develop the original generative adversarial
network (GAN).(15) The GAN framework consists of two antagonistic networks: a generative
network (G) and a discriminative network (D). To improve the performance of the standard
Wasserstein GAN (WGAN), Gulrajani et al. added a penalty term on the norm of the gradient of
the discriminator with respect to its input.
Although the GAN has accurately classified HSIs, most GAN-based studies have focused on the
spatial and spectral domain effects on the GAN model without considering the training stability
or the ability of the network structure to learn complex hyperspectral features.(16) Zhan et al.
proposed a novel semi-supervised algorithm for the classification of hyperspectral data by
training a customized GAN, which constructs an adversarial game between a discriminator and
a generator.(17) A large number of novel models based on a GAN have been
proposed for HSI classification.(18) A hyperspectral GAN (HSGAN) framework automatically
extracts the spectral features in HSI classification tasks. When training an HSGAN with
unlabeled hyperspectral data, the generator produces hyperspectral samples similar to authentic
ones.
The features of the discriminator are available for classifying hyperspectral data from a
small number of labeled samples using an HSGAN. A GAN is very similar to a CNN and has
enabled rapid progress in computer vision, in which the development of distributed processing
methods for neural networks has been a recent trend. Arjovsky et al. introduced the WGAN,
which efficiently minimizes an approximation of the Wasserstein distance.(18) Radford et al.
presented a novel network architecture called the deep convolutional GAN that enhances the
training stability and the quality of the generated outcomes.(19)
In addition, the training time of hyperspectral classification models based on the GAN is
difficult to estimate. In the HSGAN, the gradient often disappears during the training process.
To alleviate this issue of HSGAN classification, we propose a dense residual GAN (ResGAN)
for HSI classification tasks. Our generative network (GN) includes a memory mechanism (MM)
that boosts the GN performance (and hence the ResGAN performance) using a dense residual
unit (DRU). The experimental results confirmed the improved test accuracy and visualization
results of ResGAN.(15–17)
The contributions of this paper are as follows. First, we briefly introduce previous GAN-
based methods and residual learning. Second, we describe the proposed ResGAN. Third, we
provide details of experiments in which we compare the performance of ResGAN with that of
two HSI classification methods. Finally, we draw conclusions through a discussion.
2. Related Works
Classification is a popular method for mining the rich information in HSIs. Image processing
by deep learning methods has thus far achieved good classification results, but deep learning
methods are prone to overfitting. Residual networks (ResNets) can alleviate overfitting. An
identity map added between the input and the output enables easy parameter optimization and
the extraction of feature information.
In this research, we introduce a ResNet model for HSI classification that is optimized by
batch normalization, which reduces the dependence of the network on the initial parameters and
improves the generalization of the model. To reduce the effect of the limited number of training
samples on the classification accuracy, the model generates dummy samples. When tested on two different HSIs,
the method demonstrated its potentially broad applications in HSI classification. Figure 1 shows
the architecture of the HSGAN network. Figure 2 shows the structure of the residual network
for noise spectral image classification.
Goodfellow and coworkers(15–17) were the first to be inspired by zero-sum game theory to
develop the original GAN. GAN-based HSI classification takes a 1D noise vector as the network
G input and generates a vector approximating the real spectral data through two fully connected
and two convolution operations. The purpose is to deceive the discriminative network D, which is
trained to distinguish reconstructed images from real images.
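To make this adversarial setup concrete, the following minimal sketch (our own illustration, not the code of Refs. 15–17) builds a generator that maps a 1D noise vector to a spectral vector through two fully connected layers followed by two (transposed) convolutions, and a discriminator that scores spectra as real or reconstructed. The sizes noise_dim and n_bands are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

noise_dim, n_bands = 100, 200          # assumed sizes; 200 bands matches Indian Pines

def build_generator():
    # 1D noise vector -> spectral vector via two dense and two (transposed) conv layers
    return tf.keras.Sequential([
        layers.Dense(256, activation="relu", input_shape=(noise_dim,)),
        layers.Dense((n_bands // 4) * 32, activation="relu"),
        layers.Reshape((n_bands // 4, 32)),
        layers.Conv1DTranspose(16, 5, strides=2, padding="same", activation="relu"),
        layers.Conv1DTranspose(1, 5, strides=2, padding="same", activation="tanh"),
    ])                                  # output: (n_bands, 1) generated spectrum

def build_discriminator():
    # scores how "real" an input spectrum looks
    return tf.keras.Sequential([
        layers.Conv1D(16, 5, strides=2, padding="same", input_shape=(n_bands, 1)),
        layers.LeakyReLU(0.2),
        layers.Conv1D(32, 5, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(1),                # real/reconstructed score
    ])
```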
We first describe the devised GN in ResGAN (Fig. 3) and then present the
discriminative network (DN). Finally, we introduce the modified loss function of ResGAN
based on the Wasserstein GAN with a gradient penalty (WGAN-GP), which is a GAN that uses
the Wasserstein loss formulation plus a gradient norm penalty to achieve Lipschitz continuity.
Fig. 3. (Color online) Architecture of a DRU in a GN. The blue/purple and red lines symbolize the local and global
residual learning outputs of the DRUs, respectively.
where $W_{FE,1}$ and $W_{FE,2}$ represent $n_{FE,1}$ convolution kernels of size $c \times k_{FE,1} \times k_{FE,1}$ and $n_{FE,2}$
convolution kernels of size $n_{FE,1} \times k_{FE,2} \times k_{FE,2}$, respectively. $c$ denotes the number of channels
of the input image $I_L$, $k_{FE,1}$ and $k_{FE,2}$ are the spatial sizes of the convolution filters, and $B_{FE,1}$ and
$B_{FE,2}$ represent the biases. The ‘$*$’ operator performs a convolution operation, $g(\cdot)$ is the activation
function, and $FE$ is the output of the feature extraction, which is input to the DRUs.
In this paper, the activation function $g(\cdot)$ is the parametric rectified linear unit (PReLU). The
PReLU form of $g(\cdot)$ is expressed as

$$g(x) = \begin{cases} x, & \text{if } x > 0 \\ \alpha_t x, & \text{if } x < 0 \end{cases} \qquad (2)$$
where $\alpha_t$ represents a learnable parameter and $t$ denotes the iteration time. When the parameters
of the network are updated by backpropagation, $\alpha$ is updated as

$$\Delta\alpha_{t+1} = \mu\,\Delta\alpha_t + \varepsilon\,\frac{\partial L}{\partial \alpha_t}, \qquad (3)$$

where $\mu$, $\varepsilon$, and $L$ represent the momentum, learning rate, and loss function, respectively.
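As a rough illustration (not the authors' released code), the feature-extraction stage and its PReLU activations might be written as below; the kernel counts and sizes (64 kernels, 3 × 3) are assumptions. The momentum update of $\alpha$ in Eq. (3) corresponds to what an SGD-with-momentum optimizer applies to every trainable parameter, $\alpha$ included.

```python
import tensorflow as tf
from tensorflow.keras import layers

def feature_extraction(c=200, n_fe=64, k_fe=3):
    """Feature-extraction stage: two convolutions with PReLU activations.

    c    : number of spectral channels of the input image I_L
    n_fe : assumed number of kernels (n_FE,1 = n_FE,2)
    k_fe : assumed spatial kernel size (k_FE,1 = k_FE,2)
    """
    inputs = layers.Input(shape=(None, None, c))
    x = layers.Conv2D(n_fe, k_fe, padding="same")(inputs)    # W_FE,1 * I_L + B_FE,1
    x = layers.PReLU(shared_axes=[1, 2])(x)                  # g(.) with learnable alpha
    x = layers.Conv2D(n_fe, k_fe, padding="same")(x)         # W_FE,2 * (.) + B_FE,2
    fe = layers.PReLU(shared_axes=[1, 2])(x)                 # FE, fed to the DRUs
    return tf.keras.Model(inputs, fe, name="feature_extraction")
```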
We assume $d$ DRUs, each with the architecture shown in Fig. 3. The operation of the $k$th DRU
(DRU$_k$) is as follows:

$$\begin{aligned}
D_{k,1} &= S_{k,1}\bigl(g(W_{k,1} * D_{k-1} + B_{k,1}),\, D_{k-1}\bigr) \\
D_{k,2} &= S_{k,2}\bigl(g(W_{k,2} * D_{k,1} + B_{k,2}),\, D_{k,1},\, D_{k-1}\bigr) \\
D_{k,3} &= S_{k,3}\bigl(g(W_{k,3} * D_{k,2} + B_{k,3}),\, D_{k,2},\, D_{k,1},\, D_{k-1}\bigr) \\
D_k &= D_{k,3} + D_{k-1}
\end{aligned} \qquad (4)$$

The kernels and biases of the three successive convolutional layers (indexed by 1, 2, and 3)
are represented by $W_{k,1}$ to $W_{k,3}$ and $B_{k,1}$ to $B_{k,3}$, respectively. $S_{k,1}$ to $S_{k,3}$ denote their
corresponding weighted-sum layers and $D_{k,1}$ to $D_{k,3}$ denote the corresponding outputs of their
forward convolutional layers. $D_k$ denotes the output of DRU$_k$.
The outputs of the preceding convolutional layers in each DRU (blue lines in Fig. 3) are
admitted into the posterior convolutional layers, forming the short-term memory. The preceding
outputs of the DRUs (red and purple lines in Fig. 3) are admitted into the later layers, similarly
forming the long-term memory. The former DRU and convolutional layer outputs are directly
connected to the later layers. This configuration not only reduces the number of feed-forward
features but also extracts the local dense features. Together, these connections realize the MM.
When the previous DRU and all the convolutional layers are admitted into the later layers, the
number of features must be reduced to ease the burden on the network. For this purpose, we
apply weighted-sum layers $S_{k,1}$ to $S_{k,3}$ that adaptively learn the specific weight of each memory
and decide the amounts of long-term and short-term memories to be saved. $S_{k,1}$ to $S_{k,3}$ in DRU$_k$
are operated by a local decision function.
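A sketch of one possible realization of Eq. (4) and the weighted-sum layers follows (our interpretation, not the released code): each $S_{k,i}$ is modeled as a concatenation of the incoming memories followed by a 1 × 1 convolution that learns their weights; the channel width n_f and the 3 × 3 kernel size are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def weighted_sum(tensors, n_f):
    # S_{k,i}: concatenate the incoming memories and let a 1x1 convolution
    # adaptively weight how much of each memory is kept (one possible realization)
    return layers.Conv2D(n_f, 1, padding="same")(layers.Concatenate()(tensors))

def dense_residual_unit(d_prev, n_f=64, k=3):
    """One DRU following Eq. (4); d_prev is D_{k-1}, the previous DRU output."""
    c1 = layers.PReLU(shared_axes=[1, 2])(layers.Conv2D(n_f, k, padding="same")(d_prev))
    d1 = weighted_sum([c1, d_prev], n_f)                      # D_{k,1}

    c2 = layers.PReLU(shared_axes=[1, 2])(layers.Conv2D(n_f, k, padding="same")(d1))
    d2 = weighted_sum([c2, d1, d_prev], n_f)                  # D_{k,2}

    c3 = layers.PReLU(shared_axes=[1, 2])(layers.Conv2D(n_f, k, padding="same")(d2))
    d3 = weighted_sum([c3, d2, d1, d_prev], n_f)              # D_{k,3}

    return layers.Add()([d3, d_prev])                         # D_k = D_{k,3} + D_{k-1}
```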
$$\begin{aligned}
R_{ws} &= S_{RL}(D_1, D_2, \ldots, D_d, FE) \\
R_{ws,1} &= g(W_{RL,1} * R_{ws} + B_{RL,1}) \\
R &= R_{ws,1} + F_1
\end{aligned} \qquad (5)$$

where $S_{RL}$ denotes the weighted-sum layer, $W_{RL,1}$ and $B_{RL,1}$ represent the kernel and bias of the
convolutional layer, respectively, and $D_1$ to $D_d$ represent the successive outputs of the $d$ DRUs.
$R_{ws}$, $R_{ws,1}$, and $R$ denote the outputs of the weighted-sum, convolutional, and element-wise sum
layers, respectively, in the residual learning part. In the residual learning part, local residual
learning (LRL) is performed between a DRU and a weighted-sum layer, whereas global residual
learning (GRL) is implemented between the input image $I_L$ and the element-wise sum layer
(Fig. 3). The weighted-sum layer $S_{RL}$ extracts the hierarchical features obtained from the previous
DRUs through LRL and decides their proportions in the subsequent features. $S_{RL}$ is operated by a
global decision function, in contrast to the local decision functions $S_{k,1}$ to $S_{k,3}$ in DRU$_k$. The
features are further exploited by the convolutional layer $W_{RL,1}$, and the combined LRL and GRL
improve the GN performance and reduce the overfitting risk.
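Continuing the sketch above, the stack of DRUs and the residual-learning part of Eq. (5) can be assembled roughly as follows; the number of DRUs and the reading of $F_1$ as the feature-extraction output $FE$ (i.e., the GRL skip connection) are assumptions on our part.

```python
from tensorflow.keras import layers
# reuses weighted_sum() and dense_residual_unit() from the previous sketch

def generator_body(fe, n_dru=4, n_f=64):
    """Stack of d DRUs followed by the residual-learning part of Eq. (5)."""
    dru_outputs, d = [], fe
    for _ in range(n_dru):                    # n_dru = d is an assumed value
        d = dense_residual_unit(d, n_f)       # D_1, ..., D_d (LRL)
        dru_outputs.append(d)

    r_ws = weighted_sum(dru_outputs + [fe], n_f)            # S_RL(D_1, ..., D_d, FE)
    r_ws1 = layers.PReLU(shared_axes=[1, 2])(
        layers.Conv2D(n_f, 3, padding="same")(r_ws))        # g(W_RL,1 * R_ws + B_RL,1)
    return layers.Add()([r_ws1, fe])                        # R = R_ws,1 + FE (assumed GRL skip)
```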
Our model differs from the DN of the HSGAN in two respects. First, the last sigmoid layer is
replaced with a leaky ReLU layer. The discriminative model in the HSGAN mainly performs a
real/fake binary classification, whereas the DN in ResGAN approximates the Wasserstein
distance between the classified objects. Second, we remove the batch normalization (BN) layers
from the DN and impose the gradient penalty on each sample individually. The overall
architecture of the DN is shown in Fig. 4. To improve the stability of GAN training, WGAN-GP
enforces a soft version of the Lipschitz constraint by penalizing the gradient norm at random
samples $\hat{x} \sim P_{\hat{x}}$.
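A minimal TensorFlow sketch of this gradient penalty is given below (our own, with the penalty weight λ = 10 taken from Gulrajani et al. rather than from this paper); random interpolates between real and generated patches are scored by the DN, and deviations of the gradient norm from 1 are penalized.

```python
import tensorflow as tf

def gradient_penalty(discriminator, real, fake, lam=10.0):
    """WGAN-GP penalty on random samples x_hat between real and fake 4D patch batches."""
    eps = tf.random.uniform([tf.shape(real)[0], 1, 1, 1], 0.0, 1.0)
    x_hat = eps * real + (1.0 - eps) * fake               # random interpolates
    with tf.GradientTape() as tape:
        tape.watch(x_hat)
        score = discriminator(x_hat, training=True)
    grads = tape.gradient(score, x_hat)                   # dD/dx_hat
    norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-12)
    return lam * tf.reduce_mean(tf.square(norm - 1.0))    # soft Lipschitz constraint

# Discriminator loss: approximate Wasserstein distance plus the penalty, e.g.
# d_loss = tf.reduce_mean(D(fake)) - tf.reduce_mean(D(real)) + gradient_penalty(D, real, fake)
```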
3. Experiments
The classification performance of the proposed ResGAN was compared with that of two
established HSI classification methods (HSGAN and residual neural network). The classification
performance of the methods was measured by three popular indexes: the overall accuracy (OA,
defining the probability that an individual is correctly classified), the average accuracy (AA,
obtained by summing the accuracies of all classes and dividing the result by the number of
classes), and the Kappa coefficient (Kappa, a reliability index of the ratings of different raters).
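For reference, the three indexes can be computed from a confusion matrix as in the following NumPy helper (our own sketch, not part of the original experiments).

```python
import numpy as np

def classification_scores(conf):
    """OA, AA, and Kappa from a confusion matrix (rows: true class, columns: predicted class)."""
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()
    oa = np.trace(conf) / total                                   # overall accuracy
    per_class = np.diag(conf) / conf.sum(axis=1)                  # accuracy of each class
    aa = per_class.mean()                                         # average accuracy
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total**2   # chance agreement
    kappa = (oa - pe) / (1.0 - pe)                                # Kappa coefficient
    return oa, aa, kappa
```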
All experiments were executed in TensorFlow and accelerated by operating six NVIDIA GTX
1080 Ti GPUs, each with 11 GB of memory, in parallel. Owing to the size of the datasets, the
computational complexity of the network, and the need for parallelization, the setup was scaled
across the aforementioned GPUs to work within the limited GPU memory available. The entire
training was completed in approximately four days, and all experiments used a spatial window of
5 × 5, a batch size of 128, and a learning rate of 0.001 for all epochs.
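The stated settings correspond to a configuration of roughly the following form; the choice of the Adam optimizer is our assumption, since the paper does not name the optimizer.

```python
import tensorflow as tf

config = {
    "spatial_window": (5, 5),   # patch size cut around each labeled pixel
    "batch_size": 128,
    "learning_rate": 1e-3,      # kept fixed for all epochs
}

# one optimizer per network, as is usual in GAN training (assumed to be Adam)
g_optimizer = tf.keras.optimizers.Adam(config["learning_rate"])
d_optimizer = tf.keras.optimizers.Adam(config["learning_rate"])
```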
The experiments to evaluate the performance were conducted on two hyperspectral datasets:
the classical hyperspectral dataset of Indian Pines and a dataset of HSIs of Pavia University
(Italy).
The first dataset was gathered in northwestern Indiana over the Indian Pines test site by the
airborne visible/infrared imaging spectrometer (AVIRIS) sensor (Fig. 5).
After removing the water absorption bands, each image (size 145 × 145 pixels) consisted of 200
spectral bands. The spectral coverage was 0.4 to 2.5 μm and the spatial resolution was 20 m.
The second dataset comprised a Pavia University image acquired by a reflective optics
system imaging spectrometer (ROSIS) sensor (Fig. 6). After removing the noisy bands, sub-images
of 610 × 340 pixels with 103 spectral bands were retained for analysis. The spatial resolution of
the image was approximately 1.3 m. The ground truth information in the Pavia University image
was differentiated into nine land use classes.
4. Results
The training set was obtained by randomly selecting 10% of the samples in each class of the
Indian Pines dataset, and the other samples were reserved for testing.

Fig. 5. (Color online) Indian Pines image and its related ground truth categorization information. (a) False-color
composite image (bands 28, 19, and 10). (b) Ground truth categorization map.

Fig. 6. (Color online) Pavia University image and its related ground truth classification information. (a) False-color
composite image (bands 40, 20, and 10). (b) Ground truth classification map.

Table 1
Classification results of the Indian Pines dataset for each method.
Class             C1     C2     C3     C4     C5     C6     C7     C8     C9     C10
HSGAN             100    97.97  98.8   97.05  98.34  99.45  100    100    90.00  96.91
Residual network  100    98.88  98.43  95.78  96.89  99.32  100    100    100    96.19
ResGAN            100    99.37  100    100    99.17  99.59  100    100    100    99.49

Class             C11    C12    C13    C14    C15    C16    OA     AA     Kappa
HSGAN             99.06  98.31  98.05  100    98.19  95.7   99.34  97.99  98.46
Residual network  99.35  98.65  99.02  99.6   98.45  75.27  99.26  97.24  98.26
ResGAN            99.55  98.65  100    100    99.74  96.77  99.79  99.52  99.51

Table 1 shows the classification accuracies and the corresponding standard deviations of the
three algorithms, namely, HSGAN, the residual network, and ResGAN. For each method, Table 1
lists the results for the 16 individual classes followed by the OA, AA, and Kappa values over all
classes. The results confirm that the feature extraction ability of ResGAN is higher than that of
the residual network and HSGAN methods.
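The 10% per-class training split described at the beginning of this section can be drawn as in the following sketch (our own helper; a label value of 0 is assumed to mark unlabeled background pixels).

```python
import numpy as np

def stratified_split(labels, train_fraction=0.1, seed=0):
    """Randomly pick `train_fraction` of the pixels of every class for training."""
    rng = np.random.default_rng(seed)
    labels = labels.ravel()
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        if cls == 0:                     # 0 = unlabeled background, skipped
            continue
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        n_train = max(1, int(round(train_fraction * idx.size)))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)
```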
Figure 7 shows that the ResGAN model has improved classification accuracy on the classical
Indian Pines dataset compared with the other methods, particularly in dense boundary regions.
This shows that the model significantly improves the classification accuracy for high-
dimensional data with small training sample sizes.
The training set was obtained by randomly selecting 10% of the samples in each class of the
Pavia University dataset, and the other samples were reserved for testing. Table 2 shows the
numbers of training and test samples, as well as the classification accuracies and the related
standard deviations of the three algorithms, namely, HSGAN, the residual network, and ResGAN.

Fig. 7. (Color online) (a) Ground truth and classified visual maps of the Indian Pines dataset obtained by (b)
HSGAN, (c) residual network, and (d) ResGAN.

Table 2
Classification results of the Pavia University dataset for each method.
Class             C1     C2     C3     C4     C5     C6     C7     C8     C9     OA     AA     Kappa
HSGAN             99.67  99.92  98     96.25  99.33  99.86  99.92  99.02  94.19  99.85  98.46  99.07
Residual network  100    99.91  98.67  97.19  100    99.98  99.85  99.68  99.93  99.42  99.53  99.68
ResGAN            99.89  100    100    99.67  100    100    100    100    100    99.99  99.95  99.95

Fig. 8. (Color online) Ground truth (a) and visual map classifications of the Pavia University dataset obtained by
HSGAN (b), residual network (c), and ResGAN (d).

For each method, Table 2 lists the results for the nine individual classes followed by the OA, AA,
and Kappa values over all classes. Again, ResGAN achieved the highest classification accuracy in
most of the classes because it fuses the spatial and spectral features in the GN.
Figure 8 presents the visual classifications of the three methods on the Pavia University
dataset. It shows that ResGAN has better applicability than the other methods for this urban
landscape hyperspectral dataset with relatively large spatial coverage and rich feature types.
5. Conclusions
Acknowledgments
References
1 A. Krizhevsky, I. Sutskever, and G. E. Hinton: Adv. Neural Inf. Process. Syst. 25 (2012) 1097.
2 H.-C. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura, and R. M. Summers: IEEE Trans.
Med. Imaging 35 (2016) 1285.
3 M. E. Paoletti, J. M. Haut, J. Plaza, and A. Plaza: ISPRS J. Photogramm. Remote Sens. 145 (2018) 120.
4 M. E. Paoletti, J. M. Haut, J. Plaza, and A. Plaza: Remote Sens. 10 (2018) 1454.
5 M. E. Paoletti, J. M. Haut, J. Plaza, A. Plaza, and J. Vigo-Aguiar: Proc. 17th Int. Conf. Comput. Math. Methods
Sci. Eng. (2017) 1625.
6 N. Vasilache, O. Zinenko, T. Theodoridis, P. Goyal, Z. DeVito, W. S. Moses, S. Verdoolaege, A. Adams, and A.
Cohen: arXiv preprint arXiv:1802.04730 (2018).
7 M. B. A. Miah and M. A. Yousuf: 2015 Int. Conf. Electrical Engineering and Information Communication
Technology (ICEEICT) (2015) 1.
8 ImageNet: http://www.imagenet.cn
9 L. Hertel, E. Barth, T. Käster, and T. Martinetz: 2015 Int. Joint Conf. Neural Networks (IJCNN) (2015) 1.
10 B. Graham: arXiv preprint arXiv:1412.6071 (2014).
11 M. Lin, Q. Chen, and S. Yan: arXiv preprint arXiv:1312.4400 (2013).
12 C. Amaldas, A. Shankaranarayanan, and K. Gemba: Int. J. GRID 4 (2013) 09.
13 C. Amaldas, A. Shankaranarayanan, and K. Gemba: Int. J. GRID 4 (2013) 50.
14 A. Shankaranarayanan, C. Amaldas, and R. Pears: 2008 Int. Conf. Biocomputation, Bioinformatics, and
Biomedical Technologies (2008) 35.
15 I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio:
Commun. ACM 63 (2020) 139.
16 I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville: arXiv preprint arXiv:1704.00028 (2017).
17 Y. Zhan, D. Hu, Y. Wang, and X. Yu: IEEE Geosci. Remote Sens. Lett. 15 (2017) 212.
18 M. Arjovsky, S. Chintala, and L. Bottou: Int. Conf. Machine Learning (2017) 214.
19 A. Radford, L. Metz, and S. Chintala: arXiv preprint arXiv:1511.06434 (2015).
Ri-hui Tan graduated from Guangxi University with bachelor’s and master’s
degrees in automation from the School of Electrical Engineering. After
graduation, she worked in the Tourism Data College of Guilin Tourism
University as a lecturer. She is mainly engaged in big data and artificial
intelligence involving the Guangxi tourism economy and in works related to
tourism data.