Feature Fusion Based on Convolutional Neural Networks
ITM Web of Conferences 12, 05001 (2017). DOI: 10.1051/itmconf/20171205001
ITA 2017
Abstract: Recent breakthroughs in algorithms related to deep convolutional neural networks (DCNN) have stimulated the development of a variety of signal processing approaches, among which Automatic Target Recognition (ATR) using Synthetic Aperture Radar (SAR) data has attracted wide attention. Inspired by more efficient architectures such as the Inception network and the residual network, a new feature fusion structure which jointly exploits the merits of each is proposed to reduce the data dimensionality and the computational complexity. The procedure presented in this paper extracts a set of features with a DCNN, fuses them so that the representation of SAR images becomes more distinguishable, and feeds the result to a trainable classifier. In particular, the results obtained on the 10-class benchmark data set demonstrate that the presented architecture achieves remarkable classification performance compared with current state-of-the-art methods.
1 Introduction

As a kind of active microwave imaging radar with all-day, all-weather and long-range detection capabilities, synthetic aperture radar (SAR) has occupied a leading role in areas such as early warning, surveillance and guidance, of which the most extensive application is automatic target recognition (ATR).

With the emergence of well-performing classifiers such as the support vector machine (SVM) [1], k-nearest neighbor (KNN) [2] and AdaBoost [3], machine learning technology has attracted much attention in SAR ATR studies. However, most of the work on machine learning approaches focuses on designing a set of feature extractors, and in the SAR ATR field the existing techniques lack the ability to extract effective features and fuse them. Recently, a newly developed learning method called the convolutional neural network (CNN) has been successfully applied to many large-scale visual recognition tasks such as image classification [4]-[6] and object detection [7][8]. Motivated by the model of the mammalian visual system, CNNs consist of many hidden layers with convolution operations which extract the features in the image, and they achieve state-of-the-art results on visual image data sets such as ImageNet [4]. In this light, we consider employing CNNs in the SAR ATR field by designing a reasonable network structure and optimizing the training algorithm.

Generally, the architecture of a CNN can be interpreted as a two-stage procedure in image classification tasks. Unlike previous methods which rely heavily on handcrafted features with human intervention, the training process of a CNN automatically finds appropriate features in the search space; these are sent to trainable classifiers in the second stage, thus avoiding the complexity of pre-processing and feature selection.

A variety of works have been carried out to achieve better performance in the past decades; however, SAR ATR remains a challenging task since modern advanced techniques require tens of thousands of examples to train adequately. Wang et al. [9] augmented the data set to test adaptability on subsets under different displacement and rotation settings. Furthermore, training very deep networks still faces problems, because the stacking of non-linear transformations in a typical feed-forward network generally results in poor propagation of activations as well as vanishing gradients. Hence it remains necessary to modify the architecture of deep feed-forward networks.

Owing to the background speckle all over SAR images and the irregular distribution of strong scattering centers, SAR ATR is of great complexity, especially when the networks get deeper. To enable training deeper networks with limited data, we adopt CNN architectures which fuse the features extracted from different branches; such structures tend to perform well in maximally exploiting the input data and improving classification accuracy.

The remainder of this paper is organized as follows. Section II discusses the basic components of the CNN as well as the training method. Section III introduces the two feature fusion categories and how they are constructed in the proposed networks. Experimental results on SAR imagery are analyzed in Section IV to demonstrate the performance of the novel network architectures. Finally, we summarize this paper.
2 Structure and Training of CNNs

Among the various deep learning methods, the CNN is an effective model which considers the dependencies between pixels in an image. A CNN consists of consecutive layers of neurons with learnable weights and biases, where the concept of the receptive field is applied. In this section, we give a detailed description of the basic operation modules in a CNN.

2.1 Convolution Layer
In a convolution layer, the n-th output feature map is computed from the feature maps of the previous layer as

$O_n^{(l)}(x, y) = \sigma\Big(\sum_{m=1}^{M}\sum_{p,q=0}^{K-1} k_{nm}^{(l)}(p, q)\, O_m^{(l-1)}(x+p,\; y+q) + b_n^{(l)}\Big)$    (1)

where the convolution kernel $k_{nm}^{(l)}(p, q)$ denotes the trainable filters, $b_n^{(l)}$ and $\sigma$ represent the bias of the n-th output feature map and the nonlinear activation function, and the sums run over the M input feature maps and the K×K kernel support.
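To make Eq. (1) concrete, the following minimal NumPy sketch computes one output feature map from a stack of input maps; the array sizes, the stride-1 valid convolution, and the ReLU choice for the nonlinearity are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np

def conv_feature_map(inputs, kernels, bias, activation=lambda z: np.maximum(z, 0.0)):
    """One output feature map O_n^(l) of Eq. (1).

    inputs  : array (M, H, W)  -- the M feature maps O_m^(l-1) of the previous layer
    kernels : array (M, K, K)  -- the trainable filters k_nm^(l)
    bias    : float            -- the bias b_n^(l) of the n-th output map
    activation : callable      -- the nonlinearity sigma (ReLU assumed here)
    """
    M, H, W = inputs.shape
    _, K, _ = kernels.shape
    out = np.zeros((H - K + 1, W - K + 1))          # 'valid' convolution, stride 1
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            patch = inputs[:, x:x + K, y:y + K]     # all M maps at position (x, y)
            out[x, y] = np.sum(kernels * patch) + bias
    return activation(out)

# toy usage: 3 input maps of size 8x8 and 5x5 kernels give a 4x4 output map
rng = np.random.default_rng(0)
out_map = conv_feature_map(rng.standard_normal((3, 8, 8)),
                           rng.standard_normal((3, 5, 5)), bias=0.1)
print(out_map.shape)   # (4, 4)
```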
2.2 Activation Function

To form a highly nonlinear mapping between the input image and the output label, a proper activation function is added after each convolution layer [10]. Since a hyperbolic tangent or a sigmoid function [11] may get trapped in saturating nonlinearities during gradient propagation, a piecewise linear function which does not suffer from saturation, the rectified linear unit (ReLU) [12], is adopted instead.

2.3 Pooling Layer

By averaging over a group of neurons or choosing the neuron with the maximal value in the group, pooling layers largely reduce the amount of computation by shrinking the feature maps. These layers also give the CNN better translation, shift and distortion invariance. We assume that each feature map in a convolution layer corresponds to a single map in the associated pooling layer. For example, the max pooling operation [13] is defined as

$O_m^{(l+1)}(x, y) = \max_{p,q = 0,\ldots,K-1} O_m^{(l)}(x \cdot s + p,\; y \cdot s + q)$    (3)

where K is the pooling size and s is the stride, which indicates the interval between adjacent pooling windows.

2.4 Softmax Layer

The outputs of the last layer are fed through the softmax function to yield a normalized probability distribution over classes. The softmax layer operation takes the form [15]

$p_j^{(i)} = \dfrac{e^{y_j^{(i)}}}{\sum_{k=1}^{T} e^{y_k^{(i)}}}$    (4)

We use the softmax output layer in conjunction with the categorical cross-entropy error function. With that as the background, the error function E is defined as

$E = -\dfrac{1}{N}\Big[\sum_{i=1}^{N}\sum_{j=1}^{T} 1\{t^{(i)} = j\}\,\ln p_j^{(i)}\Big]$    (5)

where N is the number of training samples, T is the number of classes, $t^{(i)}$ is the label of the i-th sample and $1\{\cdot\}$ is the indicator function.
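The following NumPy sketch works through Eqs. (3)-(5) on made-up data; the map size, the class count T = 10 and the toy labels are assumptions used only to show the operations.

```python
import numpy as np

def max_pool(feature_map, K=2, s=2):
    """Eq. (3): max pooling with window size K and stride s."""
    H, W = feature_map.shape
    out = np.zeros((H // s, W // s))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            out[x, y] = feature_map[x * s:x * s + K, y * s:y * s + K].max()
    return out

def softmax(y):
    """Eq. (4): normalized probability distribution over the T classes (stabilized)."""
    e = np.exp(y - y.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, targets):
    """Eq. (5): categorical cross-entropy averaged over the N samples."""
    N = probs.shape[0]
    return -np.mean(np.log(probs[np.arange(N), targets]))

rng = np.random.default_rng(1)
pooled = max_pool(rng.standard_normal((4, 4)))     # a 4x4 map becomes a 2x2 map
scores = rng.standard_normal((5, 10))              # N = 5 samples, T = 10 classes
labels = np.array([0, 3, 9, 1, 4])                 # toy ground-truth labels
print(pooled.shape, cross_entropy(softmax(scores), labels))
```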
3 Feature Fusion Patterns

Although a CNN works as a good feature extractor, how to make use of multi-dimensional features and fuse them together still deserves exploration. To increase the diversity of the extracted features and to understand the interactions between them, we formulate two patterns of feature fusion. Since the targets in SAR images may not be square shaped, convolution kernels sharing a single size are too restricted to learn sufficiently rich features; hence some of the inserted convolution kernels are given rectangular shapes.

3.1 Feature Fusion under Concatenation Pattern

Usually, the original convolution layer contains several square kernels; in this pattern we replace them with two convolution branches arranged by concatenation. This mode of feature fusion at the convolution level is referred to as the "conv-fusion" module. The concrete structure of its elements is demonstrated in Figure 1, where a represents the equivalent kernel size in the standard CNN structure while n denotes the adjustable kernel size. The convolutions marked with S are same-padded, while V signifies valid-padded.
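As a rough illustration of the concatenation pattern, the Keras-style sketch below fuses a square a×a branch with an asymmetric 1×n / n×1 branch and concatenates the resulting feature maps along the channel axis. The filter counts, the uniform same padding, and the name conv_fusion_module are assumptions made for readability; in the paper the branches mix same- (S) and valid- (V) padded convolutions as laid out in Figure 1.

```python
from tensorflow.keras import Input, Model, layers

def conv_fusion_module(x, filters=8, a=5, n=3):
    """Sketch of the concatenation ('conv-fusion') pattern: a square a x a branch
    fused with an asymmetric 1 x n / n x 1 branch. Same padding keeps the two
    branches at identical spatial sizes so they can be concatenated."""
    square = layers.Conv2D(filters, (a, a), padding='same', activation='relu')(x)
    asym = layers.Conv2D(filters, (1, n), padding='same', activation='relu')(x)
    asym = layers.Conv2D(filters, (n, 1), padding='same', activation='relu')(asym)
    return layers.concatenate([square, asym])      # feature maps stacked along channels

inp = Input(shape=(64, 64, 1))                     # MSTAR-sized single-channel image chip
fused = conv_fusion_module(inp)                    # 64 x 64 x 16 fused output
Model(inp, fused).summary()
```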
3.2 Fusion under Summation Pattern

Deep networks stacked from several layers usually suffer from the degradation problem. Enlightened by the deep residual learning framework [16], we design the second pattern of feature fusion. Formally, the expected underlying mapping, also known as the unreferenced mapping, is represented as F(x), and H(x) denotes the residual mapping consisting of two consecutive convolution layers. When each block only computes the corrective term H(x) = F(x) − x rather than the entire approximation F(x), the idea of a shortcut connection, which skips several layers, is realized. By adopting the summation mechanism between layers, the information from the previous layer flows more smoothly to the next layer without attenuation, and the layers only have to learn the difference from the input feature map. Additionally, considering the ability of asymmetric kernels to extract features with various scales and structures, we propose a new concept called the asymmetric shortcut connection block (ASCB), as shown in Figure 2.

Figure 2. Asymmetric shortcut connection block (ASCB): shortcut addition of the input x, followed by ReLU and 2×2 max pooling.
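A minimal sketch of the summation pattern in the spirit of the ASCB follows: two consecutive asymmetric convolutions form the corrective branch, the input is added back through a shortcut, and the result passes through ReLU and 2×2 max pooling, matching the elements recoverable from Figure 2. The kernel sizes, the filter counts and the 1×1 projection on the shortcut are assumptions; the exact block in the paper may differ.

```python
from tensorflow.keras import Input, Model, layers

def ascb(x, filters=16, n=7):
    """Illustrative asymmetric shortcut connection block.
    Two consecutive asymmetric convolutions (1xn then nx1, same-padded) form the
    corrective branch; the shortcut adds the input back so the branch only has to
    learn a corrective term, then ReLU and 2x2 max pooling follow."""
    shortcut = layers.Conv2D(filters, (1, 1), padding='same')(x)   # channel match (assumption)
    h = layers.Conv2D(filters, (1, n), padding='same', activation='relu')(x)
    h = layers.Conv2D(filters, (n, 1), padding='same')(h)
    y = layers.add([shortcut, h])                                  # summation fusion
    y = layers.Activation('relu')(y)
    return layers.MaxPooling2D(pool_size=(2, 2))(y)

inp = Input(shape=(30, 30, 16))
out = ascb(inp)                      # 30x30x16 -> 15x15x16
Model(inp, out).summary()
```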
4 Experimental Results and Analysis

4.1 Data Description

The CNN structures in this paper are applied to the problem of SAR ATR on the MSTAR dataset. The MSTAR benchmark data act as a standard data set to test and evaluate the performance of recognition algorithms. In our experiment, images for training are captured at a 17-degree depression angle and those for testing at 15 degrees. The proposed algorithm is evaluated on the ten-target classification problem, and the corresponding numbers of images for training and testing are listed in Table I.

TABLE I. NUMBER OF TRAINING AND TESTING IMAGES FOR THE EXPERIMENTS

Target Types    No. Testing    No. Training
BMP2(9563)      195            233
BMP2(9566)      196            /
BMP2(C21)       196            /
T72(132)        196            232
T72(812)        195            /
T72(S7)         191            /
BTR70           196            233
BTR60           195            256
BRDM2           274            298
ZSU             274            299
T62             273            299
ZIL             274            299
2S1             274            299
D7              274            299
Total           3203           2747
4.2 Training Details

As has been shown in previous work, implementation details play a decisive role in recognition performance. We aim to derive a set of CNN structures that satisfy both low computation cost and high accuracy constraints for applications with limited labelled SAR data. Besides the hyper parameters of the CNNs themselves, other details such as the weight initialization and the learning rate also count.

To improve the training speed and to prevent the network from becoming trapped in local minima, a set of fine-tuning methods has been applied to a variety of recognition tasks. Based on the gradient descent method, the Adam [17] algorithm dynamically adjusts the update of each parameter according to moment estimates and thus trains the CNN efficiently and stably. In view of this, we chose Adam as a substitute for plain mini-batch SGD, using the given default values: learning rate 0.001, β1 = 0.9, β2 = 0.999 [17]. Additionally, several subsets of the training data are chosen stochastically from the whole data set as mini-batches, and all the models are trained with a batch size of 50.

As for weight initialization, the weights of all layers are drawn by an initializer [18] from the uniform distribution (−√(6 / (fan_in + fan_out)), +√(6 / (fan_in + fan_out))), where fan_in and fan_out represent the numbers of input and output units, respectively.
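Assuming a Keras-style implementation, the training setup described above (Adam with the default moments, a batch size of 50, and the uniform initializer with limit √(6/(fan_in + fan_out))) could be expressed as in the sketch below; the model and data variables and the epoch count are placeholders, not code from the paper.

```python
import math
from tensorflow.keras import optimizers, initializers

# Uniform initializer with limit sqrt(6 / (fan_in + fan_out)); Keras' GlorotUniform
# implements this formula, so it stands in for the initializer described above.
glorot = initializers.GlorotUniform()
print(math.sqrt(6.0 / (25 + 16)))   # example limit for a layer with fan_in=25, fan_out=16

# Adam with the default values quoted above: lr = 0.001, beta1 = 0.9, beta2 = 0.999.
adam = optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)

# Hypothetical usage on a compiled model (placeholders only):
# layers.Conv2D(16, (5, 5), kernel_initializer=glorot)
# model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(x_train, y_train, batch_size=50, epochs=..., validation_data=(x_test, y_test))
```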
4.3 Experiments on Different Network Configurations

Here, we present the general layout of our baseline convolutional neural network (BCNN) and then describe the details of the components used in it. The schematic view of BCNN is depicted in Figure 3. To reduce the feature dimension, each convolution layer is followed by a pooling layer with a 2×2 pooling size and a stride of 2 pixels. To avoid the redundancy and the spilling over of local information brought by relatively large receptive fields, we choose convolution kernel sizes smaller than 7. In this structure, the fully connected layer is replaced by a global average pooling (GAP) layer, since the correspondence between feature maps and categories is strengthened and no parameters need optimizing [19].

Figure 3. Overall architecture of BCNN (conv. (kernel depth) @ (kernel size))
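The extracted text does not give the exact per-layer configuration of BCNN, so the sketch below is only one plausible reading of the stated constraints: a 64×64×1 input, convolution kernels smaller than 7, a 2×2 max pooling with stride 2 after every convolution layer, and a global average pooling layer feeding a ten-way softmax in place of fully connected layers. The filter counts and the number of stages are assumptions.

```python
from tensorflow.keras import Input, Model, layers

inp = Input(shape=(64, 64, 1))
x = inp
for filters in (16, 32, 64):                                    # stage count/filters assumed
    x = layers.Conv2D(filters, (5, 5), activation='relu')(x)    # kernel size < 7 as stated
    x = layers.MaxPooling2D(pool_size=(2, 2), strides=2)(x)     # 2x2 pooling, stride 2
x = layers.GlobalAveragePooling2D()(x)                          # GAP replaces FC layers [19]
out = layers.Dense(10, activation='softmax')(x)                 # ten-target classification
bcnn = Model(inp, out, name='bcnn_sketch')
bcnn.summary()
```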
For the MSTAR dataset, the recognition accuracy of BCNN is obtained with the above hyper parameter setting. Next, the feature-fusion patterns are introduced into the BCNN; all these CNN-based methods share the same parameter setting and differ only in the framework. The relevant composition of the transformed conv-fusion CNN is listed in Table II.

TABLE II. COMPOSITION OF CONV-FUSION CNN

Layer name/Type   Output size   Filter size   Left Branch Filter size   Right Branch Filter size
Input image       64×64×1       /             /                         /
Conv-fuse1        60×60×16      /             1×3, 3×1                  5×5
Maxpool1          30×30×16      2×2           /                         /
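As a quick sanity check on the sizes listed in Table II, a valid 5×5 convolution on the 64×64 input gives 60×60 (64 − 5 + 1) and the subsequent 2×2 max pooling halves this to 30×30; the snippet below simply evaluates these formulas.

```python
def valid_conv_size(n, k, stride=1):
    """Spatial size after a 'valid' convolution: floor((n - k) / stride) + 1."""
    return (n - k) // stride + 1

size = valid_conv_size(64, 5)    # Conv-fuse1 branch with a 5x5 valid kernel
print(size, size // 2)           # 60, then 30 after the 2x2 max pooling (Table II)
```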
TABLE III. RECOGNITION RESULTS OF THE CONV-FUSION CNN (ROWS: TRUE TARGET TYPE; COLUMNS: PREDICTED LABEL)

Target Types   BMP2   T72   BTR70   BTR60   BRDM2   ZSU   T62   ZIL   2S1   D7   Accuracy (%)
BMP2(9563) 195 0 0 0 0 0 0 0 0 0 100.00
BMP2(9566) 195 0 0 0 1 0 0 0 0 0 99.49
BMP2(C21) 189 4 0 0 3 0 0 0 0 0 96.43
T72(132) 1 195 0 0 0 0 0 0 0 0 99.49
T72(812) 5 183 0 1 0 0 6 0 0 0 93.85
T72(S7) 8 177 0 1 0 0 4 0 1 0 92.67
BTR70 0 0 196 0 0 0 0 0 0 0 100.00
BTR60 0 0 7 188 0 0 0 0 0 0 96.41
BRDM2 1 0 0 0 272 0 0 1 0 0 99.27
ZSU 0 0 0 0 0 274 0 0 0 0 100.00
T62 0 1 0 0 0 1 271 0 0 0 99.27
ZIL 0 0 0 0 0 0 1 273 0 0 99.64
2S1 1 0 3 0 1 0 0 0 269 0 98.18
D7 0 0 0 0 0 0 0 0 0 274 100.00
Global Accuracy (%) 98.38
TABLE IV. RECOGNITION RESULTS OF THE MODULE-RESIDUAL CNN (ROWS: TRUE TARGET TYPE; COLUMNS: PREDICTED LABEL)

Target Types   BMP2   T72   BTR70   BTR60   BRDM2   ZSU   T62   ZIL   2S1   D7   Accuracy (%)
BMP2(9563) 195 0 0 0 0 0 0 0 0 0 100.00
BMP2(9566) 191 3 0 0 2 0 0 0 0 0 97.45
BMP2(C21) 189 2 2 1 1 0 0 0 1 0 96.43
T72(132) 0 196 0 0 0 0 0 0 0 0 100.00
T72(812) 2 190 0 0 0 0 3 0 0 0 97.44
T72(S7) 6 183 0 0 0 0 2 0 0 0 95.81
BTR70 0 0 194 1 1 0 0 0 0 0 98.98
BTR60 0 0 4 191 0 0 0 0 0 0 97.95
BRDM2 0 0 0 0 274 0 0 0 0 0 100.00
ZSU 0 0 0 0 0 274 0 0 0 0 100.00
T62 0 3 0 0 0 0 270 0 0 0 98.90
ZIL 0 0 0 0 0 1 0 273 0 0 99.64
2S1 0 2 0 0 1 0 3 0 268 0 97.81
D7 0 0 0 0 0 0 0 0 0 274 100.00
Global Accuracy (%) 98.72
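To make explicit how the per-class and global accuracies in Tables III and IV follow from the confusion matrices, a small NumPy sketch is given below; rows are true classes, columns are predicted classes, and the 3×3 matrix is a made-up example rather than the paper's data.

```python
import numpy as np

def accuracies(confusion):
    """Per-class accuracy = diagonal / row total; global accuracy = trace / grand total."""
    confusion = np.asarray(confusion, dtype=float)
    per_class = 100.0 * np.diag(confusion) / confusion.sum(axis=1)
    overall = 100.0 * np.trace(confusion) / confusion.sum()
    return per_class, overall

cm = [[195, 0, 0],       # toy 3-class example, not the full 10-class matrix
      [1, 195, 0],
      [0, 4, 192]]
print(accuracies(cm))
```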
Figure 5. Visualization of the 8×8 feature maps in the ASCB (n = 7)

We can see from Figure 5 that the loss of visual information is suppressed after the last output map is fed through the block. By exploiting the complementarity of the maps aggregated at the next stage, our proposed CNN structure can learn more specific and abstract features.

The recognition results for the different CNN structures are shown in Tables III, IV and V, where each row denotes the real target type and each column denotes the predicted label. As we can conclude from the tables, the overall accuracies of the two proposed architectures reach 98.38% and 98.72%, respectively. Whilst variations exist in the testing set, the lowest per-class accuracy is 95.81%, for the T72 in the second pattern.

TABLE VI. CLASSIFICATION RESULTS OF OTHER METHODS

Method                              Accuracy (%)
SVM [20]                            86.73
Adaptive Boosting [21]              92.70
KPCA [22]                           92.67
MSRC [23]                           93.66
IGT [24]                            95.00
CNN with data augmentation [25]     93.16
BCNN                                97.39
Conv-fusion CNN                     98.38
Module-residual CNN                 98.72
Decision fusion CNN                 99.42
It is desirable that the feature fusion CNNs show the ability to classify the ten-class targets regardless of the existence of variants. Meanwhile, without increasing either the width of the network or the complexity of the architecture, the combination of high-level feature representations and the learning of potentially invariant features achieves particularly good performance even when the labelled training data are limited.

5 Conclusion
Understanding the multi-scale and hierarchical features learned by a CNN and then digging thoroughly into the discriminative features helps us accomplish the target recognition task. In this paper, we first constructed novel CNN architectures which contain two independent modules, the "conv-fusion" module and the asymmetric shortcut connection block, then fine-tuned the hyper parameters of the deep CNN in a second stage, and finally applied the networks to the problem of recognition from SAR images. Experimental results indicate that the proposed networks gain superior performance compared with other state-of-the-art methods by extracting strong feature representations. Furthermore, the CNN approach presented in this paper can be revised or extended to other practical applications and thus provides insightful points for recognition tasks aimed at small datasets.
Acknowledgment

The research work is supported by the National Natural Science Foundation of China under grant No. 61471370.
References

[1] Q. Zhao and J. C. Principe, 2001. "Support vector machines for SAR automatic target recognition," IEEE Transactions on Aerospace & Electronic Systems, vol. 37, no. 2, pp. 643–654.
[2] G. Guo, H. Wang, D. Bell, Y. Bi, and K. Greer, 2003. "KNN model-based approach in classification," Lecture Notes in Computer Science, vol. 2888, pp. 986–996.
[3] Y. Sun, Z. Liu, S. Todorovic, and J. Li, 2007. "Adaptive boosting for SAR automatic target recognition," IEEE Transactions on Aerospace & Electronic Systems, vol. 43, no. 1, pp. 112–125.
[4] A. Krizhevsky, I. Sutskever, and G. Hinton, 2012. "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., pp. 1106–1114.
[5] C. Szegedy et al., 2015. "Going deeper with convolutions," in Proc. IEEE Computer Vision and Pattern Recognition, Boston, MA, USA, Jun. 8–10, pp. 1–9.
[6] K. Simonyan and A. Zisserman, 2015. "Very deep convolutional networks for large-scale image recognition," presented at the Int. Conf. Learning Representations. [Online]. Available: http://arxiv.org/abs/1409.1556
[7] R. Girshick, J. Donahue, T. Darrell, and J. Malik, 2014. "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proc. IEEE Computer Vision and Pattern Recognition, pp. 580–587.
[8] X. Chen, S. Xiang, C. Liu, and C. Pan, 2014. "Vehicle detection in satellite images by hybrid deep convolutional neural networks," IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 10, pp. 1797–1801.
[9] K. Du, Y. Deng, R. Wang, T. Zhao, and N. Li, 2016. "SAR ATR based on displacement- and rotation-insensitive CNN," Remote Sensing Letters, vol. 7, no. 9, pp. 895–904.
[10] S. Chen, H. Wang, F. Xu, and Y. Q. Jin, 2016. "Target classification using the deep convolutional networks for SAR images," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 8, pp. 1–12.
[11] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, 1998. "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278–2324.
[12] V. Nair and G. E. Hinton, 2010. "Rectified linear units improve restricted Boltzmann machines," in Proc. ICML.
[13] Y. LeCun, K. Kavukcuoglu, and C. Farabet, 2010. "Convolutional networks and applications in vision," in Proc. IEEE International Symposium on Circuits and Systems, pp. 253–256.
[14] Y. LeCun, B. Boser, J. S. Denker, R. E. Howard, W. Habbard, L. D. Jackel, and D. Henderson, 1990. "Handwritten digit recognition with a back-propagation network," in Advances in Neural Information Processing Systems, pp. 396–404.
[15] C. M. Bishop, 2006. Pattern Recognition and Machine Learning. New York, NY, USA: Springer-Verlag.
[16] K. He, X. Zhang, S. Ren, and J. Sun, 2015. "Deep residual learning for image recognition," pp. 770–778.
[17] D. P. Kingma and J. Ba, 2014. "Adam: A method for stochastic optimization," Computer Science.
[18] K. He, X. Zhang, S. Ren, and J. Sun, 2015. "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in Proc. Int. Conf. Comput. Vis., pp. 1026–1034.
[19] M. Lin, Q. Chen, and S. Yan, 2013. "Network in network," Computer Science.
[20] Y. Bengio, N. Boulanger-Lewandowski, and R. Pascanu, 2013. "Advances in optimizing recurrent networks," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 8624–8628.
[21] Y. Sun, Z. Liu, S. Todorovic, and J. Li, 2007. "Adaptive boosting for SAR automatic target recognition," IEEE Transactions on Aerospace and Electronic Systems, vol. 43, pp. 112–125.
[22] A. K. Mishra and T. Motaung, 2015. "Application of linear and nonlinear PCA to SAR ATR," in Proc. Radioelektronika, IEEE, pp. 349–354.
[23] D. Ganggang, W. Na, and K. Gangyao, 2014. "Sparse representation of monogenic signal: With application to target recognition in SAR images," vol. 21, no. 8, pp. 952–956.