Ymer 230109
Abstract
The article focuses on assessing object detection performance in satellite images using
the YOLOv4 network. As satellite image quantity and quality increase, intelligent
observation methods become crucial. Deep learning, particularly Convolutional Neural
Networks (CNNs), has excelled in computer vision, prompting exploration in remote
sensing imagery. This study evaluates YOLOv4's effectiveness in object detection through
tests on the DIOR and HRRSD datasets. YOLOv4 outperforms the other CNN models in
detection accuracy, showcasing its potential for efficient object detection in satellite
images. The evaluation involves data preprocessing, model training, and a comprehensive
analysis of results from both datasets. As the literature analysis identifies, YOLOv4's
strengths lie in handling diverse scenarios and learning rapidly. The study demonstrates
YOLOv4's applicability and superiority in satellite image object detection, offering more
accurate and efficient methods for remote sensing applications. The insights gained guide
future studies and applications in remote sensing and computer vision, contributing to
improved observation techniques in satellite imagery.
1. Introduction
The human visual system possesses remarkable speed and accuracy, enabling us to
effortlessly perform complex visual tasks by unconsciously recognizing objects, their
spatial relationships, and interactions. However, machines, despite recent advancements
in hardware and machine learning, still require extensive time and training examples to
achieve comparable object identification capabilities. The field of computer vision has
witnessed significant progress, making it more accessible and intuitive than ever before.
This paper addresses the critical task of object detection in satellite images. To tackle
this challenge, fast and accurate convolutional neural networks (CNNs) have been
developed [1].
Object detectors can be categorized into two types: region proposal-based methods and
regression-based methods. The former, exemplified by R-CNN [12], Fast-RCNN [13], and
Faster-RCNN [14], follow a two-step process: they first generate candidate region
proposals that potentially contain objects and then classify these proposals into specific
object classes. Regression-based methods, on the other hand, which are the focus of this article,
simplify detection by treating it as a regression problem, making them more efficient.
In this study, our primary objective is to explore regression-based methods for object
detection in satellite images. We aim to demonstrate that these methods offer simplicity
and improved efficiency compared to region proposal-based approaches. We focus on the
You Only Look Once (YOLO) method, which adopts a single CNN backbone to directly
predict bounding boxes and class probabilities for all objects within an image in real-time
[15]. To compare YOLO's speed and accuracy against alternatives, we also investigate the
Single Shot MultiBox Detector (SSD), known for its ability to detect and locate small
objects effectively using a default-box mechanism and multi-scale feature maps [16]. Additionally,
we explore the RetinaNet detector, which combines a feature pyramid network with the
novel focal loss to significantly increase accuracy [17].
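To make the one-stage formulation concrete, the sketch below decodes a YOLO-style dense prediction grid into bounding boxes and class scores. It is a deliberately simplified illustration: real YOLO heads predict several anchor boxes per cell at multiple scales, so the single-box tensor layout assumed here is not the network's actual output format.

```python
import numpy as np

def decode_predictions(pred, conf_thresh=0.5):
    """Decode a YOLO-style dense prediction grid into detections.

    `pred` is assumed to have shape (S, S, 5 + C): for each grid cell,
    one box (x, y, w, h, objectness) plus C class probabilities. This is
    a simplification for illustration -- real YOLO heads predict several
    anchor boxes per cell at multiple scales.
    """
    S = pred.shape[0]
    detections = []
    for i in range(S):
        for j in range(S):
            objectness = pred[i, j, 4]
            if objectness < conf_thresh:
                continue  # cell is unlikely to contain an object
            x, y, w, h = pred[i, j, :4]
            cls = int(np.argmax(pred[i, j, 5:]))
            # Score a detection as objectness times class probability.
            score = objectness * pred[i, j, 5 + cls]
            detections.append((x, y, w, h, cls, score))
    return detections
```

Thresholding on objectness and multiplying by the class probability mirrors how single-shot detectors score candidate boxes before non-maximum suppression.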
The developers of YOLOv4 dedicated their efforts to enhancing the model's training
procedure and the post-processing of data. Notably, the effectiveness of several advanced
techniques for target detection has been verified; these techniques are grouped into
'bag-of-freebies' and 'bag-of-specials' (Figure 2):
1. Bag-of-Freebies (BoF): improvements to the training process that increase accuracy
without affecting inference speed.
BoF for the backbone network:
Data augmentation: CutMix, Mosaic.
Regularisation: DropBlock, Label smoothing.
BoF for the detector: Mosaic, Self-Adversarial Training, CIoU-loss, CmBN, Cosine
annealing scheduler, Random training shapes, Optimal hyperparameters.
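Among the detector bag-of-freebies, CIoU-loss replaces plain IoU as the bounding-box regression objective (the loss is 1 − CIoU). A minimal sketch of the CIoU score, written from the published formulation with boxes given as (x1, y1, x2, y2) corners and assumed to have positive width and height:

```python
import math

def ciou(box1, box2):
    """Complete-IoU between two boxes given as (x1, y1, x2, y2).

    CIoU augments IoU with two penalty terms: the normalised squared
    distance between box centres, and a consistency measure of aspect
    ratios. Boxes are assumed valid (x2 > x1, y2 > y1).
    """
    # Intersection area and union area.
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    union = w1 * h1 + w2 * h2 - inter
    iou = inter / union

    # Squared centre distance, normalised by the squared diagonal of
    # the smallest box enclosing both.
    cx1, cy1 = (box1[0] + box1[2]) / 2, (box1[1] + box1[3]) / 2
    cx2, cy2 = (box2[0] + box2[2]) / 2, (box2[1] + box2[3]) / 2
    rho2 = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
    ex1, ey1 = min(box1[0], box2[0]), min(box1[1], box2[1])
    ex2, ey2 = max(box1[2], box2[2]), max(box1[3], box2[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return iou - rho2 / c2 - alpha * v
```

Identical boxes score exactly 1; shifting a box lowers the score below its plain IoU, so gradients keep flowing even when boxes barely overlap — the property that motivates using 1 − CIoU as the regression loss.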
2. Bag-of-Specials (BoS): architectural modules and post-processing methods that
slightly increase inference time but deliver a substantial gain in accuracy.
BoS for the backbone network: Mish activation, Cross Stage Partial Network (CSP),
Multi-input weighted residual connections (MiWRC).
BoS for the detector: Mish activation, Modified Spatial pyramid pooling layer (SPP),
Modified Spatial Attention Module (SAM) [24], Modified Path aggregation network
(PAN), DIoU-NMS.
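Of the bag-of-specials, the Mish activation is the simplest to illustrate: mish(x) = x · tanh(softplus(x)), a smooth, non-monotonic alternative to ReLU used in YOLOv4's CSPDarknet53 backbone. A one-line sketch:

```python
import math

def mish(x):
    """Mish activation: x * tanh(softplus(x)) = x * tanh(ln(1 + e^x)).

    Smooth and non-monotonic: it passes through the origin, dips
    slightly below zero for negative inputs, and approaches the
    identity for large positive inputs. (This scalar version will
    overflow for x > ~709; framework implementations work on tensors
    and handle that numerically.)
    """
    return x * math.tanh(math.log1p(math.exp(x)))
```

In practice the activation is applied element-wise to every convolution output in the backbone; its smoothness is credited with easing optimization compared to Leaky-ReLU.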
4. Conclusion
The main objective of our research article was to evaluate the performance of the
YOLOv4 convolutional neural network (CNN) model in object detection using remotely
sensed data. To accomplish this, we compared the results obtained by YOLOv4 with those
of 12 other CNN object detection models. In our study, we utilized two datasets of remote
sensing images, namely DIOR and HRRSD. The experimental findings clearly indicate
that YOLOv4 outperformed all the other techniques examined. These results serve as
evidence that the new features integrated into YOLOv4 significantly enhance its
performance compared to previous iterations, such as YOLOv3.
Moving forward, there are several promising directions for future research:
Validation on Additional Datasets: It would be valuable to further validate the YOLOv4
model by testing it on additional datasets from diverse sources. By evaluating its
performance on various datasets, we can assess the model's generalizability and robustness
across different remote sensing scenarios.
Performance in Challenging Contexts: Investigate how YOLOv4 performs in
challenging contexts, such as adverse weather conditions, occlusions, or rare object
classes. Understanding the model's behaviour in these scenarios can provide insights into
its limitations and potential areas for improvement.
Transfer Learning: Explore the applicability of transfer learning techniques to fine-tune
the YOLOv4 model for specific remote sensing tasks or domains. This could lead to
improved performance with reduced training data requirements.
Efficiency and Resource Optimization: As mentioned, YOLOv4 requires significant
computational resources for training and inference. Investigate methods to optimize the
model's architecture or develop lightweight versions for deployment on resource-
constrained platforms.
Overall, YOLOv4 offers enhanced accuracy, improved feature representation, and better
handling of objects at different scales compared to several CNN object detection models.
However, addressing these future research directions can lead to a more comprehensive
understanding of the model's capabilities and potential areas for improvement in remote
sensing applications.
References
[13] M. Li, Z. Zhang, L. Lei, X. Wang, and X. Guo, “Agricultural Greenhouses Detection in
High-Resolution Satellite Images Based on Convolutional Neural Networks: Comparison
of Faster R-CNN, YOLO v3 and SSD”, Sensors, vol. 20, no. 17, (2020), p. 4938.
[14] A. A. J. Pazhani and C. Vasanthanayaki, “Object detection in satellite images by faster R-
CNN incorporated with enhanced ROI pooling (FrRNet-ERoI) framework”, Earth Science
Informatics, vol. 15, no. 1, (2022), pp. 553–561.
[15] Z. Liu, Y. Gao, Q. Du, M. Chen, and W. Lv, “YOLO-Extract: Improved YOLOv5 for
aircraft object detection in remote sensing images”, IEEE Access, vol. 11, (2023), pp.
1742–1751.
[16] A. Kumar, Z. Zhang, and H. Lyu, “Object detection in real time based on improved single
shot multi-box detector algorithm”, EURASIP Journal on Wireless Communications and
Networking, vol. 2020, no. 1, (2020).
[17] M. Zhu et al., “Arbitrary-Oriented ship detection based on RetinaNet for remote sensing
images”, IEEE Journal of Selected Topics in Applied Earth Observations and Remote
Sensing, vol. 14, pp. 6694–6706, (2021).
[18] R. Luo et al., “Glassboxing Deep Learning to Enhance Aircraft Detection from SAR
Imagery”, Remote Sensing, vol. 13, no. 18, (2021), p. 3650.
[19] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, “YOLOv4: Optimal speed and accuracy
of object detection”, arXiv (Cornell University), (2020).
[20] K. Li, G. Wan, G. Cheng, L. Meng, and J. Han, “Object detection in optical remote sensing
images: A survey and a new benchmark,” ISPRS Journal of Photogrammetry and Remote
Sensing, vol. 159, (2020), pp. 296–307.
[21] Y. Zhang, Y. Yuan, Y. Feng, and X. Lu, “Hierarchical and robust convolutional neural
network for Very High-Resolution Remote Sensing object detection”, IEEE Transactions
on Geoscience and Remote Sensing, vol. 57, no. 8, (2019), pp. 5535–5548.
[22] A. Mondal and V. K. Shrivastava, “A novel Parametric Flatten-p Mish activation function
based deep CNN model for brain tumor classification”, Computers in Biology and
Medicine, vol. 150, (2022), p. 106183.
[23] J.-N. Lee, J.-W. Chae, and H.-C. Cho, “Improvement of colon polyp detection
performance by modifying the multi-scale network structure and data augmentation”,
Journal of Electrical Engineering & Technology, vol. 17, no. 5, (2022), pp. 3057–3065.
[24] S. Ari and S. Ari, “MU-NET: Modified U-Net architecture for automatic Ocean Eddy
Detection”, IEEE Geoscience and Remote Sensing Letters, vol. 19, (2022), pp. 1–5.
[25] X. Lu, Y. Zhang, Y. Yuan, and Y. Feng, “Gated and Axis-Concentrated localization
network for remote sensing object detection”, IEEE Transactions on Geoscience and
Remote Sensing, vol. 58, no. 1, (2020), pp. 179–192.
[26] G. Cheng and J. Han, “A survey on object detection in optical remote sensing images”,
ISPRS Journal of Photogrammetry and Remote Sensing, vol. 117, (2016), pp. 11–28.
[27] S. Xu, T. Fang, D. Li, and S. Wang, “Object classification of aerial images with Bag-of-
Visual words”, IEEE Geoscience and Remote Sensing Letters, vol. 7, no. 2, (2010).
[28] H. Sun, X. Sun, H. Wang, Y. Li, and X. Li, “Automatic target detection in High-Resolution
remote sensing images using spatial sparse coding Bag-of-Words model,” IEEE
Geoscience and Remote Sensing Letters, vol. 9, no. 1, (2012), pp. 109–113.
[29] J. Han et al., “Efficient, simultaneous detection of multi-class geospatial targets based on
visual saliency modeling and discriminative learning of sparse coding”, ISPRS Journal of
Photogrammetry and Remote Sensing, vol. 89, (2014), pp. 37–48.
[30] G. Cheng, J. Han, P. Zhou, and L. Guo, “Multi-class geospatial object detection and
geographic image classification based on collection of part detectors,” ISPRS Journal of
Photogrammetry and Remote Sensing, vol. 98, (2014), pp. 119–132.