Papers by Rajesh Kalakoti
IEEE internet of things journal, 2024
IEEE Access
Attackers compromise insecure IoT devices to expand their botnets in order to launch more influential attacks against their victims. In various studies, machine learning has been used to detect IoT botnet attacks. In this paper, we focus on the minimization of feature sets for machine learning tasks that are formulated as six different binary and multiclass classification problems based on the stages of the botnet life cycle. More specifically, we applied filter and wrapper methods with selected machine learning methods and derived optimal feature sets for each classification problem. The experimental results show that it is possible to achieve very high detection rates with a very limited number of features. Some wrapper methods guarantee an optimal feature set regardless of the problem formulation, but filter methods do not achieve that in all cases. The feature selection methods prefer channel-based features for detection at post-attack, communication, and control stages, while host-based features are more influential in identifying attacks originating from bots.
INDEX TERMS Feature selection, machine learning, Internet of Things, botnet, intrusion detection.
I. INTRODUCTION
IoT (Internet of Things) is shaping the way we live our human lives [1], from tiny toys to home-made applications to smart cities. IoT is a system of interrelated devices connected to the Internet to transmit and receive data from one device to other parts of the system; it can be an edge device, a cloud server, or another field device. At the same time, the IoT security issue has become more important as an enormous amount of data is associated with IoT networks.
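The distinction between filter and wrapper methods named in the abstract can be made concrete with a small sketch. It is not the paper's pipeline: the correlation-based score, the nearest-centroid classifier, the greedy forward search, and the synthetic three-feature data are all assumptions chosen to keep the example dependency-free. A filter method scores each feature without consulting a classifier, while a wrapper method evaluates candidate feature subsets by training a classifier on them.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5

def filter_rank(X, y):
    """Filter method: score every feature on its own, with no classifier involved."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

def centroid_accuracy(X, y, features):
    """Training accuracy of a nearest-centroid classifier restricted to `features`."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[j] for r in rows) for j in features]
    correct = 0
    for row, lab in zip(X, y):
        point = [row[j] for j in features]
        pred = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += pred == lab
    return correct / len(y)

def wrapper_forward(X, y, k):
    """Wrapper method: greedily add the feature that most helps the classifier."""
    selected, remaining = [], list(range(len(X[0])))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda j: centroid_accuracy(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic data (made up): feature 0 separates the classes, features 1 and 2 are noise.
rng = random.Random(0)
y = [i % 2 for i in range(200)]
X = [[label * 5.0 + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for label in y]

filter_order = filter_rank(X, y)
wrapper_choice = wrapper_forward(X, y, k=1)
print("filter ranking:", filter_order)
print("wrapper selection:", wrapper_choice)
```

The wrapper here scores subsets by training accuracy only for brevity; a realistic setup would score them on held-out data or with cross-validation, and the cost of retraining the classifier for every candidate subset is the usual reason filter methods are considered cheaper.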
Due to the exponential growth of IoT devices [2], hackers and cybercriminals have more opportunities to exploit network vulnerabilities [3], resulting in various IoT-based botnet attacks [4], [5], [6]. The botnet, a large set of compromised machines controlled by attackers, is one of the strongest threats on the Internet to [...] malicious activities. An organization hosting various IoT devices is interested in the identification of devices that are compromised by IoT bot malware; therefore, its focus is much more on detection at formation, C&C or post-attack phases. On the other hand, organizations receiving attacks from IoT bots aim to prevent malicious traffic launched during the attack and post-attack phases. Therefore, it is important to develop a monitoring system that encompasses the entire botnet life cycle. This endeavor requires a more in-depth understanding of malicious actions and their characterization in each phase.
The Internet of Things (IoT) has received great attention in research on network anomalies and intrusion detection [10]. Malicious network traffic has been detected with conventional signature-based solutions such as Snort [11] or Suricata [12]. The drawback of signature-based systems is the inability to detect unknown or previously unidentified attacks, in addition to the obstacles that arise from mismanagement of signatures.
Instead of signature-based solutions, a behavior- or anomaly-based solution goes beyond identifying individual attack signatures to detect and analyze malicious behavior patterns. Machine learning is considered a viable solution that detects new variants of attacks with the elimination of the need for signatures.
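The contrast between signature matching and anomaly detection described above can be illustrated with a toy sketch. Everything in it is hypothetical: the signature strings, the packet-rate numbers, and the threshold are made up, and a simple z-score test stands in for the machine learning classifiers the paper considers. It shows only why a previously unseen attack slips past a signature list yet can still deviate from a learned baseline.

```python
from statistics import mean, pstdev

# Hypothetical signature database: payload substrings of already known attacks.
KNOWN_SIGNATURES = ["known-bad-payload-A", "known-bad-payload-B"]

def signature_match(payload, signatures=KNOWN_SIGNATURES):
    """Signature-based check: flags traffic only if it contains a known pattern."""
    return any(sig in payload for sig in signatures)

def fit_baseline(benign_rates):
    """Learn a baseline (mean, standard deviation) from benign traffic rates."""
    return mean(benign_rates), pstdev(benign_rates)

def is_anomalous(rate, baseline, threshold=3.0):
    """Anomaly-based check: flags traffic whose rate lies far from the baseline."""
    mu, sigma = baseline
    if sigma == 0:
        return rate != mu
    return abs(rate - mu) / sigma > threshold

# Made-up benign packet rates (packets per second) observed for one device.
benign_rates = [10, 12, 11, 9, 10, 13, 11, 10]
baseline = fit_baseline(benign_rates)

# A new attack variant: no signature exists for it, but its traffic volume is unusual.
novel_payload = "never-seen-before-payload"
novel_rate = 400

print("signature hit:", signature_match(novel_payload))
print("anomaly hit:", is_anomalous(novel_rate, baseline))
```

A single-feature threshold like this would produce many false alarms on real traffic; it is meant only to show the direction of the argument, not a deployable detector.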
Although the application of statistical machine learning (ML) techniques has demonstrated highly accurate classification results in malicious traffic detection problems [13], feature selection as an important step in the ML workflow has not been fully addressed. The curse of dimensionality can be a concern that decreases detection performance due to overfitting when classifiers are trained with a large number of features [14]. In addition, a high-dimensional feature space may require more computing resources when the models are deployed in the operational environment.
In most cases, intrusion detection systems should handle a large volume of network traffic, so using resources efficiently is vital. IoT environments bring additional restrictions: detection sensors, the system components responsible for collecting network traffic and performing the detection function, may run on resource-constrained devices (e.g., edge devices). Therefore, reducing the size of the feature set can improve the performance of ML models in many ways. Additionally, feature selection helps to achieve a deeper understanding of the underlying processes that generated the data, since a smaller feature set is easier for experts to interpret.
Various academic works [15], [16], [17], [18], [19], [20], [21] use feature selection techniques to improve the detection scores of existing ML classifiers. However, these studies do not explore the impact of feature selection methods on different binary and multiclass classification formulations that can be performed for intrusion detection at various [...]
II. BACKGROUND AND LITERATURE REVIEW
A. BOTNET DETECTION
Researchers have introduced traditional machine learning and data mining methods for botnet detection in recent decades and made significant advances.
BotMiner [24], [25] and BotSniffer [26] used statistical algorithms to detect malicious traffic on an IoT network that is part of a botnet.
The Bayesian optimization Gaussian process (BO-GP) [27] is combined with the decision tree classifier as an optimized ML-based framework to detect botnet attacks on IoT devices. The detection rate for binary classification is improved to 99.99% when the accuracy, precision, recall, and F1-score metrics are compared to the Decision Tree, SVM,