Final Synopsis 2
Project title: Ensemble Model for Detecting Phishing and Trojan Using Latest Machine Learning Technique
Group no: 4
Group members:
1. Deore Krushna
2. Sonawane Anand
3. Bhamare Vaibhav
4. Dhikale Shubham
Abstract:
Phishing is an online threat where an attacker impersonates an authentic and trustworthy
organization to obtain sensitive information from a victim. One example of such is trolling,
which has long been considered a problem. However, recent advances in phishing detection,
such as machine learning-based methods, have assisted in combatting these attacks. Therefore,
this paper develops and compares four models for investigating the efficiency of using machine
learning to detect phishing domains. It also compares the most accurate model of the four with
existing solutions in the literature. These models were developed using artificial neural networks
(ANNs), support vector machines (SVMs), decision trees (DTs), and random forest (RF)
techniques. Moreover, the UCI phishing domains dataset, which is built from uniform resource
locator (URL) features, is used as a benchmark to evaluate the models. Our findings show that
the model based on the random forest technique is the most accurate of the four and outperforms
other solutions in the literature.
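
The comparison described above amounts to scoring each model's predictions against the true labels and keeping the most accurate one. A minimal sketch of that step follows, in plain Python. The labels and predictions are invented toy values used only to show the mechanics; they are not results from the UCI dataset, and the names `y_true`, `predictions`, and `accuracy` are assumptions made for illustration.

```python
# Toy labels and predictions, invented purely for illustration.
# They are NOT results from the UCI phishing domains dataset.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = phishing, 0 = legitimate

predictions = {
    "ANN": [1, 0, 1, 0, 0, 0, 1, 1],
    "SVM": [1, 0, 0, 1, 0, 1, 1, 0],
    "DT":  [1, 1, 1, 1, 0, 0, 0, 0],
    "RF":  [1, 0, 1, 1, 0, 0, 1, 1],
}


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Score every model on the same labels, then pick the highest score.
scores = {name: accuracy(y_true, preds) for name, preds in predictions.items()}
best_model = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("Most accurate model on this toy data:", best_model)
```

Accuracy alone can be misleading when phishing and legitimate samples are imbalanced, so a fuller evaluation would likely also report precision and recall.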
Machine learning, with its ability to analyze vast amounts of data and identify patterns, has
emerged as a critical tool in the fight against cyber threats. However, the constantly evolving
nature of these threats demands advanced techniques to stay ahead of attackers. In this context,
ensemble models using the latest machine learning techniques have emerged as a compelling
approach to bolster the security of digital systems and protect against phishing and trojan
attacks.
This ensemble model combines the strengths of multiple machine learning algorithms and
models to create a unified and formidable defense against malicious activities. By leveraging the
power of diverse algorithms, data representations, and feature extraction methods, ensemble
models offer the potential to significantly enhance the accuracy, robustness, and adaptability of
cybersecurity systems.
Fig: System Architecture.
2. Disadvantages:
Increased complexity and resource requirements.
Longer training times and computational overhead.
Reduced model interpretability.
Potential for overfitting.
Maintenance challenges with evolving techniques.
Deployment complexities, especially for real-time applications.
4. Technologies Used:
2. Model Selection:
Employ state-of-the-art machine learning techniques:
Deep Learning: Convolutional Neural Networks (CNNs) for image-based
phishing detection, Recurrent Neural Networks (RNNs) for text-based detection.
Gradient Boosting: XGBoost, LightGBM, or CatBoost.
Random Forests: A versatile ensemble technique.
3. Ensemble Techniques:
Combine multiple models for improved performance:
Voting Classifier: Combines predictions of multiple base models using majority vote.
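
Majority (hard) voting can be sketched in a few lines of plain Python. This is a minimal illustration of the idea, not the project's implementation: the function name `majority_vote` and the three prediction lists are hypothetical, and in practice a library class such as scikit-learn's `VotingClassifier` would normally handle both training and voting.

```python
from collections import Counter


def majority_vote(per_model_predictions):
    """Combine per-model label lists into one label per sample by hard voting."""
    if not per_model_predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(per_model_predictions[0])
    if any(len(p) != n_samples for p in per_model_predictions):
        raise ValueError("every model must predict the same number of samples")

    combined = []
    # zip(*...) groups the votes that all models cast for the same sample.
    for votes in zip(*per_model_predictions):
        # most_common(1) yields the label with the highest vote count.
        # Using an odd number of binary classifiers avoids ties.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical predictions from three base models (1 = phishing, 0 = legitimate).
cnn_preds = [1, 0, 1, 0]
gbm_preds = [1, 1, 0, 0]
rf_preds = [0, 1, 1, 0]

print(majority_vote([cnn_preds, gbm_preds, rf_preds]))  # [1, 1, 1, 0]
```

Each sample receives the label that at least two of the three models agree on, so a single model's mistake is outvoted when the other two are correct.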
5. Real-time Implementation:
Consider deployment challenges for real-time detection.
Optimize computational efficiency.
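
One way to check whether a model is fast enough for real-time detection is to time single-sample predictions. The sketch below uses only the Python standard library. `predict_one` is a hypothetical stand-in for a trained ensemble, and the threshold inside it is arbitrary; only the timing pattern is the point.

```python
import statistics
import time


def predict_one(features):
    """Hypothetical stand-in for a trained ensemble's single-sample prediction."""
    return 1 if sum(features) > 1.5 else 0


def measure_latency_ms(predict, samples, repeats=5):
    """Time each single-sample call and summarize latency in milliseconds."""
    timings = []
    for _ in range(repeats):
        for sample in samples:
            start = time.perf_counter()
            predict(sample)
            timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    return {
        "median_ms": statistics.median(timings),
        # Approximate 95th percentile taken from the sorted timings.
        "p95_ms": timings[int(0.95 * (len(timings) - 1))],
    }


# Two made-up feature vectors, used only to exercise the timing loop.
samples = [[0.2, 0.9, 0.7], [0.1, 0.1, 0.1]]
result = measure_latency_ms(predict_one, samples)
print(f"median: {result['median_ms']:.4f} ms, p95: {result['p95_ms']:.4f} ms")
```

Reporting a high percentile alongside the median helps show worst-case delays, which tend to matter more than the average for real-time use.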