Android Malware Detection Using Machine Learning


ANDROID MALWARE DETECTION

USING MACHINE LEARNING


QIS COLLEGE OF ENGINEERING AND TECHNOLOGY

INFORMATION TECHNOLOGY

Guided By: Mr.K.SREENATH, M.Tech


(Assistant Professor)

Presented by:
D.Akhila (17491A1207)
G.Jahnavi (17491A1211)
N.Preethi (17491A1220)
T.Bhaanu (17491A1229)
V.Srikanth (17491A1255)
CONTENTS

1. Abstract
2. Introduction
3. Existing System
4. Proposed System
5. Requirements
6. Modules
7. Modules Description
8. Methodologies
9. Output screenshots
10. Result
11. Conclusion
Abstract

Android is the world's most popular operating system, with more than a billion users, and it has drawn the attention of cyber criminals, who operate mainly through the wide distribution of malicious applications. In recent years, extensive research has been conducted on malware analysis and detection, but those techniques were not able to detect unknown malware. To detect such malware, we propose effective machine learning approaches that make use of evolutionary genetic algorithms. The features selected by the genetic algorithm are used to train machine learning classifiers, and the classifiers' capability to identify malware is evaluated.
Introduction

Malware is malicious software that is intentionally designed to cause damage to computers, tablets, and smartphones.
There are six common types of malicious software: viruses, worms, Trojan horses, spyware, adware, and ransomware.
Because Android is open source, many people simply install malware apps from the Google Play Store and give their personal details to access the application.
So here we are developing antivirus software by using malware analysis and reverse engineering.
Existing System

The main contribution of the work is the reduction of the feature dimension to less than half of the original feature set using a genetic algorithm, so that the reduced set can be fed as input to machine learning classifiers for training with reduced complexity while maintaining their accuracy in malware classification.
The optimized feature set obtained using the genetic algorithm is used to train two machine learning algorithms: Support Vector Machine and Neural Network.
Advantages

A linear support vector machine (SVM) is used to detect Android malware, and its detection performance is compared with that of other machine learning classifiers.
The feature dimension is reduced to less than half of the original feature set by using a genetic algorithm, so that it can be fed as input to machine learning classifiers for training with reduced complexity while maintaining their accuracy in malware classification.
Disadvantages

The SVM algorithm is not suitable for large data sets.
SVM does not perform very well when the data set is noisy, i.e. when the target classes overlap.
Proposed System

Two sets of Android apps (APKs), malware and goodware, are reverse engineered to extract features such as permissions and the counts of app components such as Activities, Services, Content Providers, etc.
In the proposed methodology, static features are obtained from AndroidManifest.xml. The optimized feature set obtained using the genetic algorithm is used to train two machine learning algorithms: Support Vector Machine and Neural Network.
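As a rough illustration of the static feature extraction described above, the following sketch reads permissions and component counts from a manifest that has already been decoded to plain XML. The sample manifest, the function name `extract_static_features`, and the choice of the four component tags are assumptions made for this example; manifests inside a real APK are stored in a binary format and must first be decoded with a reverse engineering tool such as apktool.

```python
import xml.etree.ElementTree as ET

# Namespace that Android uses for attributes such as android:name.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical decoded manifest, used only for illustration.
SAMPLE_MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.SEND_SMS"/>
    <application>
        <activity android:name=".MainActivity"/>
        <activity android:name=".SettingsActivity"/>
        <service android:name=".SyncService"/>
        <receiver android:name=".BootReceiver"/>
    </application>
</manifest>
""".strip()


def extract_static_features(manifest_xml):
    """Return requested permissions and per-component counts from manifest XML."""
    root = ET.fromstring(manifest_xml)
    # Collect the android:name attribute of every <uses-permission> element.
    permissions = sorted(
        elem.get(ANDROID_NS + "name")
        for elem in root.iter("uses-permission")
        if elem.get(ANDROID_NS + "name")
    )
    # Count how many times each app component type is declared.
    component_counts = {
        tag: len(list(root.iter(tag)))
        for tag in ("activity", "service", "provider", "receiver")
    }
    return {"permissions": permissions, "component_counts": component_counts}


features = extract_static_features(SAMPLE_MANIFEST)
print(features["permissions"])
print(features["component_counts"])
```

Each app would then contribute one feature vector (for example, one 0/1 entry per known permission plus the component counts) to the dataset used for training.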
Advantages

Using a genetic algorithm gives the best optimized solution.
In this the dynamic analysis model can hit up to 9accuracy
in detecting malware.
Static analysis can achieve 81% accuracy.
Disadvantages

Limited data set
Lower detection rate
High computation burden
Limited set of classification algorithms implemented
Software Requirements

For developing the application:
1. Python
2. MySQL
3. MySQL client
4. WampServer 2.4
Hardware Requirements

Operating system: Windows
Processor: Pentium IV or higher
RAM: 4 GB
System Architecture
Modules

Upload Android dataset
Generate Train & test model
Pre-processing
Run SVM & Neural Network algorithms
Display Accuracy Graph
Upload Android dataset

We have collected malware datasets from previous attacks, and these datasets are uploaded to our proposed model.
Based on these datasets, the model identifies whether an Android app is malware or not.
Generate Train & test model

In this module, we divide the dataset into training and test sets in an 80:20 ratio. We then train our model to identify malware apps and links on Android.
After training is complete, we verify whether the model works properly by checking it against the test data.
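A minimal sketch of the 80:20 split, assuming the dataset is held as two parallel Python lists. The function name `split_dataset`, the fixed seed, and the toy data are illustrative assumptions, not part of the original project.

```python
import random


def split_dataset(samples, labels, train_fraction=0.8, seed=42):
    """Shuffle the data and split it into training and test portions."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    # Shuffle indices (not the data itself) so samples and labels stay aligned.
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * train_fraction)
    train_idx, test_idx = indices[:cut], indices[cut:]
    x_train = [samples[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    x_test = [samples[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, y_train, x_test, y_test


# Toy data: ten feature vectors with alternating labels (1 = malware, 0 = goodware).
toy_samples = [[i, i % 3] for i in range(10)]
toy_labels = [i % 2 for i in range(10)]
x_train, y_train, x_test, y_test = split_dataset(toy_samples, toy_labels)
print(len(x_train), len(x_test))
```

Shuffling before the cut matters when the uploaded dataset is ordered by class, because otherwise the test portion could contain only one class.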
Pre-processing

In this module, we analyze the complete data, remove null values, and filter the data according to our requirements.
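The pre-processing step can be sketched as follows, assuming each record is a dictionary and that "null" means a missing key, `None`, or an empty string. The function name `preprocess` and the column names are hypothetical.

```python
def preprocess(rows, required_columns):
    """Drop rows with null values and keep only the required columns."""
    cleaned = []
    for row in rows:
        values = [row.get(col) for col in required_columns]
        # Skip the row if any required value is missing or empty.
        if any(v is None or v == "" for v in values):
            continue
        # Filter the row down to the columns we actually need.
        cleaned.append({col: row[col] for col in required_columns})
    return cleaned


# Hypothetical raw records; "source" is an extra column we do not need.
raw_rows = [
    {"send_sms": 1, "internet": 1, "label": 1, "source": "a"},
    {"send_sms": None, "internet": 1, "label": 0, "source": "b"},
    {"send_sms": 0, "internet": 0, "label": 0, "source": "c"},
    {"send_sms": 0, "internet": "", "label": 1, "source": "d"},
]
clean_rows = preprocess(raw_rows, ["send_sms", "internet", "label"])
print(clean_rows)
```

Note that a legitimate value of `0` is kept; only missing, `None`, and empty-string values are treated as null here.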
Display Accuracy Graph

After the analysis is complete, the results are shown as a graph indicating how accurate each algorithm is.
In the graph, the x-axis represents the algorithm name and the y-axis represents accuracy; overall, SVM achieved the highest accuracy.
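The accuracy shown on the y-axis is the fraction of test samples that a classifier labels correctly. The sketch below computes it and prints a text-only bar per algorithm as a stand-in for a plotted graph; the function names and the prediction lists are hypothetical and do not reproduce the project's results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def text_bar_chart(scores, width=40):
    """Render one text bar per algorithm, scaled so 100% fills `width`."""
    lines = []
    for name, score in scores.items():
        bar = "#" * round(score * width)
        lines.append(f"{name:<16}|{bar} {score:.0%}")
    return "\n".join(lines)


# Hypothetical true labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "Classifier A": [1, 0, 1, 1, 0, 0, 0, 0],
    "Classifier B": [1, 1, 1, 0, 0, 0, 0, 0],
}
scores = {name: accuracy(y_true, pred) for name, pred in predictions.items()}
print(text_bar_chart(scores))
```

In the project itself the same scores would be passed to a plotting library to draw the bar graph.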
Run SVM & Neural Network algorithms

To analyze the input data, we use feature selection with a genetic algorithm. On the selected features we use the SVM algorithm for better accuracy, and we also use a Neural Network algorithm to compare which performs better.
We combine the genetic algorithm with both algorithms, and we can also test each algorithm independently, so that we can identify which is better.
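The idea of genetic-algorithm feature selection can be sketched as below. Each candidate is a 0/1 mask over the features, and a fitness function scores the mask. The slides do not specify the operators, so tournament selection, single-point crossover, bit-flip mutation, and elitism are assumptions here. In the project the fitness would be the accuracy of the SVM or Neural Network trained on the selected features; `toy_fitness` below is a hypothetical stand-in that rewards three "informative" features and penalizes mask size.

```python
import random


def genetic_feature_selection(n_features, fitness, pop_size=20,
                              generations=30, mutation_rate=0.05, seed=0):
    """Evolve a 0/1 feature mask; return (best_mask, best_fitness_history)."""
    if n_features < 2 or pop_size < 2:
        raise ValueError("need at least 2 features and 2 individuals")
    rng = random.Random(seed)

    def tournament(population):
        # Pick two individuals at random and keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        next_population = [best[:]]  # elitism: carry the best mask forward
        while len(next_population) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            cut = rng.randint(1, n_features - 1)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: toggle each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history


# Hypothetical fitness: features 0, 2 and 5 are "informative".
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask):
    useful = sum(mask[i] for i in INFORMATIVE)
    return 2.0 * useful - 0.5 * sum(mask)


best_mask, history = genetic_feature_selection(8, toy_fitness)
print(best_mask, history[-1])
```

Because the best mask is always copied into the next generation, the best fitness never decreases from one generation to the next. Replacing `toy_fitness` with a function that trains a classifier on the masked features and returns its test accuracy gives the combined setup described above, at the cost of one training run per fitness evaluation.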
Methodologies

Static Analysis
Dynamic Analysis
Reverse Engineering
Support Vector Machine
Output screenshots
Result

Android Malware Detection Using Genetic Algorithm based Optimized Feature Selection and Machine Learning. ... The experimental results validate that the genetic algorithm gives the most optimized feature subset, helping to reduce the feature dimension to less than half of the original feature set.
Conclusion

As the number of threats posed to Android platforms is increasing day by day, spreading mainly through malicious applications, it is very important to design a framework that can detect such malware with accurate results. The proposed methodology attempts to make use of a genetic algorithm to get the most optimized feature subset, which is used to train machine learning algorithms in the most efficient way.
THANK YOU
