Analysis of Malware Behavior: Type Classification Using Machine Learning


Radu S. Pirscoveanu, Steven S. Hansen, Thor M. T. Larsen, Matija Stevanovic, Jens Myrup Pedersen
Aalborg University, Denmark
Email: [email protected], [email protected], [email protected], [email protected], [email protected]

Alexandre Czech
Ecole Centrale d'Electronique, Paris, France
Email: [email protected]

Abstract—Malicious software has become a major threat to modern society, not only due to the increased complexity of the malware itself but also due to the exponential increase of new malware each day. This study tackles the problem of analyzing and classifying a large amount of malware in a scalable and automatized manner. We have developed a distributed malware testing environment by extending Cuckoo Sandbox that was used to test an extensive number of malware samples and trace their behavioral data. The extracted data was used for the development of a novel type classification approach based on supervised machine learning. The proposed classification approach employs a novel combination of features that achieves a high classification rate with a weighted average AUC value of 0.98 using a Random Forests classifier. The approach has been extensively tested on a total of 42,000 malware samples. Based on the above results it is believed that the developed system can be used to pre-filter novel from known malware in a future malware analysis system.

Keywords: Malware, type-classification, dynamic analysis, scalability, Cuckoo sandbox, Random Forests, API call, feature selection, supervised machine learning.

February 25, 2015

I. INTRODUCTION

Internet usage has grown exponentially in the past years as modern society is becoming more and more dependent on global communication. At the same time, the Internet is increasingly used by criminals, and a large black market has emerged where hackers or others with criminal intent can purchase malware or use malicious services for a rental fee. This provides a strong incentive for hackers to modify and increase the complexity of the malicious code in order to improve the obfuscation and decrease the chances of being detected by anti-virus programs. This leads to multiple forks or new implementations of the same type of malicious software that can propagate out of control. Based on AV-Test, approximately 390,000 new malware samples are registered every day, which gives rise to the problem of processing the huge amount of unstructured data obtained from malware analysis [2]. This makes it challenging for anti-virus vendors to detect zero-day attacks and release updates in a reasonable time-frame to prevent infection and propagation.

To meet this problem, researchers and anti-virus vendors seek a faster alternative method of detection that can overcome the limitations imposed by static analysis, which is the classical approach. Analyzing the malicious code can yield inaccurate information when polymorphic, metamorphic and obfuscation methods are used; when such methods are applied, the complexity increases even more, and it becomes hard to determine which type of malware it is. An alternative to this approach is to perform dynamic analysis of the behavior of the malicious software, which can also be a troublesome task when an extensive and increasing number of new malware samples has to be analyzed. Due to these problems it is favorable to develop a scalable setup where several malware samples can be dynamically analyzed in parallel. For this study, a large number of malware samples has been utilized compared to past research articles. Having a large sample set adds to the predictive power and reliability of the built classifier, which provides satisfactory results. In this study, a system has been developed which could be used as a pre-filtering application, where all known types can be filtered from the novel malware. This leaves the opportunity to skip static analysis on known malware and focus only on analyzing the novel malware, thus drastically increasing the detection and analysis rate of anti-virus programs. New malware that arises each day is believed to consist mostly of modified versions of previous malware created using sophisticated reproduction techniques. Based on this, it is assumed in this study that malware, even though it is new, can exhibit similar behavior to earlier versions from a dynamic analysis point of view [12].

This study is based on a university report written by this team [3]. In section II the background and a discussion of improvements over related work are presented, followed by the methodology in section III, proposing a solution for the problems presented in the introduction. Finally, the results and conclusion are presented in section IV and section V respectively.

II. RELATED WORK

When classifying malware types it is essential to find parameters that can distinguish between their behavior; commonly used parameters on Windows platforms are the Windows API calls. The reason that these are commonly used is that they provide a solid and understandable form of behavioral information, since an API call states an exact action performed on the computer, e.g. creation, access, modification and deletion of files or registry keys.
In [10] they use hooking of the system services and creation or modification of files. Additionally they use logs from various API calls to differentiate malware from cleanware as well as performing malware family classification. They include a sample set of 1,368 malware and 456 cleanware samples and use a frequency representation of the features. The limitation, also emphasized in their future work, is that they need to expand their sample set and explore new features. In [16] a scalable approach is made using the API names and their input arguments, after which feature selection techniques are applied to reduce the number of features for a binary classifier that separates malware from cleanware. The features used in their setup are limited to the API system calls during run-time, and they have a sample set of 826 malware and 385 cleanware samples. Additionally they apply a frequency representation, as the research mentioned before in [10], but also include a binary representation. Furthermore, in [5] they use CWSandbox, which applies a technique called API hooking to catch the behavior of the malware, but in that paper they strive to classify malware into known families. They use a total sample set of 10,072 malware samples and utilize a frequency representation of their features. In terms of automatic analysis, [6] has created a framework able to perform thousands of tests on malware binaries each day, using a sample set of 3,133 malware samples and a sequence representation of their features, which are the Windows API calls applied for both clustering and classification. To understand how API calls are used by malicious programs, [8] have made a grouping of features in relation to their purpose, which can be helpful to understand the malware behavior. In terms of classification approaches, a wide range of machine learning algorithms are used, such as J48, Random Forests and Support Vector Machines. The weakness of the related work is the limited amount of samples used to build their classifiers. Furthermore, this study proposes a feature representation that combines several of the aforementioned representations to achieve a greater behavioral picture of the malware.

Given that the labels for malware types are provided by anti-virus vendors and based on the related work, it is found that supervised machine learning is a valid choice for this study. Based on a dataset generated from around 80,000 malware samples, a feature selection has been performed after analyzing the data. In the mentioned research articles, API calls are the mainly used parameter for creating features. In this study several parameters were chosen as features in addition to API calls. The additional parameters are: mutexes, registry keys/files accessed and DNS-requests. In the related work, different feature representations were used, i.e. sequence, binary and frequency. The contribution of this study is the unique combination of different feature representations and parameters that also applies feature reduction strategies. Furthermore, our study includes a great amount of malware samples and behavioral data collected using our setup. This allows a solid basis when training the model since it includes a larger behavioral picture of the malware. Finally, we rely on the Random Forests classifier to perform the classification of the malware types, as a capable ensemble classifier also used by related work [10], [16].
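To make the distinction between the frequency, binary and sequence representations mentioned above concrete, the sketch below builds all three views from the same API-call trace. It is an illustrative example only; the trace, the vocabulary and the encoding are our assumptions and are not taken from the cited works.

```python
from collections import Counter

# Hypothetical API-call trace recorded for one sample during dynamic analysis.
trace = ["NtCreateFile", "RegOpenKeyExW", "NtCreateFile", "NtWriteFile", "NtCreateFile"]

# Fixed vocabulary of API calls used as feature columns (illustrative subset).
vocabulary = ["NtCreateFile", "NtWriteFile", "RegOpenKeyExW", "NtDeleteKey"]

counts = Counter(trace)

# Frequency representation: how often each API call was observed.
frequency = [counts[api] for api in vocabulary]         # [3, 1, 1, 0]

# Binary representation: whether each API call was observed at all.
binary = [int(counts[api] > 0) for api in vocabulary]   # [1, 1, 1, 0]

# Sequence representation: the order of calls, encoded by vocabulary index
# (this study keeps the first 200 calls per sample).
sequence = [vocabulary.index(api) for api in trace][:200]  # [0, 2, 0, 1, 0]

print(frequency, binary, sequence)
```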
III. METHODOLOGY

This section will go through the methodology applied in the development of this study. This includes: dynamic analysis, supervised machine learning, data generation, data extraction and classification.

A. Dynamic Analysis

As mentioned earlier, a large amount of malware is released on the Internet every day, which makes it more and more suitable to use a dynamic approach in contrast to static analysis. Dynamic analysis is performed in such a way that malware is executed in a sandbox environment in which it is assumed that the malware believes it is on a normal machine. Here, all actions performed at run-time are recorded and saved in a database. This is different from the classical signature-based approach also used in the context of static analysis that is commonly applied by anti-virus vendors. In this study, Cuckoo Sandbox has been chosen as the sandbox environment in which the malware will be injected, see [4]. Since Cuckoo is open source, it allows to openly modify the software, which means it is possible to change the code to fit the needs of this study. One of the requirements is to make the system distributed and scalable, such that it can be controlled from one central unit and new virtual machines or physical machines can easily be added in order to improve the efficiency of the overall analysis.
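As a rough illustration of how samples could be dispatched to such a setup, the sketch below submits binaries to a Cuckoo instance through its REST API. This is a minimal sketch under assumptions of ours: it presumes the REST API service is enabled, and the host address, port and sample directory are invented for the example, not taken from the paper's configuration.

```python
import os
import requests

# Assumed address of the Cuckoo REST API service (not the paper's actual setup).
CUCKOO_API = "http://192.168.56.1:8090"

def submit_sample(path):
    """Submit one binary to Cuckoo for dynamic analysis and return the task id."""
    with open(path, "rb") as handle:
        files = {"file": (os.path.basename(path), handle)}
        response = requests.post(CUCKOO_API + "/tasks/create/file", files=files)
    response.raise_for_status()
    return response.json()["task_id"]

if __name__ == "__main__":
    sample_dir = "samples"  # hypothetical directory of downloaded malware binaries
    for name in os.listdir(sample_dir):
        task_id = submit_sample(os.path.join(sample_dir, name))
        print(f"queued {name} as task {task_id}")
```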
B. Supervised Machine Learning

Using dynamic analysis to gather behavioral data, it is possible to perform malware type classification using supervised machine learning. We have chosen Random Forests with 160 trees, which is a decision tree based algorithm that makes use of random sub-sampling, or tree bagging, of the sample space, where each subset is then used to grow a tree [7]. Individual decision making is utilized at each tree for each classification of malware, and the results are then averaged. This prevents over-fitting, as the variance of the classification model decreases when averaged over a suitable amount of trees. In this study the machine learning tool WEKA has been used, which can be run through a Java-based GUI or directly in the terminal [15].
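The study runs Random Forests through WEKA; purely as an illustration of the same idea (an ensemble of trees grown on bootstrap subsamples whose votes are averaged), the sketch below uses scikit-learn on synthetic data. The toolchain and the stand-in data are our assumptions, not what the paper used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the behavioral feature matrix (one row per sample,
# one column per feature) and the four type labels; the real data comes from
# the Cuckoo reports described in this paper.
X, y = make_classification(n_samples=2000, n_features=68, n_informative=20,
                           n_classes=4, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# 160 trees, mirroring the study: each tree is grown on a bootstrap subsample
# of the training set and the per-tree votes are averaged at prediction time.
clf = RandomForestClassifier(n_estimators=160, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```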
In Figure 1 an overview of the system is depicted as a flowchart. It includes modules for each of the groups: Data Generation, Data Extraction and Malware Classification. Each group will be explained in the following subsections.

[Figure: flowchart running from "Begin: Malware Classification" to "End: Malware Classification". Data Generation (Cuckoo Sandbox, InetSim, MongoDB) feeds Data Extraction (filtering, type extraction of parameters and ML labels), which feeds Malware Classification (feature reduction, feature representation, WEKA: Random Forests).]

Fig. 1. Overall system flow.

C. Data Generation

The Data Generation consists in generating malware analysis reports from the execution of approximately 80,000 malware samples downloaded from VirusShare [13]. To perform the analysis in a secure, scalable and distributed environment, a customized system has been set up.
The designed system consists of a modified version of Cuckoo Sandbox [4] which permits to perform a faster analysis based on parallel computing. It is a distributed virtual environment composed of 13 personalized guests and a control unit. In order to simulate a real environment, the malware is executed within a personalized installation of the Microsoft Windows 7 operating system. Some commonly used software is installed (Skype, Flash, Adobe Reader, etc.) along with a batch script that simulates web activity. The modifications made permit to obtain a distributed system which is also scalable, making it possible to easily add or remove virtual machines. Moreover, it has been noticed that during their execution, some malware samples intended to connect to the Internet. This raised security requirements related to potential malicious traffic activity. The challenge has been to emulate a realistic environment without allowing any malware to communicate with a third party. To meet the security needs, a confined environment has been built. We configured InetSim, an Internet emulator, so that it responds to the malware requests and deceives the malicious program into perceiving that it is online [11], [9]. On the other hand, the internal system configuration permits to avoid the corruption of the analysis environment. This is why the virtual machines and their hosts are running on two different operating systems. To enhance security, all the commands are sent from the control unit to the hosts using Secure Shell.

Finally, the collected data, consisting of recorded malware actions, is stored using a DataBase Management System, namely MongoDB. This type of DBMS is particularly useful in dealing with a large database that includes unstructured data, which is the case in this study.
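As an illustration of how the stored reports can later be pulled back out of MongoDB, the sketch below uses pymongo. The database and collection names ("cuckoo", "analysis") and the report layout follow Cuckoo's default MongoDB reporting module as we understand it; they are assumptions, not a description of the exact schema used in the study.

```python
from pymongo import MongoClient

# Assumed connection details and schema (Cuckoo's MongoDB reporting module
# stores one document per analysis in the 'analysis' collection).
client = MongoClient("mongodb://localhost:27017")
reports = client["cuckoo"]["analysis"]

# Iterate over a few stored reports and pull out some behavioral fields.
projection = {"target.file.md5": 1, "behavior.processes.calls.api": 1}
for report in reports.find({}, projection).limit(5):
    md5 = report.get("target", {}).get("file", {}).get("md5")
    calls = [call.get("api")
             for proc in report.get("behavior", {}).get("processes", [])
             for call in proc.get("calls", [])]
    print(md5, len(calls), "API calls recorded")
```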
D. Data Extraction

The Data Extraction consists in keeping the most pertinent information to be used in a machine learning algorithm. First, the reports provided by Cuckoo give a wide set of information about the malware behavior, namely: DNS requests, Accessed Files, Mutexes, Registry Keys and Windows API calls. To be sure that this information is only related to malicious activity, a white list is created to clean the data from the non-malicious activity. It consists in recording the user's activity while browsing the operating system and the Internet. Afterwards this behavioral data is removed from the malware analysis reports.
Parameters             Before filtering   After filtering   Percentage Decrease
DNS (all levels)       2,000              1,986             0.7 %
DNS (TLD & 2-LD)       1,549              1,543             0.39 %
Accessed Files         673,554            18,322            97.28 %
Mutexes                11,287             9,875             12.51 %
Registry Keys          164,979            94,505            42.71 %

TABLE I. SUMMARY OF FILTERING RESULTS.

Table I lists the different parameters that have been filtered from the analysis. One can see that no filtering is applied on the API calls since they are logged based on the process ID of the malware.
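The filtering step can be pictured as a simple set difference against the whitelist recorded from a clean, user-driven run, as in the sketch below. The artifact names are made up for illustration and are not taken from the study's whitelist.

```python
# Artifacts recorded during a clean baseline run (the whitelist).
whitelist = {
    "mutex": {"Local\\MSCTF.Asm.MutexDefault1"},
    "registry": {"HKEY_CURRENT_USER\\Software\\Microsoft\\Windows"},
    "file": {"C:\\Windows\\System32\\kernel32.dll"},
}

# Artifacts observed while the malware sample was running (illustrative values).
observed = {
    "mutex": {"Local\\MSCTF.Asm.MutexDefault1", "Global\\2GVWNQJz1"},
    "registry": {"HKEY_CURRENT_USER\\Software\\Microsoft\\Windows",
                 "HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run"},
    "file": {"C:\\Users\\victim\\AppData\\Roaming\\svch0st.exe"},
}

# Keep only the artifacts that are not explained by normal user activity.
filtered = {kind: values - whitelist.get(kind, set()) for kind, values in observed.items()}
for kind, values in filtered.items():
    print(kind, sorted(values))
```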
Dealing with the great amount of unstructured data from the performed dynamic analysis gives rise to the problem of selecting features that precisely discriminate the malware types. In this study, a big effort has been put into primarily analyzing the behavioral data given by the API calls, as in [10], [16], [5], [6] and [8]. Each API call corresponds to a specific action performed on the system, which permits to precisely characterize the malware behavior. This is the reason why it has been decided to choose the API calls as the main parameter. Nevertheless, we have chosen to use the other parameters to play a complementary role in the malware classification.

The second step consists in labeling the samples to be used in supervised machine learning. Once the analysis is done, Cuckoo Sandbox provides a report that includes a list of anti-virus programs along with the corresponding labels. The challenge is to find the anti-virus program that gives both the best detection rate and the most precise labeling. Given these requirements, VirusTotal [14] provided labels for around 52,000 samples based on detection from Avast [1]. From the sample set, four different types were selected, represented by 42,000 suitable samples for classification, namely:

Trojan: includes another hidden program which performs malicious activity in the background.

Potentially Unwanted Program (PUP): is usually downloaded together with a freeware program without the user's consent, e.g. toolbars, search engines and games.

Adware: aims at displaying commercials based on the user's information.

Rootkit: has the capability to obfuscate information like running processes or network connections on an infected system.
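A sketch of how raw detection strings could be collapsed into these four classes is given below. The matched substrings are illustrative guesses at typical anti-virus naming patterns, not the exact rules used in the study.

```python
def label_to_type(av_label):
    """Map a raw anti-virus detection string to one of the four studied types."""
    label = av_label.lower()
    # Illustrative keyword rules; real detection names vary between engines.
    if "rootkit" in label:
        return "Rootkit"
    if "adware" in label:
        return "Adware"
    if "pup" in label or "unwanted" in label:
        return "PUP"
    if "trojan" in label or "trj" in label:
        return "Trojan"
    return None  # sample is discarded if it does not map to a studied type

print(label_to_type("Win32:Adware-gen [Adw]"))  # Adware
print(label_to_type("Win32:Trojan-gen"))        # Trojan
```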


E. Malware Classification

This section will go through the Feature Representation and Feature Reduction, which are the two preliminary steps before using the WEKA toolbox to both perform classification and measure the predictive performance of the training model.

1) Feature Representation: The built features tend to give a meaning to the chosen parameters. In total, 151 different API calls are used as the main features, whereas complementary information is derived from the other parameters. The features are gathered within a matrix where each row represents a malware sample and each column gives the corresponding value of a specific feature. These are the different representations:

Sequence: For each sample, the course of the 200 first API calls during the malware execution is used. This number has been chosen to obtain a reasonable matrix size. Besides, the initial sequence of API calls has been modified to improve the matching between malware samples that have similar patterns. The interest of this modification is illustrated in Figure 2.

Normal Sequence 1:          12 26 100 ... 6 6 6 6 6 5 24 32 28 ...
Modified Sequence 1:        12 26 100 ... 6 5 24 32 28 ...
Normal/Modified Sequence 2: 99 123 43 ... 8 40 24 32 28 ...

Fig. 2. Sequence Modification.

Since the initial sequence size has been limited to 200 API calls, it is likely that the repetition of the same API hides patterns that are out of the scope. Thus, to retrieve eventual hidden similarities, the sequence is modified so that it gives the succession of actions performed without taking their frequency into account. This is done by removing the repetition of the same API called in a row.
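The modification illustrated in Figure 2 amounts to collapsing runs of identical consecutive calls, as in the minimal sketch below; the numeric call identifiers mirror those shown in the figure, with the elided parts of the sequences left out.

```python
from itertools import groupby

def modify_sequence(api_sequence, length=40):
    """Collapse consecutive repetitions of the same API call and truncate."""
    collapsed = [api for api, _run in groupby(api_sequence)]
    return collapsed[:length]

# Sequence 1 from Figure 2: the run of five '6' calls collapses to a single 6.
normal = [12, 26, 100, 6, 6, 6, 6, 6, 5, 24, 32, 28]
print(modify_sequence(normal))  # [12, 26, 100, 6, 5, 24, 32, 28]
```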
Frequency: This matrix is composed of 151 columns corresponding to the set of API calls. The frequency of each API call is calculated from the malware analysis.

Counters: This matrix is composed of 8 columns which correspond to the counts of the following parameters: DNS requests (all levels), DNS requests (TLD and 2-LD), Accessed Files (including 3 file extensions), Mutexes and Registry Keys.

2) Feature Reduction: In order to perform efficient large scale analysis by combining different behavioral features, two dimensionality reduction methods are used. The first one is applied on the sequence matrix and consists in reducing the initial sequence length while observing the impact on the classification's performance. Here, we have kept 40 features since we found that they contain substantial information to classify malware types. In addition, it has been chosen to combine the frequencies of the 151 API calls into bins of the same category, inspired by [8]. Thus, 24 bins are created which can be grouped into 7 categories (Registry Management, Windows Services, Processes, etc.).

The challenge of the Feature Reduction is to minimize the number of features without losing classification performance. The individual reduction of features aims at limiting the final number of features within the combination matrix.

[Figure: the Sequence (200 features), Frequency (151 features) and Counters (8 features) matrices are transformed, via sequence modification and feature reduction, into a Modified Sequence (40 features), Frequency Bins (24 features) and Counters (4 features), which are combined into the final feature matrix.]

Fig. 3. Construction of the combination matrix.

Figure 3 shows the transformations performed to build a matrix which is a combination of the different features. The model conceptually gives more and more general information about the malware behavior. It is noticeable that the sequence is modified and reduced so that the new sequence gives the course of the 40 first actions, without repetitions, performed during the malware analysis. Afterwards, the frequency of the bins gives the occurrence of the 151 API calls grouped by type during run-time. Finally, the counters provide the most general information since they give the occurrence of the complementary parameters but without indication of the action performed.
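Putting the pieces together, the combination matrix can be assembled roughly as below. The shapes follow the paper (40 + 24 + 4 = 68 columns per sample), while the arrays themselves are random placeholders standing in for the extracted behavioral data.

```python
import numpy as np

n_samples = 5  # placeholder; the study uses roughly 42,000 labeled samples

# Placeholder blocks with the dimensions described above.
modified_sequence = np.random.randint(0, 151, size=(n_samples, 40))  # first 40 distinct calls
frequency_bins    = np.random.randint(0, 50,  size=(n_samples, 24))  # 24 API-category bins
counters          = np.random.randint(0, 100, size=(n_samples, 4))   # mutexes, files, registry keys, DNS

# Each row is one malware sample; columns 0-39 sequence, 40-63 bins, 64-67 counters.
combination = np.hstack([modified_sequence, frequency_bins, counters])
print(combination.shape)  # (5, 68)
```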
IV. RESULTS AND DISCUSSION

The classifier is configured through a development phase and a training phase, after which it is tested, producing the results that will be evaluated in this study. The total number of samples used for the training and testing phases is 42,068, of which 67 % represents the training set and the remaining 33 % represents the testing set. In the following, the development and training phases will be presented, along with the results of the classification.

A. Development and Training

The development phase is used to decide the number of trees that should be used in the Random Forests algorithm. Here, 160 trees were chosen to provide a good balance between improved results and computational time. The results for the development phase were evaluated using the training set with 10-fold cross-validation. After deciding the parameters for the RF algorithm, a training phase was used to construct the model used to classify the four different types of malware presented in section III. The training phase also had a second purpose, namely to choose the feature representation that should be used to configure the classifier, since multiple representations have been examined in this study. The Area Under the Curve (AUC value) and F-measure are provided by a 10-fold cross-validation for different feature representations. These are trained with a different number of features, and an objective choice was made based on the results to construct a matrix by combining multiple feature representations. Based on the results, the combined feature representation was chosen, as it gave the best AUC value and F-measure compared to the other representations examined. The combination includes the 40 first distinct API calls, 24 frequency bins and 4 counters, namely the counts of distinct mutexes, files, registry keys and all levels of the DNS.
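A minimal sketch of how such a representation comparison could be run, again using scikit-learn as a stand-in for WEKA: the representation matrices below are random placeholders, and the weighted one-vs-rest AUC is our choice of scoring, assumed to approximate the AUC reported in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 4, size=n)  # four type labels (placeholder)

# Placeholder feature matrices for the competing representations.
representations = {
    "sequence_40": rng.random((n, 40)),
    "bins_24":     rng.random((n, 24)),
    "counters_4":  rng.random((n, 4)),
}
representations["combined_68"] = np.hstack(list(representations.values()))

clf = RandomForestClassifier(n_estimators=160, random_state=0)
for name, X in representations.items():
    auc = cross_val_score(clf, X, y, cv=10, scoring="roc_auc_ovr_weighted").mean()
    print(f"{name:>12}: 10-fold CV AUC = {auc:.3f}")
```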
B. Results

The results from the testing phase will be presented in the form of a table with the most important available metrics, together with ROC curves and a confusion matrix. In Table II the True Positive Rate (TPR), False Positive Rate (FPR), Precision, F-measure and AUC value can be found for each class/type. To summarize the results from the table, ROC curves can be found in Figure 4 and a confusion matrix in Figure 5. Below, each class will be analyzed based on the results found in Table II, Figure 4 and Figure 5.

Class           TPR     FPR     Precision   F-measure   AUC
Trojan          0.959   0.052   0.961       0.960       0.989
PUP             0.777   0.015   0.939       0.850       0.978
Adware          0.858   0.085   0.693       0.767       0.955
Rootkit         0.791   0.001   0.947       0.862       0.970
Weighted Avg.   0.896   0.049   0.907       0.898       0.980

TABLE II. RESULTS OF TESTING PHASE.

[Figure: ROC curves (TPR versus FPR) for Adware, PUP, Rootkit and Trojan, together with the diagonal of a random classifier.]

Fig. 4. Receiver Operating Characteristics for the classified types.

                     Condition
Predicted    Trojan    PUP    Adware   Rootkit
Trojan         7792     70       242         5
PUP              40   2578       118        10
Adware          289    672      2224        23
Rootkit           0      0         8       144

Fig. 5. Results of the classifier in the form of a confusion matrix.
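The per-class numbers in Table II can be reproduced directly from the confusion matrix in Figure 5; the short check below does this for the Trojan class (rows are predictions, columns are the true condition, as in the figure).

```python
import numpy as np

# Rows: predicted Trojan, PUP, Adware, Rootkit; columns: actual class (Figure 5).
cm = np.array([[7792,   70,  242,   5],
               [  40, 2578,  118,  10],
               [ 289,  672, 2224,  23],
               [   0,    0,    8, 144]])

tp = cm[0, 0]             # Trojans predicted as Trojan
fp = cm[0, :].sum() - tp  # other classes predicted as Trojan
fn = cm[:, 0].sum() - tp  # Trojans predicted as something else

precision = tp / (tp + fp)  # 7792 / 8109 ~= 0.961
recall    = tp / (tp + fn)  # 7792 / 8121 ~= 0.959 (the TPR in Table II)
f_measure = 2 * precision * recall / (precision + recall)  # ~= 0.960
print(round(precision, 3), round(recall, 3), round(f_measure, 3))
```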
1) Trojan: Based on the results, the classifier revealed the best performance for this type of malware, which also can be due to the fact that it has the largest amount of samples compared to the other three types examined. With the classifier's high precision and F-measure of respectively 0.961 and 0.960, it shows promising classification results for this type. This conclusion is also supported by the high AUC value of 0.989. The ROC curve in Figure 4 is well behaved and steep, indicating a high discriminative power. Looking at the confusion matrix in Figure 5, it can be seen that the classifier has a potential problem distinguishing Trojan from Adware. It should be noticed that the number of FNs is small compared to the number of Trojan samples.

2) Potentially Unwanted Programs - PUP: The classifier shows a good classification performance for PUP in comparison with Trojan. Looking at Table II, the precision and F-measure are 0.939 and 0.850 respectively. These values are lower than the results for Trojan, but overall the performance of the classifier is still satisfactory. The lower F-measure is caused by the TPR, which is lower than the precision. The ROC curve is seen to be well behaved, but not as steep as for Trojan. This is expected from the results of the table; however, PUP has an AUC of 0.978, which is considered satisfactory. From the confusion matrix in Figure 5 it can be noticed that the classifier mostly confuses PUP with Adware, having 672 FNs.

3) Adware: Before the test, some behavioral similarities were expected between Adware, Trojan and PUP, since these malicious programs infiltrate the infected machine using common methods, however with a different end goal. The precision and F-measure of Adware, seen in Table II, have the lowest values of all tested types due to the large number of FPs, which also results in an FPR of 0.085, and a number of FNs that limit the TPR to 0.858. The AUC value is 0.955, which is also the lowest of all four types. Looking at the ROC curve in Figure 4, even though it is far above the theoretical ROC curve of a random classifier (dashed line), more information might be needed to be able to better discriminate this type. The confusion matrix in Figure 5 shows that a large portion of the Adware samples have been correctly classified, but approximately one fourth of the samples are classified as PUP or Trojan. PUPs can be classified as Adware depending on the severity of the logging or content presented to the user; however, due to the large sample size of Trojan in comparison with the other types, the TNs remain very high, thus the number of FPs is shadowed by the TNs, resulting in a low FPR.

4) Rootkit: It represents one of the most distinct malware types in comparison with Trojan, Adware and PUP. The actions it performs on the compromised machine should present a behavioral pattern that can easily be distinguished, as it tries to mask itself inside system components. Even though the number of samples in training and testing is significantly lower than for the other types, its unique behavior resulted in a precision of 0.947, which represents the second highest value of all tested types. However, the F-measure has a value of 0.862, which is due to the lower TPR. The FPR has a value close to zero due to the large number of TNs and a low value of 8 for the FPs. The FNs of Rootkit represent a large number in comparison to the low number of samples, resulting in a lower TPR, which affected the F-measure. The ROC curve presents a high discriminative power in comparison with the other presented types, with an AUC value of 0.970, which is closer to the values of the types that have a more dominant number of samples.

5) Summary of results: The weighted average calculated over all types reveals a high discriminative power of the classifier in terms of the AUC value. As discussed before, the FPR becomes low since the TNs are much larger in contrast to the FPs. The weighted average of the F-measure shows less discriminative power, as the PUP and Rootkit types have a high number of FNs in comparison with their number of samples. Overall the classifier has a satisfactory performance with an AUC value close to the theoretical maximum and an F-measure just below 0.9.

C. Discussion

This section elaborates on the obtained results, assessing whether the results for the particular malware types are good enough to be used in a future system that can pre-filter newly registered malware samples. It should be noticed that future work will be devoted to optimizing the classifier such that a pre-filtering system can be developed to identify novel malware samples and sort out legacy malware that has minor changes.
Trojan - The problem of having a small number of Trojan samples classified as Adware can be caused by the fact that some of the Adware samples actually are Trojans, which have been used to install the Adware while running the experiment. With this small amount of FPs the classifier still performs satisfactorily for Trojan, and it can in fact be used as pre-filtering for this type.

PUP - Has a definition that may include other types. It could be classified as Adware depending on the amount of content presented to the user. From the analysis of the results, a certain amount was classified as Adware. This problem can be caused by the fact that Adware and PUPs are similar, since PUPs are just a less severe case of Adware, making a common behavior possible. The classifier performs well for this type of malware and it is possible to use it as pre-filtering.

Adware - Similarities between the behavior of Adware and PUP shown by the classifier might be an incentive to retrieve even more detailed information about API calls. The classifier for this type performs fairly well, but needs to be improved before it can be used as pre-filtering with satisfactory results. It was found that the label of Adware from Avast was unclear and too generic, which could explain the results. This problem can be solved by a more clear and categorized definition of this type.

Rootkit - The classifier performs well regardless of the low amount of samples, which could be because of its distinct behavior. Therefore this classifier can be used as pre-filtering, but in order to ensure a good performance, more samples should be used to train the model.

Using the system as a pre-filter - Based on the individual and weighted average results, it can be concluded that the system created can be used as part of a pre-filtering application. It will work by rejecting malware as novel when the malware cannot be classified as any type with a high enough probability. The results for the individual types are satisfactory for every type except Adware, which is the only type that pulls down the performance when looking at the weighted average results.
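A sketch of the rejection rule described above, assuming a classifier that exposes class probabilities (scikit-learn's predict_proba is used as a stand-in); the 0.8 threshold and the synthetic data are arbitrary illustrative choices, not values proposed by the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def prefilter(clf, X_new, threshold=0.8):
    """Split incoming samples into known types and potentially novel ones.

    A sample counts as a known type only if the classifier assigns some class
    a probability above the threshold; otherwise it is flagged for deeper
    analysis as potentially novel.
    """
    probabilities = clf.predict_proba(X_new)
    confident = probabilities.max(axis=1) >= threshold
    predicted = clf.classes_[probabilities.argmax(axis=1)]
    return predicted[confident], X_new[~confident]

# Synthetic stand-in data: four classes, as in the study.
X, y = make_classification(n_samples=1000, n_features=68, n_informative=20,
                           n_classes=4, random_state=0)
clf = RandomForestClassifier(n_estimators=160, random_state=0).fit(X[:800], y[:800])

known, novel = prefilter(clf, X[800:])
print(len(known), "treated as known,", len(novel), "flagged as potentially novel")
```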

D. Future Research

In order to improve the presented classification of malware types, future research can be done to achieve an even better discrimination. Having a uniform distribution of all malware types can improve the results of the classifier by assigning the same weight to all samples instead of favoring Trojan due to its clear advantage in sample size. This will affect the classifier by having a fair distribution in the TNs for each type and computing the statistics for TPR and TNR over a uniform number of samples. In this particular case, the numbers of FNs, FPs and TPs would not be shadowed by a large number of TNs. That being said, a more detailed approach in building the API features can be taken by looking at the arguments passed during the API calls. This could give a more detailed view of the malicious behavior expressed by the samples.

The accuracy of the pre-filtering system can be improved by theoretically having behavior information for all existing types, thus allowing the detection of novel malicious software by using a probability threshold.

V. CONCLUSION

This study proposes a novel malware classification approach developed in order to provide an accurate classification of malware types within the dynamic analysis of malware. It relies on a novel set of features that successfully capture differences in the behavior of malware types. Starting from the estimation made by AV-Test that approximately 390,000 new malware samples are released every day, the proposed malware analysis approach aims to provide a pre-filtering solution to this problem in order to distinguish novel malicious software that has significantly different behavior from malware known by the classifier. Having three main modules, Data Generation, Data Extraction and Malware Classification, this study provides a fast, distributed and secure method of analyzing malware with high predictive performance.

The combination of features from approximately 80,000 samples consists of: 24 API Frequency Bins, a Modified Sequence of the 40 first distinct API calls and four Counters, collected using a modified version of Cuckoo Sandbox. The combination of behavioral information proved to be very detailed, allowing us to detect the correct types after passing it through the Random Forests algorithm with 160 trees. The weighted average results gathered using the novel feature representation are very satisfactory. Having a steep Receiver Operating Characteristic curve, an Area Under the Curve of 0.98 (close to the theoretical maximum), a precision of 0.9 and an F-measure of 0.898, the classifier has proven to have a high discriminative and predictive power, which can be used to filter novel from known malware.

ACKNOWLEDGMENTS

During this research project we received substantial help from VirusTotal by providing access to their vast database containing labels from 56 anti-virus programs, which allowed us to perform classification as accurately as possible. Furthermore, we thank VirusShare for providing access to their library of novel and known malicious programs from which we have collected the behavioral data.

REFERENCES

[1] Avast. "Avast 2015." Internet: https://www.avast.com, 2015 [Feb. 22, 2015].
[2] AV-Test. "Malware Statistics." The Independent IT-Security Institute, 2014. Internet: http://www.av-test.org/en/statistics/malware/ [Feb. 22, 2015].
[3] A. Czech, R. S. Pirscoveanu, S. S. Hansen and T. M. T. Larsen, Analysis of Malware Behavior: Type Classification using Machine Learning, Aalborg, Denmark, 2014.
[4] Cuckoo Foundation. "Automated Malware Analysis - Cuckoo Sandbox." Internet: http://www.cuckoosandbox.org/, 2014 [Feb. 22, 2015].
[5] K. Rieck, T. Holz, C. Willems, P. Düssel and P. Laskov, Learning and Classification of Malware Behavior, 5th Int. Conf. DIMVA 2008, Paris, France: Springer, 2008.
[6] K. Rieck, P. Trinius, C. Willems and T. Holz, Automatic Analysis of Malware Behavior using Machine Learning, Journal of Computer Security, Vol. 19, Issue 4, Amsterdam, The Netherlands: IOS Press, 2011.
[7] L. Breiman. (2001, Jan.). Random Forests. [Online]. Available: http://oz.berkeley.edu/~breiman/randomforest2001.pdf [Feb. 22, 2015].
[8] M. Alazab, S. Venkataraman and P. Watters, Towards Understanding Malware Behaviour by the Extraction of API calls, 2nd CTC 2010, Ballarat (VIC), Australia: IEEE, 2010.
[9] M. Platts. "The Network Connection Status Icon." Internet: http://blogs.technet.com/b/networking/archive/2012/12/20/the-network-connection-status-icon.aspx, Dec. 20, 2012 [Feb. 22, 2015].
[10] R. Tian, R. Islam, L. Batten and S. Versteeg, Differentiating Malware from Cleanware Using Behavioural Analysis, 5th Int. Conf. on Malicious and Unwanted Software, Nancy, France: IEEE, 2010.
[11] T. Hungenberg, M. Eckert. "INetSim: Internet Services Simulation Suite." Internet: http://www.inetsim.org/, 2014 [Feb. 22, 2015].
[12] U. Bayer, E. Kirda and C. Kruegel, Improving the Efficiency of Dynamic Malware Analysis, 25th Symposium On Applied Computing (SAC), Track on Information Security Research and Applications, Lausanne, Switzerland, 2010.
[13] VirusShare. "VirusShare.com - Because Sharing is Caring." Internet: http://virusshare.com/, Feb. 22, 2015 [Feb. 22, 2015].
[14] VirusTotal. "virustotal." Internet: https://www.virustotal.com/, 2015 [Feb. 22, 2015].
[15] WEKA. "Weka 3: Data Mining Software in Java." Internet: http://www.cs.waikato.ac.nz/ml/weka/, 2014 [Feb. 22, 2015].
[16] Z. Salehi, M. Ghiasi and A. Sami, A miner for malware detection based on API function calls and their arguments, 16th AISP 2012, Shiraz, Fars: IEEE, 2012.
