Research For The Conference On Cyber Security
Network Security
Abstract— The paper explores how log analysis is key for enhancing the network security of enterprises. Nowadays, security has become a major concern because of the interconnection among organizations over the WWW. Routine log analysis is beneficial for identifying security incidents, policy violations, fraudulent activity, and operational problems; it is a means of assuring the health of the network and helps to identify attacks. Enterprises must perform log analysis over heterogeneous log records to discover different attacks. We used multilevel log analysis to identify attacks found at different layers of the data center by scrutinizing log events of various network devices, applications, and other sources. Heterogeneous log records are thus the basis of the analysis. In our work, log records were organized into a common format and analyzed based on their features. In the central engine, clustering and correlation form the core of the log analyzer and work together with an attack knowledge base to identify attacks. Clustering algorithms, namely Expectation Maximization and K-means, were used to determine the number of clusters and to filter events based on a filtering threshold, respectively. Correlation, on the other hand, finds relationships or associations among log events and generates new attack definitions. Finally, we evaluated a prototype of the proposed log analyzer and obtained encouraging results, with average precision of 84.37% on SOTM#34 and 90.01% on the AAU data set. Further study and implementation of log analysis can significantly enhance the data center security of enterprises. Generally, this paper also compares various log analysis techniques with our proposed solution.

Keywords— Log File; Log Analysis; Layered Approach; Attack Identification; Data Center; Network Security
1. INTRODUCTION
In fact, enterprises are increasingly interconnected with the advent of networking technologies, and large amounts of information are relayed over these infrastructures. In this respect, the network plays a vital role in establishing communication. However, security is often left aside and becomes a big issue for enterprises. The structure of the network itself provides suitable conditions for intruders to create security events. Since attacks are unstable in nature, there is no one-size-fits-all solution, and organizations secure their networks using their own means.

Log files are sets of records collected from log-generating devices. They are considered an ideal source of information for security management [3, 4, 5]. By collecting and analyzing log files, security professionals can determine the loopholes in their network and accordingly execute proactive mitigation strategies. In general, our work aims at a detailed analysis that produces a log analyzer with a layered data center security approach for organizations, aligned with their security policies, procedures, and standards.

In this paper, previously conducted research related to the concept of log analysis for building enhanced data center security is presented, and a deeper discussion of the various proposed log analysis techniques is made. Specific works on log analyzers with different approaches, related and relevant in terms of their objectives, are elaborated in the next sections.

2. LOG FILE ANALYSIS
Currently, security gets more attention in many organizations than ever before. This is due to the growth of the Internet and the dynamic nature of emerging attacks against organizations' data centers [1]. When organizations embed security in their business strategy, the confidentiality, integrity, and availability of data in the data center are assured. The security requirement has a direct relationship with the growth of a data center and plays a fundamental role in the development of organizations.

Network devices therefore generate information (log files) that is considered a means to identify, detect, and analyze incidents and to take remedial action accordingly. This enables administrators to easily handle the monitoring of the entire data center infrastructure with minimal data loss, time, effort, and other expenses.

Recently, the expansion of the Internet has made many organizations victims of various attack types and has created channels through which attacks disseminate freely across organizations' data centers. Security has become a major point of interest, and it is pursued through the process of log file analysis. In such circumstances, several research directions have been explored for guaranteeing data center security at the required level. The work done so far can be taken as input for a newly proposed solution aimed at enhancing network security.

Log files are a rich source of information and have been analyzed in the past for a variety of purposes, such as system maintenance, software testing and validation, forensic analysis, and anomaly detection. The following section briefly discusses works related to ours, categorized by the purpose for which the log files are used.

3. RELATED WORKS

3.1 Log Analysis as a Security Aid
In [6], the authors collected log files and used open-source information retrieval tools to index the log files' fields and search for patterns of suspected behavior which may indicate intrusions. The aim of their work was to use the tool for indexing and searching for attack-like patterns in log files. The application indexes every occurrence of specific strings (e.g., denial of access, wrong credential, and so on). After that, the system tries to find events with similar occurrences by comparing them with the data searched for. Fuzzy searches have been used to detect attack-like patterns such as brute-force attacks.
However, the application is limited to analyzing a few log file types with known log formats and patterns. Hence, multiple types of logs need to be considered, and correlating such logs from a myriad of sources is necessary. Moreover, the system did not use well-prepared log data produced by a preprocessing task. An attack knowledge base that is dynamically updated must be constructed in order to easily handle the analysis process.
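As an illustration of this kind of pattern search (a minimal sketch of our own, not the tool used in [6]), the following Python snippet scans raw log lines for signatures of failed authentication and flags sources that repeat them often enough to suggest brute force; the patterns, threshold, and sample lines are assumptions for the example.

```python
import re
from collections import Counter

# Hypothetical attack-like signatures; a real index would hold many more.
PATTERNS = [re.compile(r"failed password|authentication failure", re.I),
            re.compile(r"denied|denial of access", re.I)]
IP_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})")
THRESHOLD = 3  # assumed repeat count suggesting brute force

def suspicious_sources(lines):
    """Count pattern hits per source IP and keep the repeat offenders."""
    hits = Counter()
    for line in lines:
        if any(p.search(line) for p in PATTERNS):
            m = IP_RE.search(line)
            if m:
                hits[m.group(1)] += 1
    return {ip: n for ip, n in hits.items() if n >= THRESHOLD}

sample = [f"sshd: Failed password for root from 10.0.0.9 port {p}"
          for p in range(50000, 50004)]
sample.append("sshd: Accepted password for alice from 10.0.0.7")
print(suspicious_sources(sample))  # {'10.0.0.9': 4}
```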
In [2], an automated forensic diagnosis system is proposed to reconstruct attack actions after a security incident has occurred. The system analyzes a set of log files created by the different applications running in the network. It is composed of four modules: event collection, event pre-processing, event correlation, and attack graph generation, all of them working on the victim system's log files to recreate, in an automated fashion, the attacker's actions represented as an attack scenario. First, the event collection module gathers the log files in their original format; then the pre-processing module adjusts the timestamps of the log files, normalizes their attributes, and saves them in a repository (event container). Later, the event correlation phase proceeds by, first, using atomic attack definitions from an attack knowledge base to find specific attack actions and, second, correlating the attack actions found to build an attack scenario describing complex multi-step attacks. Finally, the actions are represented through graphs to facilitate interpretation by the end user.
However, the proposed system does not account for the attack knowledge base becoming outdated due to the variable nature of attacks and attackers' actions. Also, in the preprocessing stage a log record may be incomplete for unknown reasons or may have a different log format, but the system does not define any metrics to clean the data, which can reduce its detection accuracy. In the attack knowledge base, the port is left out as an attribute even though, like the IP address, it is an important parameter for gaining better insight into intruder activity from log analysis.
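To make the pre-processing step concrete, here is a minimal sketch (our own illustration, not the implementation in [2]) of normalizing heterogeneous log lines into a common event record with synchronized UTC timestamps; the two source formats and the field names are assumptions.

```python
from datetime import datetime, timezone

def parse_apache(line):
    # e.g. 10.0.0.5 - - [12/Mar/2015:06:25:24 +0000] "GET / HTTP/1.1" 200
    ip = line.split(" ", 1)[0]
    stamp = line.split("[", 1)[1].split("]", 1)[0]
    when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
    return {"time": when.astimezone(timezone.utc), "src_ip": ip,
            "source": "apache", "raw": line}

def parse_iptables(line):
    # e.g. 2015-03-12T06:25:30+00:00 kernel: DROP SRC=10.0.0.9 DPT=22
    stamp, rest = line.split(" ", 1)
    when = datetime.fromisoformat(stamp)
    ip = rest.split("SRC=", 1)[1].split()[0]
    return {"time": when.astimezone(timezone.utc), "src_ip": ip,
            "source": "iptables", "raw": line}

def normalize(records):
    """Parse each (kind, line) pair and sort by synchronized UTC time."""
    parsers = {"apache": parse_apache, "iptables": parse_iptables}
    events = [parsers[kind](line) for kind, line in records]
    return sorted(events, key=lambda e: e["time"])

sample = [
    ("iptables", "2015-03-12T06:25:30+00:00 kernel: DROP SRC=10.0.0.9 DPT=22"),
    ("apache", '10.0.0.5 - - [12/Mar/2015:06:25:24 +0000] "GET / HTTP/1.1" 200'),
]
for e in normalize(sample):
    print(e["time"], e["source"], e["src_ip"])
```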
3.2 Log Analysis using Data Mining
In [7], the authors propose a system that parses/isolates logs from various sources and then clusters them using a data mining tool (WEKA). The framework first collects unlabelled heterogeneous logs, then parses each raw log individually and isolates log entries where necessary. Secondly, the log entries are clustered before filtering. Thirdly, the clustered logs are parsed again to make them visible for filtering. Later on, the filtering process filters the clustered events. Finally, the system combines filtered events whose attribute values are exactly alike.
However, the proposed work lacks a common log format created through log normalization in a preprocessing module, so each log is identified only in its proprietary format. In addition, it would be better to construct an updated attack knowledge base and compare each filtered event against that knowledge base. Additionally, their system does not consider finding associations (correlation) among log records for attack detection.
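The clustering step in [7] relies on WEKA's clusterers; as a rough stand-in, the sketch below clusters simple numeric features extracted from log events with scikit-learn's KMeans (an assumed substitute library) and reports cluster membership, which is the kind of grouping that the filtering stage then operates on. The feature choices and values are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy feature vectors per log event: (events/minute from the source,
# distinct ports touched, bytes transferred). Values are made up.
X = np.array([
    [2, 1, 500], [3, 1, 520], [2, 2, 480],       # normal-looking traffic
    [120, 40, 60], [130, 45, 55], [125, 42, 58]  # scan-like bursts
], dtype=float)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for label in sorted(set(kmeans.labels_)):
    members = np.where(kmeans.labels_ == label)[0]
    print(f"cluster {label}: events {members.tolist()}")
```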
In [8], the authors develop an Unsupervised Heterogeneous Anomaly Detection system (UHAD) which scrutinizes heterogeneous logs, without using a model trained on traffic behavior or knowledge of anomalies, and uses a two-step strategy for clustering normal and abnormal events. They introduce a new filtering algorithm in which the filtering threshold is calculated from the volume of log events and the number of log event clusters.

First, the log preprocessing component extracts the data from the logs with additional functions such as an isolator and a timestamp synchronizer. Secondly, the event clustering component separates abnormal events from normal ones using the various logs and also finds the possible number of clusters (K) to group the events, using the expectation maximization algorithm. Thirdly, the filtering component removes the normal events (noise) whilst retaining the abnormal events for further processing. Later on, the aggregation component combines redundant events, thereby reducing the number of events in the filtered log. Then, the event transfer component extracts the features from the various aggregated logs as stated in a Generic Format (GF) and stores them in the Generic Format Log (GFL). Finally, the system detects anomalous events by analyzing features such as IP addresses and port numbers.
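UHAD's exact threshold formula is not reproduced here, so the following sketch only illustrates the general idea of deriving a filtering cut-off from the event volume and the number of clusters K. The rule used (keep clusters smaller than the average events-per-cluster, on the view that anomalies are rare) is our own assumption, not UHAD's actual algorithm.

```python
def filter_clusters(cluster_sizes, total_events):
    """Keep clusters whose size is below the average events-per-cluster.

    cluster_sizes: mapping cluster_id -> number of events in it.
    total_events:  overall volume of log events.
    The threshold total/K is an assumption for illustration only.
    """
    k = len(cluster_sizes)
    threshold = total_events / k
    return [cid for cid, size in cluster_sizes.items() if size < threshold]

sizes = {0: 9400, 1: 320, 2: 45, 3: 235}
print(filter_clusters(sizes, sum(sizes.values())))  # -> [1, 2, 3]
```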
However, the system lacks preparation of the log files through preprocessing that would produce better detection results. It also lacks an attack knowledge base from which atomic attack definitions could be extracted to increase anomaly detection accuracy, and the concept of correlation among log events is not taken into consideration.
In [13], the authors discuss two data mining tools, the Simple Log Clustering Tool (SLCT) and LogHound, which were designed to assist system management in extracting knowledge from event logs. The automation of event log analysis is an important research topic in network and system management, and to tackle it they propose data mining techniques for obtaining knowledge about events. SLCT employs a clustering algorithm for analyzing textual event logs in which each log record represents a certain event. LogHound, on the other hand, employs a frequent itemset mining algorithm for discovering frequent patterns in event logs.
However, their research did not consider anomaly detection methods, nor the effectiveness of combining SLCT and LogHound to build an event log anomaly detection system. Another drawback is that the proposed system does not include enough preprocessing techniques to produce an efficient log analysis system.
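In the spirit of SLCT (though not its actual algorithm), this sketch mines frequent word patterns from textual log lines by counting (position, word) pairs and masking infrequent ones with a wildcard; the support threshold and sample lines are assumptions for the example.

```python
from collections import Counter

def frequent_patterns(lines, support=2):
    """Group log lines by their frequent (position, word) pairs."""
    counts = Counter()
    split = [line.split() for line in lines]
    for words in split:
        counts.update(enumerate(words))
    patterns = Counter()
    for words in split:
        # Words below the support threshold become wildcards.
        key = tuple(w if counts[(i, w)] >= support else "*"
                    for i, w in enumerate(words))
        patterns[key] += 1
    return patterns

logs = ["sshd failed password for root",
        "sshd failed password for admin",
        "kernel dropping packet from 10.0.0.9"]
for pat, n in frequent_patterns(logs).items():
    print(n, " ".join(pat))   # e.g. "2 sshd failed password for *"
```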
protocol to the log server. The security audit logs can also be
transmitted over the network to the system using standard
User Datagram Protocol (UDP), syslog protocol or
Transmission Control Protocol (TCP). In such way the system
collects log information from all types of clients, servers,
firewalls and network equipment. Moreover, the log server is
able to detect activation and deactivation of nodes and
network equipment on the supervised network, i.e. the
network where logs are collected from.
Even though the work achieves a low false positive rate, it still lacks further refinement of the collected log files with more algorithms to obtain a better log analyzer. In the conversion of log data to a common format, the authors leave open how to extract the important features of a log file, which bear directly on the detection accuracy of the log analyzer. The tf-idf algorithm is also of limited use for detecting attacks in a large log corpus, since it yields irrelevant detection results.

However, the proposed system uses a statistical approach that yields lower detection accuracy compared with data mining learning schemes, and it limits the correlation process, making it less effective at identifying intrusions/attacks in the IDS log. The system collects log data in online or offline mode in a similar way, but parameters must be set for proper identification of the log data mode. Data cleaning, which helps to obtain more features, is not included in the preprocessing component.
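Since tf-idf comes up repeatedly in this line of work, the sketch below shows the standard computation on a tiny log corpus using scikit-learn (our choice of library, not the one used in [44]); lines dominated by rare, high-idf terms stand out as candidates for review, which also hints at why the scores become less meaningful over a very large, repetitive corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "failed password for root from 10.0.0.9",
    "failed password for admin from 10.0.0.9",
    "accepted password for alice",
    "segfault in httpd worker",   # rare message, stands out
]
vec = TfidfVectorizer()
tfidf = vec.fit_transform(corpus)
# Score each line by its maximum term weight: rare terms score high.
scores = tfidf.max(axis=1).toarray().ravel()
for line, s in sorted(zip(corpus, scores), key=lambda p: -p[1]):
    print(f"{s:.2f}  {line}")
```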
In [12], a prototype system is developed and implemented based on relational algebra to build a chain of evidence. It is used to preprocess the real data generated from logs and to classify suspicious users based on a decision tree. The proposed work describes the nature of the event information and the extent to which such event information can be correlated despite its heterogeneous nature and origins. First, the system extracts the log files of the web server and the firewall and stores them in a central location. In this stage, the data are transformed into a pre-defined log format suitable for effective analysis and stored in the central log storage; the data are then filtered according to the rules defined in the rule module. Users have the flexibility to adjust the filter parameters and extract the data they are interested in from the mass of log information according to their actual needs.

However, the proposed system is rule based, and currently emerging attacks do not conform to the defined rules, which creates a security hole in the organization. The absence of a structured and up-to-date attack knowledge base and of log correlation is a bottleneck for the analysis of log files. The task of the preprocessing module is not well specified for making the system more robust, and the type of activities carried out in the log analysis stage is likewise not well stated.
In [10], an approach for receiving, storing, and administrating log events is proposed. The authors present a secure audit log management system focusing on security, flexibility, performance, and portability. Furthermore, they come up with a design solution that allows organizations in distributed environments to send audit log transactions from different local networks to one centralized server in a secure way. The proposed system can analyze logs from different sources such as firewalls, IDSs, servers, and clients. It consists of one centralized server located at a secured location connected to the inner parts of the supported network. The syslog and Simple Network Management Protocol (SNMP) protocols are used to collect the log events: agents read the local logs and transfer the information to the log server over syslog and/or SNMP. The security audit logs can also be transmitted over the network using the standard User Datagram Protocol (UDP), the syslog protocol, or the Transmission Control Protocol (TCP). In this way the system collects log information from all types of clients, servers, firewalls, and network equipment. Moreover, the log server is able to detect the activation and deactivation of nodes and network equipment on the supervised network, i.e., the network the logs are collected from.
In [15], the authors propose a system for analyzing intersections of log files that come from different applications and firewalls installed on one computer, and intersections resulting from log files coming from different computers. The work is also concerned with the issues of large-scale log processing that arise when analyzing such log records. They used firewall log files coming from a web server and from a regular desktop computer (in both cases covering the same period of time), together with the web server's access-log file from the same time period. During the initial preprocessing stage they removed from all logs the entries related to the intranet, leaving only those entries that came from outside their LAN. The log analyzer framework is shown in Figure 2.

However, in the proposed system's log preprocessing stage, entries are selected based only on the source they originated from (i.e., intranet or outside the LAN), which is of little use for identifying attacks in the network. Nor does the system define any technique or systematic method for detecting attacks from the log records.
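A simple way to picture the intersection idea is to take the set of source IPs seen in each log and intersect them. The sketch below (our illustration, with made-up records) flags sources that appear both in the firewall log and in the web server's access log, i.e., hosts that probed the firewall and also reached the web server.

```python
import re

IP_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})")

def source_ips(lines):
    """Extract the first IPv4 address found on each log line."""
    return {m.group(1) for line in lines
            if (m := IP_RE.search(line))}

firewall_log = ['DROP SRC=198.51.100.7 DPT=22',
                'DROP SRC=203.0.113.5 DPT=445']
access_log = ['198.51.100.7 - - "GET /admin HTTP/1.1" 404',
              '192.0.2.10 - - "GET / HTTP/1.1" 200']

# Sources present in both logs deserve attention.
print(source_ips(firewall_log) & source_ips(access_log))
# -> {'198.51.100.7'}
```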
3.5 Build Log Management Architecture
In [12], the authors aim to suggest a log management architecture with the common functions used by vendors. Their proposed architecture has a collection server as its first module, which collects the logs received from various log-generating devices such as firewalls, NIDSs, operating systems, application systems, etc. Log generators send logs via transmission protocols such as syslog, IDMEF, CEE, CEF, and SNMP; thus the collection server must be able to understand all log formats. After studying various SIEM vendor architectures for log management, they consider the most important functionalities to be: normalization, filtering, reduction, rotation, time synchronization, aggregation, and integrity checking. Finally, a storage server keeps logs for forensics, auditing, and offline analysis. In addition, they consider log security in their architecture.

However, the functionality of the proposed architecture has not been evaluated and tested, and the architecture does not include the log preprocessing and analysis components, which are the core of log analysis.
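Of the functionalities listed above, integrity checking is easy to make concrete: a hash chain over stored log records, sketched below under our own assumptions (fixed genesis value, SHA-256), lets a storage server detect after-the-fact tampering, since editing or reordering any record breaks every later digest.

```python
import hashlib

def chain(records):
    """Hash-chain log records: each digest also covers the previous one."""
    digest = b"\x00" * 32          # fixed genesis value (assumption)
    out = []
    for rec in records:
        digest = hashlib.sha256(digest + rec.encode()).digest()
        out.append((rec, digest.hex()))
    return out

def verify(chained):
    digest = b"\x00" * 32
    for rec, expected in chained:
        digest = hashlib.sha256(digest + rec.encode()).digest()
        if digest.hex() != expected:
            return False           # record altered or reordered
    return True

stored = chain(["user root login", "config changed", "user root logout"])
print(verify(stored))              # True; False if any entry is edited
```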
In [16], the authors propose a defense-in-depth network security architecture and apply data mining technologies to analyze the alerts collected from distributed intrusion detection and prevention systems (IDS/IPS). The key component of the Global Policy Server (GPS) is the security information management (SIM) module, which consists of an online detecting phase and an offline training phase. The system consists of four main components in the online detecting phase. First, the online data miner classifies the records in the active database to detect attacks. Then, the rules tuner, which runs the machine learning algorithm, tunes the parameters of the rules accordingly. Later, the GLS, which receives logs from the LPSs, stores them in the active database. Finally, the policy dispatcher waits for commands from the online miner.

However, the experimental results demonstrate that the proposed work is highly effective only for detecting DDoS attacks and not for other attacks. It also does not show whether, for a shorter time interval between events, anything can be said about the occurrence of the false alarm rate. Moreover, the model uses only a classifier as its mining technique.
4. PROPOSED SYSTEM ARCHITECTURE
In fact, the development of a system is determined by the composition of the many subcomponents (parts) of the entire system; the integration and interoperability of those components produce the expected system and allow its objective to be met.

Our model of a log file analyzer with layer-based data center security consists of eight major components, including the Log Files Repository (LFR), Log File Pre-processor (LFPP), Network Security Information Manager (NSIM), Attack Knowledge Base (AKB), Central Engine (CE), Action Center or Remediation Area, and Audit Reporter, as shown in Figure 3.

The log file collector is simply concerned with gathering the heterogeneous log files generated by the leveled devices in the data center network and putting them in the central repository (LFR) for preprocessing, which is the second component of the system.

The aim of the log file pre-processor is to adjust or prepare the various input log files, considering different metrics, and feed them to the processing unit called the central engine. Log parsing, log cleaning, log normalization, and log aggregation are the major activities conducted in this component. Once this part is completed, the preprocessed log files are stored in the NSIM component, ready for later use by the central engine.

The Network Security Information Manager is a module that works as a central repository for all of the raw log data and controls the integrity, accessibility, and confidentiality of the pre-processed log files received from the log file preprocessing component. The attack knowledge base is the part of the system that holds the atomic attack definitions received from the central engine and provides the required atomic attacks back to the central engine.

The central engine component is the heart (core) of the system and is responsible for performing the overall log file processing. To achieve this, its subcomponents include the clustering module and the correlation or association module, each comprising their own components. On the other hand, the action center, or remediation section, is the component that provides a reasonable response to an emerging incident to the administrator through the user interface; it incorporates alert production, notification through SMS, report generation, and visualizations. Finally, the audit reporter module is a repository for the information generated by the action center component, for long-term use. Therefore, our research is mainly concerned with enhancing the data center security of organizations by building a log file analyzer (i.e., one intended to identify and report the state of the data center within a certain period of time). The sections above elaborate the concepts related to our proposed architectural design for log file analysis using a layered approach to data center security.

Figure 3: Proposed Log Analyzer System Architecture
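To summarize the flow between these components, here is a deliberately simplified schematic in code (an illustration of the data flow under our assumptions, not the actual implementation): collector → LFR → pre-processor → NSIM → central engine (clustering plus correlation against the AKB) → action center. All function bodies and the sample attack definitions are stand-ins.

```python
# Schematic data flow of the proposed analyzer; each stage stands in
# for the corresponding component and is deliberately simplified.

def collect(sources):                 # Log file collector -> LFR
    return [line for src in sources for line in src]

def preprocess(lfr):                  # LFPP: parse/clean/normalize
    return [line.strip().lower() for line in lfr if line.strip()]

def central_engine(nsim, akb):        # CE: cluster, then correlate with AKB
    clusters = {}
    for event in nsim:                # crude clustering by first token
        clusters.setdefault(event.split()[0], []).append(event)
    # Correlation: any event matching an atomic attack definition fires.
    return [e for events in clusters.values()
            for e in events if any(sig in e for sig in akb)]

def action_center(incidents):         # alerts, reports, visualization
    for inc in incidents:
        print("ALERT:", inc)

akb = {"failed password", "port scan"}        # atomic attack definitions
sources = [["sshd Failed password for root\n"],
           ["kernel normal packet\n"]]
action_center(central_engine(preprocess(collect(sources)), akb))
```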
5. RESULTS AND DISCUSSION
The clusters produced by the clustering algorithms were evaluated using the WEKA Experimenter with 10-fold cross-validation and 10 iterations, so that every part of the log record is tested. True Positives (TP), True Negatives (TN), and False Positives (FP) were measured to calculate the precision and False Positive Rate (FPR) of the clusters. Precision is calculated by dividing the correctly identified log events by the total number of clustered log events (precision = TP / total number of clustered log events). The False Positive Rate is calculated by dividing the log events incorrectly assigned to a given cluster by the sum of the log events incorrectly assigned to that cluster and the events correctly clustered into other clusters (FPR = FP / (FP + TN)). The evaluation of the clustered log events for SOTM#34 and AAU is summarized below in Table 1 and Table 2, respectively.
Table 1: Evaluation of Clustered Log Events for SOTM#34

Log File          Clustered log records   TP        FP       TN      Precision (%)   FPR (%)
access_log        3,554                   3,281     73       97      92.31           42.94
error_log         3,692                   3,450     192      44      93.44           81.35
ssl_error_log     374                     298       76       21      79.60           78.35
Iptables syslog   179,752                 168,269   11,034   1,102   93.61           90.91
Snort syslog      69,039                  58,480    10,559   527     84.70           95.24
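As a check on the table's arithmetic, the short sketch below recomputes precision and FPR from the access_log row using the formulas above (note that Table 1 appears to truncate rather than round some decimals).

```python
def precision(tp, clustered):
    """Precision = TP / total clustered log events, in percent."""
    return 100 * tp / clustered

def fpr(fp, tn):
    """False Positive Rate = FP / (FP + TN), in percent."""
    return 100 * fp / (fp + tn)

# access_log row of Table 1: 3,281 of 3,554 clustered events are true
# positives, with FP = 73 and TN = 97.
print(f"precision = {precision(3281, 3554):.2f}%")  # 92.32 (Table 1: 92.31)
print(f"FPR = {fpr(73, 97):.2f}%")                  # 42.94
```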
6. CONCLUSION AND FUTURE WORK
The proposed system enhances the security of the data center. It analyzes a set of heterogeneous log files: collecting the log data, preprocessing it, building a central engine for analysis, and taking remedial measures through the action center are the processes of the proposed system. The central engine module is the heart of the entire system, performing the log processing and analysis; it was used for finding security holes in a bidirectional fashion using both clustering and correlation techniques. In order to validate the usability of our system, we used real network-based log records from both SOTM#34 and the AAU data center devices. Based on those, we found several attack actions, which led us to the construction of attack scenarios. In general, the following are some potential future directions for the continuation of this work:
- Apply more log analysis or mining approaches to obtain useful knowledge and reduce false positive and false negative results.
- Take more log records to make the work generic.
- Understand users' behavior from the analyzed logs.
- Create an intelligent attack knowledge base for the sake of forensics, auditing, and other purposes.
ACKNOWLEDGMENT
I would like to express my deepest gratitude to my co-author Dr. Dejene Ejigu, without whose motivation, enthusiasm, and continuous as well as constructive supervision and encouragement this work would not have been possible. I want to extend my appreciation to Mr. Zelalem Assefa, manager of EthERNet, and the other members for providing the necessary information related to this research. Finally, thanks to my colleagues, family, teachers, friends, and all others who contributed in one way or another to the successful accomplishment of this work.