

EAI Endorsed Transactions on Energy Web and Information Technologies

Research Article

Review of Various Methods for Phishing Detection


R.Sakunthala jenni1,*, S.Shankar2
1 Research Scholar, Department of Computer Science and Engineering, Hindusthan College of Engineering and Technology, Coimbatore, India.
2 Professor, Department of Computer Science and Engineering, Hindusthan College of Engineering and Technology, Coimbatore, India.

Abstract
Technology has spread rapidly in the modern world: cell phones and computers have arrived, and internet use has grown in every field, commercial, financial and personal. This growth is a boon, but it also exposes users to dangerous challenges, chiefly phishing and information theft carried out through spam and deceptive emails. Such emails can cause great losses for financial and similar institutions. At first it is very difficult to judge or trace the modus operandi of these hackers. These cryptic phishing methods can be uncovered effectively only by developing dedicated software that safeguards users. To detect how hackers operate, researchers use the data mining process, in which a number of data mining tools analyse the data. This learning is essentially a data mining process, and the information is drawn from different outlets and sources.
Keywords: phishing, anti-phishing, hackers, fraud websites, legitimate websites.
Received on 31 May 2018, accepted on 10 September 2018, published on 12 September 2018

Copyright © 2018 R.Sakunthala jenni et al., licensed to EAI. This is an open access article distributed under the terms of the Creative
Commons Attribution licence (http://creativecommons.org/licenses/by/3.0/), which permits unlimited use, distribution and reproduction
in any medium so long as the original work is properly cited.

3rd International Conference on Green, Intelligent Computing and Communication Systems - ICGICCS 2018, 18.5 - 19.5.2018,
Hindusthan College of Engineering and Technology, India

doi: 10.4108/eai.12-9-2018.155746

1. Introduction

Millions of people around the globe can no longer imagine their lives without smartphones of various kinds, which give them access to innumerable facilities. Furthermore, business houses, financial institutions, banks and other facilities such as online shopping offer extensive services through their websites. Transactions and business conducted through websites and cyberspace save time, reduce traffic and are more cost-effective, so website-based transactions have become very popular. In spite of these benefits, users face dangerous repercussions from hackers who deceive them through phishing in order to pilfer valuable information [1]. Some hackers intend to rob money, some act out of revenge, and a few do it just for fun and thrill; all are engaged in destructive activities. These unethical practitioners are called hackers, crackers and intruders [2]. The most important operational part of a computer is its security systems. Theft of the details of an individual or institution through email is a serious danger for that person or institution. The hacker's main intention is to steal the user's information and misuse it. Spam is the tool used: hackers pose as genuine senders and lure people with large lottery winnings or other attractions, thereby gaining access to personal information such as account numbers and financial details. Close analysis shows that certain set patterns are fabricated and followed when these emails are written, and that they stress words like winner, lottery and visa. Data from the hackers and from the emails they send to users helps to trace them, which in turn helps to develop security tools. The best method for extracting such deceptive patterns from large amounts of data is data mining. Data mining has been well researched and offers various procedures for discovering useful information, such as clustering and classification, artificial neural networks, Bayesian networks, decision trees and machine learning [3]. The main purpose of this paper is to pinpoint and clarify the concepts of phishing attacks and moreover
*Corresponding author. Email:[email protected]

EAI Endorsed Transactions on Energy Web and Information Technologies
07 2018 - 09 2018 | Volume 5 | Issue 20 | e16

Figure 1. Contents and the source code as features for detecting phishing attacks [30]

understand the usual practice of data mining. The next step is to elaborate on the different mining techniques used to detect and clarify the different phishing attacks. Finally, conclusions are drawn after the study is presented.

2. Phishing

The word derives from 'fishing', the trapping and catching of fish. Such terms are not common in computer science; this one is used in the security field for operations carried out through social engineering.

2.1. Phishing Attacks

Figure 1 shows an example. Through one of the links in an internet spam message, the user is instructed to visit an illegitimate webpage that resembles the login page of the well-known Amazon website. In the source code, the tags serve as a vital feature for tracing phishing attacks. No authentic code is found for the page's footer inside the source code of the page, and there are no authentic links; this in turn helps to detect the phishing attack.

Security against this method has taken on a vital role on the internet in recent times. Phishing refers to the illegal methods hackers use, through emails (spam) and chat, to obtain information about online users such as their ID, phone number and account number. Phishing has many approaches based on spam, and they vary from day to day, so the toll of phishing multiplies. In spite of the simplicity of the method, the attacks are very destructive and highly effective, and phishing is considered one of the most dangerous security threats in online operations. As noted earlier, the word 'phishing' is like 'fishing'; only the spelling differs. In actual fishing the aim is to catch a fish; in the phishing setting, the aim here is to trace and catch the hackers. Hackers use spoofed websites to capture users' personal information, which they then use for their own benefit [4]. Approaches such as email deceive the users. The hackers impersonate a business house, an institution or a trustworthy person to attract the users' attention and get them to give away their personal details, and the users finally become victims. In a nutshell, users are trapped by links that they realize only later are not genuine but illegal [5].

The first step in tracing phishers is to trace their modus operandi in their internet links. The important features for tracing and detecting attacks include the time of the attack, imitation of a legitimate site's name, IP addresses, suspicious errors, the '@' character and so on. Nevertheless, this is not easy, because the information is vast, disorganized and sometimes hidden or camouflaged as low-level information, which makes the hackers' techniques hard to understand. Countermeasures should therefore target the weaknesses in the parts of cyberspace that hackers use for their stealing operations. Getting users to click on links is one of the tactics hackers follow to gain access to the user's page [6]. Phishing still has no single definition because the approaches are so varied, so the definitions are many. Papers from 2012 present the different definitions of phishing, and these are presented here for our understanding. They are drawn from the work of the Phish Tank company (http://www.phish-tank.com/): [7].

Phishing attacks are carried out by assuming the identity and legitimacy of a genuine person or institution. For example, hackers may target the different types of websites used by shoppers and send them messages in order to acquire personal details such as passwords and account numbers. They do this in a very professional way so that users have no suspicion. Figure 2 shows the peak time of phishing attacks in the second half of the year


2014, broken down by domain. According to the figure, the peak for the info domain is clearly visible [8].

Figure 2. Peak time of phishing attacks in the second half of 2014, based on domains
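
Section 2.1 names several link-level cues for spotting phishing, among them a raw IP address in place of a site name and the '@' character. A minimal Python sketch of how such cues might be extracted from a URL is given below. The function name `url_features`, the feature names and the dot-count cue are illustrative assumptions and are not taken from the surveyed papers.

```python
import ipaddress
from urllib.parse import urlsplit


def url_features(url):
    """Return a dict of simple, illustrative URL cues (hypothetical feature set)."""
    parts = urlsplit(url)
    host = parts.hostname or ""

    # Cue 1: the host is a raw IP address instead of a domain name.
    try:
        ipaddress.ip_address(host)
        has_ip_host = True
    except ValueError:
        has_ip_host = False

    # Cue 2: an '@' in the URL; text before it in the authority part is
    # treated as user info, which can hide the real destination host.
    has_at_symbol = "@" in url

    # Cue 3 (assumption): many dots in the host may indicate a known brand
    # name embedded as a subdomain of an unrelated domain.
    dot_count = host.count(".")

    return {
        "has_ip_host": has_ip_host,
        "has_at_symbol": has_at_symbol,
        "dot_count": dot_count,
    }


if __name__ == "__main__":
    print(url_features("http://192.0.2.10/login"))
    print(url_features("http://www.amazon.com@example.net/signin"))
```

In the second example the host that is actually contacted is `example.net`, not `www.amazon.com`, which is why the '@' character is worth flagging. Cues like these would normally be inputs to a classifier rather than a verdict on their own.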

2.2 Phishing Attacks: Range and Competence

The range and scope of phishing attacks depend on the type of domain. Different top-level domains (TLDs) see different methods of attack.

• Countering phishing attacks

Various techniques for countering phishing attacks are listed below [8].
1. Training internet users (potential victims) to counter phishing attacks.
2. Training users to type web addresses themselves rather than click on links.
3. Instructing users not to respond to generic emails that do not address them by name.
4. Making users understand that the institutions and banks they deal with do not generally ask for their account numbers.
5. Users should install anti-phishing extensions in their browsers.

• Stimulus and aim of phishing

The motives and aims behind the attacks are explained below [15].
1. The main aim is financial gain. The desperation for money drives hackers to steal from individuals, banks and institutions.
2. Hackers hide their identity so that they can carry out their destructive actions. For this they use stolen user names and passwords, for example for internet shopping, gaming, product sales and even child abuse.

Phishing Life Cycle

Phishing attacks are elaborate in nature. Like many things, a phishing attack has a life cycle with start, development and finishing stages. As Figure 3 shows, an attack starts when users take the hackers' emails to be genuine. The hackers send emails that look exactly like genuine ones and are so enticing that users are carried away, follow the email and expose their personal information and all possible data. Almost 65% of phishing attacks begin with a visit to a link received in an email [9]. According to the figure there are three stages. The first is the early phishing stage explained above; it leads to the mid phishing stage and ends with the post phishing stage. In the second stage, users are taken in by the inauthentic links, open them and disclose details. In the post stage the fraud succeeds: the users have revealed the private information the hackers need.

3. Solution

There are five steps to solving the problem [10].
(1) The first is identifying the required data. We need a set of attributes that are already identified and that have some influence on the output, i.e. the class predicted by the classifier. The sets of inputs and outputs should therefore be determined.
(2) Phishing data is available from many sources, for example Phish Tank, which provides pairs of input instances and their target class.
(3) Determining the input features. Much depends on how carefully the features are selected. Unwanted, irrelevant and unconnected features are discarded so that the size of the training data set is minimized. This in turn helps both the learning process and its execution.


(4) The most important step is the choice of a mining algorithm. The literature offers a vast range of mining processes, each with its advantages and disadvantages. Three elements are important in selecting a classification method:
1. The characteristics of the input data
2. The accuracy rate, which measures the predictive power of the classifier
3. The clarity and understandability of the output
(5) Finally comes the evaluation of the process, that is, of the performance of the derived classifier on the data.

Figure 3. Phishing life cycle [16]

We cannot say that any particular classifier is the best, because overall performance depends on the characteristics of the training data. AC (associative classification) is selected because of its many outstanding features, such as its predictive approach and precision and the quality of the results obtained.
There are two ways to minimize phishing: the technical method and the non-technical method.

3.1 Non Technical Method

The legal solution to the problem is the non-technical method. It has been adopted in many countries. The US was the first to bring in a law against phishing activities; once the law was enforced, many hackers were caught, arrested, punished and sued for their illegal activities. Phishing was brought under computer crime law in 2004 by the Federal Trade Commission (FTC), an agency established to protect consumers and their safety [13]. Australia and the UK later enforced similar laws in 2005-6; these prohibited fraudulent activities such as phishing websites and provided for imprisonment of the hackers (http//www.finextra.com.news). Eventually countries such as Australia signed agreements with the Microsoft Company, under which law-enforcement personnel were trained in how to track down the hackers and learn their modus operandi [12]. The legal approach has not been 100% successful in catching phishers, however, mainly because hackers are hard to trace: they disappear from the cyber world quickly.

3.1.1 Awareness for the user

Creating awareness among users is equally important. They have to be educated about these criminals, their way of operating and so on [12]. If internet users are convinced of the risk and learn to spot phishing through the security indicators within a website, the problem will be reduced to a large extent, because two parties are then tracking the phishers. The hackers' main advantage is the ignorance of users, who lack basic knowledge and are not aware that hackers carry out such dangerous activities. Nevertheless, awareness is on the whole difficult to implement, because users need substantial time and knowledge to become skilled at recognising phishing techniques, which even experts in the field sometimes overlook.

3.2 Technical Method

There are two prime technical approaches: the blacklist technique and the heuristic-based technique [15]. In the blacklist approach, the requested URL is compared with a list of pre-established phishing URLs. This approach does not cover all phishing websites, because new fake websites keep appearing and it takes considerable time before they are identified and added to the existing list. The heuristic-based technique, by contrast, can quickly recognise newly launched fake websites without that delay [14]. Because of the disadvantages of these techniques, a need has arisen to explore new and different approaches.
Among them, McAfee has brought out new solutions. Furthermore, non-profit organizations such as APWG [15] and Phish Tank (Phish Tank 2006) have introduced better practices and methods that help users against phishers and help them gain experience in recognising phishing websites more quickly, which contributes to the success of anti-phishing techniques. Even so, accurate decisions are not always reached, owing to fluctuations and a rise in deceptive assured


conclusion; in a nutshell, a website that carries authentic branding is, most of the time, not the original.
Some technical solutions developed by researchers and scholars for dealing with phishing are described below.

3.2.1 Blacklist-Whitelist Techniques

A blacklist, as the name suggests, is a list of URLs considered malicious, gathered through techniques such as user voting. Whenever a website is visited, the browser checks whether it is on a blacklist. If it is, the browser warns the user to stop sending personal information such as ID and bank account number. A blacklist can be stored on the user's computer or, optionally, on a server that is queried whenever a URL is requested. According to [17], blacklists are updated at different frequencies. An estimated 50-80% of phishing URLs appear in blacklists 12 hours after their launch, while other blacklists, such as Google's, need on average 7 hours to be updated [18]. It follows that a blacklist must be updated promptly to keep users safe, so that they do not become victims of blacklisted hackers.

The blacklist approach is embodied in various solutions. One of the most important is Google Safe Browsing, which uses a predefined list of phishing URLs to detect fraudulent URLs. A second is the protection built into Microsoft IE9, which works against phishing, together with SiteAdvisor; these are database-backed solutions created to detect malicious attacks such as Trojan horses and spyware. SiteAdvisor has a crawler that automatically browses websites, establishes threats and rates the level of threat associated with the visited URL. Nevertheless, SiteAdvisor cannot identify newly created threats. A third is the anti-phishing tool from VeriSign, which crawls numerous websites in order to recognise "clones" so that illegitimate websites can be found. There is always a race between attackers and users, so no approach is foolproof. Netcraft is a comparatively small program that runs in a web browser. It relies on a blacklist of illegitimate websites recognised by Netcraft and of sites reported by users and verified by Netcraft. Netcraft also shows the location of the server where the webpage is hosted. Experienced users find this useful: for example, a webpage with an "ac.uk" domain should not be hosted outside the UK.

A whitelist is the opposite of a blacklist: as the term suggests, it contains genuine websites, and all other websites are treated as untrusted. One example is the automated individual whitelist (AIWL), an anti-phishing tool that uses the user's own whitelist of genuine, trustworthy websites as its basis. AIWL efficiently tracks every login attempt made by the user using the Naïve Bayes algorithm. When the user logs in to a specific website successfully and repeatedly, AIWL prompts the user to add that website to the whitelist. Another way to counter the malicious activity is research relying on whitelists, as seen in PhishZoo [19]. PhishZoo creates profiles of genuine, trusted websites based on fuzzy hashing techniques. A website profile is an amalgamation of several metrics that together uniquely identify that website. The approach combines whitelisting with blacklisting and with heuristics to warn users about the hackers. The researchers believe that detection should be approached from the user's point of view, because 90% of users rely on how a website looks to verify that it is genuine.

3.2.2 Fuzzy rule approaches

The technical approach followed in [20] builds rules with classification algorithms after gathering different kinds of features that characterise a website, as depicted in the table. Each feature takes one of three fuzzy values: "legitimate", "genuine" and "doubtful". After a series of experiments, the authors evaluated the approach using the following algorithms in Weka: PRISM, C4.5, JRip and PART [21, 22]. From the results they established a strong connection between the "URL" features and "Domain Identity". Nevertheless, they gave no justification for the features. A larger feature set was used by Aburrous, Hossain, Dahal and Thabtah to predict website type based on fuzzy logic. Although their method gives good accuracy, it is not clear how the features were extracted from the website, and specific features were associated with human factors. Conclusions were reached on the basis of human experience rather than intelligent data mining techniques, and that work was undertaken with the intention of solving the problems mentioned above. The authors divided websites into the categories very legitimate, legitimate, suspicious, phishy and very phishy. The boundaries between these categories had not been established earlier.

3.2.3 Machine learning techniques

Various methods have been developed to control phishing using the support vector machine (SVM). The SVM is a popular machine learning technique that is trained on data and used to solve classification problems effectively [23]. It became popular because of its ability to produce accurate results on unstructured problems such as text categorization. An SVM can be visualised as a hyperplane that splits the objects (points) belonging to one group (positive objects) from those belonging to the other (negative objects). During learning, the SVM algorithm finds the hyperplane that divides positive and negative objects with the maximum margin, where the margin is the distance from the hyperplane to the nearest positive and negative objects. A new technique was proposed with the help of SVM. Its purpose is the discovery of authentic and unusual (suspicious) operations, for example phishing, through


the homepage under the company's name, which is seen in the domain name. The next component is titled "Page Categorizer", which captures structural features (unusual URL, unusual DNS record, etc.) that are difficult to duplicate.
Six structural properties were selected, and Vapnik's Support Vector Machine (SVM) algorithm [25] was employed to determine whether a website is legitimate or not. It was later established that the "Identity Extractor" is an important feature in connection with illegitimate URLs. This finding came from experiments on a limited data set consisting of 179 URLs, from which an accuracy of 84% was obtained. Using other features was suggested as a way to make this accuracy more precise.
A comparative study of email phishing using machine learning techniques, including SVM, decision tree and Naïve Bayes, was carried out by [26]. The same work presented a method based on a random forest algorithm titled "Phishing Identification by Learning on Features of Email Received" (PILFER). The experiment used 860 illegitimate emails and 695 legitimate ones. PILFER is noted to detect illegitimate emails with high accuracy. A few of its features, such as IP-based URLs, were used to detect illegitimate emails. Others relate to the links in the email: the number of links inside the email, the domains that appear in it, the number of dots inside the links, JavaScript content, and spam filter output. The authors concluded that PILFER's classification of emails can be improved by combining the ten different features, among them "Spam filter output". The authors used the same data set for the assessment. The results showed that PILFER decreased the false positive rate.

3.2.4 The Cantina Technique

This technique was proposed by [27], who used the "Carnegie Mellon Anti-phishing and Network Analysis Tool" (CANTINA). The method establishes the type of a website using term frequency-inverse document frequency (TF-IDF) [28]. CANTINA examines the content of a website and uses TF-IDF to decide whether the website is phishy. TF-IDF weighs the importance of a term by analysing its frequency. For a given webpage, CANTINA calculates the TF-IDF of each term; the terms with the highest TF-IDF scores are then taken and added to the URL to form the lexical signature. The signature is entered into a search engine. The website is considered legitimate if it is among the first 30 results; otherwise it is called phishy. If the search returns zero results, the website is also established as phishy. To solve the problem, the researchers combine TF-IDF with other features (suspicious URL, age of domain,
There are limitations to this technique, because for some legitimate websites TF-IDF may be unsuitable.
A further technique with additional attributes [29] used a data set of 200 websites, half of them legitimate and the other half illegitimate, and 8 features. The features are suspicious link, domain age, TF-IDF and so on. While the experiments were run, some changes in performance were noticed, as explained below.
1. To begin with, a filter was set for finding the genuine website, because fake sites lure users to them and could do great harm to the users' information; the filter should therefore be applied within the 'site'.
2. According to the researchers, features such as 'Domain age' and 'Known image' are not very important.
3. Third, they identified a new type of phishy webpage, relating primarily to the top of the domain.

3.2.5. Associative Classification Data Mining Technique

Neda A et al [16] studied website phishing using the Multi-label Classifier based Associative Classification (MCAC) method to identify phishing websites accurately. New rules generated by MCAC were used to improve the classifier. Content-based features of websites were not considered. MCAC produces rules that previous algorithms cannot create.
The MCAC method works in the following steps.
1. Look for hidden relationships between the class attribute and the attribute values in the training data.
2. Form association rules from these relationships.
3. Sort the rules by support and confidence using a sorting algorithm.
4. Accept the proper rules and ignore duplicate rules.

Various features of phishy and legitimate websites, discussed previously, were identified, and over 1350 websites were collected from different sources. Some features had the categorical values phishy, suspicious and legitimate, which were replaced by -1, 0 and 1 respectively. A website can usually be assigned to one of two classes, legitimate or phishy. The AC method used in this study can discover rules with one class and also with two classes (legitimate and phishy). MCAC can create a new kind of rule with a new class, "suspicious", that was not previously seen in the database. A website considered suspicious may be either legitimate or phishy. The end-user can reach an accurate decision based on the weights assigned to the data.
Various algorithms were used to evaluate the efficiency and applicability of the MCAC method on the collected data. The main methods used in this study besides MCAC are (CBA, MCAR and MMAC) and (C4.5, PART, RIPPER).
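
The rule-forming and rule-ranking steps listed for MCAC in Section 3.2.5 (relating attribute values to classes, forming rules, sorting by support and confidence, and ignoring duplicates) can be illustrated with a small Python sketch. This is a simplified, single-condition illustration of associative rule mining and is not the MCAC algorithm itself; the function name `mine_rules`, the thresholds and the toy data are assumptions. Feature values follow the -1/0/1 encoding described above.

```python
from collections import Counter


def mine_rules(rows, labels, min_support=0.2, min_confidence=0.6):
    """Mine single-condition rules (attribute == value) -> class label.

    Simplified illustration only: real associative classifiers such as
    MCAC also combine several attribute values into one rule.
    """
    n = len(rows)
    cond_counts = Counter()   # how often (attribute, value) occurs
    rule_counts = Counter()   # how often (attribute, value) occurs with a label

    # Steps 1-2: count co-occurrences of attribute values and class labels.
    for row, label in zip(rows, labels):
        for attr, value in row.items():
            cond_counts[(attr, value)] += 1
            rule_counts[(attr, value, label)] += 1

    rules = []
    for (attr, value, label), count in rule_counts.items():
        support = count / n                              # share of all rows
        confidence = count / cond_counts[(attr, value)]  # share of matching rows
        if support >= min_support and confidence >= min_confidence:
            rules.append((attr, value, label, support, confidence))

    # Step 3: sort by confidence, then support (highest first).
    rules.sort(key=lambda r: (r[4], r[3]), reverse=True)

    # Step 4: keep the best rule per condition and ignore duplicates.
    seen, kept = set(), []
    for rule in rules:
        if (rule[0], rule[1]) not in seen:
            seen.add((rule[0], rule[1]))
            kept.append(rule)
    return kept


if __name__ == "__main__":
    # Toy data (assumed): 1 = legitimate, 0 = suspicious, -1 = phishy values.
    rows = [
        {"ip_in_url": -1, "domain_age": -1},
        {"ip_in_url": -1, "domain_age": 1},
        {"ip_in_url": 1, "domain_age": 1},
        {"ip_in_url": 1, "domain_age": 1},
        {"ip_in_url": 1, "domain_age": -1},
    ]
    labels = ["phishy", "phishy", "legitimate", "legitimate", "phishy"]
    for rule in mine_rules(rows, labels):
        print(rule)
```

Each printed tuple has the form (attribute, value, class, support, confidence). Sorting on confidence before support is one common convention in associative classification; the survey does not state which ordering MCAC uses.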


Table 1. Comparative and analysing study

S.no Author Paper Title Description of the work Result


1 Sheng, S., An empirical The effectiveness of phishing black- According to this paper blacklists are noted
et al. analysis of lists were studied. and established at different frequencies.
2009 phishing In this study 191 new phish used that The estimation was 50-80% of phishing
blacklists. were lesser than 30 minutes old to URL’s which are displayed in the blacklist
run two tests on 8 anti-phishing 12 hours after their launch besides other
toolbars. black lists through Google’s need on an
average of 7 hours for update.
2 Afroz, et PhishZoo: Here phishZoo creates profiles of PhishZoo provides accuracy of 96%like
al. 2011 Detecting genuine and trusted websites which Blacklist. By using this approach can
phishing is based on Fuzzy hashing classify the zero-day phishing attack.
websites by techniques.
looking at
them.
3 Aburrous, Predicting The technical approach followed on The results indicating Associative
et al. phishing the basis of contradicting some rules classification algorithm of MCAR showed
(2010a). websites on the basis of algorithms, this is less error rate (12.622%) when compared to
using done after gathering different kinds other traditional classification. The error
classification of features which varies in features, rate was measured based on accuracy and
mining and the capacity of website as speed.
techniques. depicted in table. There will be three
uncertain values. They are the
“legitimate”, “genuine”, “doubtful”.
4 Sadeh, et Learning to The research work done on a random It is noted that PILFER has sharp accuracy
al, (2007) detect forest algorithm titled “Phishing to detect illegitimate emails. IP based URLs
phishing Identification by learning on features were used in a few features in order to
emails. of email received” (PILFER) in an detect illegitimate emails. From the result it
unsystematic way. was inferred that PILFER it was decreased
the rate of false positive.
5 Thabtah, Naïve Cantina checks the website and what To solve the problem the researcher use the
et al, Bayesian it has and then arrive at a decision as method where the combination of TF-IDF
(2009). based on chi to know the nature of website is with different character, (mistrustful URL,
square to phishy which used TF-IDF. These life of domain,
categorize analyses the importance of weight
Arabic data. and also the importance of analyzing
with frequency.
6 | Neda A., et al. (2014) | Phishing detection based Associative Classification data mining. | The Multi-label Classifier based Associative Classification (MCAC) method was used to identify phishing websites accurately. New rules were generated using MCAC to enhance detection. | MCAC can generate new kinds of rules with a new class, “suspicious”, that was not previously seen in the database. A website classed as suspicious may be either legitimate or phishy. The end user can reach an accurate decision based on the weights assigned to the data.
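The TF-IDF weighting mentioned in the table can be illustrated with a short example. The following Python sketch is a minimal illustration only, not the implementation used by Cantina or by any of the surveyed works. The function names (`tokenize`, `tf_idf`, `top_terms`), the smoothed IDF formula, and the sample texts are all assumptions made for this example. It ranks the terms of a page by how frequent they are in the page and how rare they are in a reference corpus.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    cleaned = "".join(c if c.isalpha() else " " for c in text.lower())
    return cleaned.split()


def tf_idf(page_tokens, corpus):
    """Return {term: weight} for one page against a reference corpus.

    corpus is a list of sets, one set of terms per reference document.
    tf  = term count / total tokens in the page
    idf = log((1 + N) / (1 + df)) + 1   (smoothed variant, an assumption)
    """
    counts = Counter(page_tokens)
    total = len(page_tokens)
    n_docs = len(corpus)
    weights = {}
    for term, count in counts.items():
        # Document frequency: number of reference documents containing the term.
        df = sum(1 for doc in corpus if term in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        weights[term] = (count / total) * idf
    return weights


def top_terms(page_text, corpus_texts, k=3):
    """Return the k highest-weighted terms; ties are broken alphabetically."""
    corpus = [set(tokenize(t)) for t in corpus_texts]
    weights = tf_idf(tokenize(page_text), corpus)
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical sample data, for illustration only.
page = "paypal login paypal account verify"
reference = ["login to your account", "verify your email account", "news and weather"]
print(top_terms(page, reference))
```

In this sample the term that is frequent in the page but absent from the reference corpus receives the highest weight, while a term that appears in most reference documents is ranked last. How such weights are combined with other features is left open here, since the table does not specify it.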

Conclusion

Unauthentic websites or emails manipulate users and obtain their valuable private information. This illegal attempt is known as phishing. Phishing attacks use many methods, such as spam/email, web-based delivery, instant messaging, Trojan hosts, system reconfiguration, and so on. Several techniques have therefore been developed to track phishing websites and to detect and stop this activity. In particular, data mining offers different anti-phishing methods to catch phishing attacks. In this paper we reviewed different kinds of data mining technologies, such as the black and whitelist approach, the fuzzy approach, and the Cantina approach. Every technology reviewed has a few pros and cons. Our survey gives an overview of phishing websites and of the different technologies used against phishing attacks. The review makes clear that one type of technology is not sufficient to detect phishing attacks or to provide high reliability in phishing detection, and that detecting phishing activity with high accuracy remains a challenge. In future work we intend to take up this challenge and establish more effective approaches for minimizing phishing activity.

References

[1] G. Aaron, "The state of phishing," Computer Fraud & Security, vol. 2010, pp. 5-8, 2010.

[2] E. Shein, "The gods of phishing," Infosecurity, vol. 8, pp. 28-31, 2011. 548(11)70023-7

[3] S.-H. Liao, P.-H. Chu, and P.-Y. Hsiao, "Data mining techniques and applications–A decade review from 2000 to 2011," Expert Systems with Applications, vol. 39, pp. 11303-11311, 2012. http://dx.doi.org/10.1016/j.eswa.2012.02.063

[4] M. Aburrous, M. A. Hossain, K. Dahal, and F. Thabtah, "Intelligent phishing detection system for e-banking using fuzzy data mining," Expert Systems with Applications, vol. 37, pp. 7913-7921, 2010. http://dx.doi.org/10.1016/j.eswa.2010.04.044

[5] T. A. Almeida and A. Yamakami, "Facing the spammers: A very effective approach to avoid junk e-mails," Expert Systems with Applications, vol. 39, pp. 6557-6561, 2012. http://dx.doi.org/10.1016/j.eswa.2011.12.049

[6] C. Konradt, A. Schilling, and B. Werners, "Phishing: An economic analysis of cybercrime perpetrators," Computers & Security, vol. 58, pp. 39-46, 2016. http://dx.doi.org/10.1016/j.cose.2015.12.001

[7] M. Khonji, Y. Iraqi, and A. Jones, "Phishing detection: a literature survey," Communications Surveys & Tutorials, IEEE, vol. 15, pp. 2091-2121, 2013. http://dx.doi.org/10.1109/SURV.2013.032213.00009

[8] P. Kumaraguru, P. Dewan, and R. Clayton, "2014 APWG Symposium on Electronic Crime Research (eCrime)."

[9] KasperskyLab (2013). <http://www.kaspersky.com/about/news/spam/2013/>.

[10] Horng, S. J., Fan, P., Khan, M. K., Run, R. S., Lai, J. L., & Chen, R. J. (2011). An efficient phishing webpage detector. Expert Systems with Applications: An International Journal, 38(10), 12018–12027.

[11] W. D. Yu, S. Nargundkar, and N. Tiruthani, "A phishing vulnerability analysis of web based systems," in Computers and Communications, 2008. ISCC 2008. IEEE Symposium on, 2008, pp. 326-331. http://dx.doi.org/10.1109/iscc.2008.4625681

[12] The Government of Australia (2011). Hackers, Fraudsters and Botnets: Tackling the problem of cyber crime. Report on Inquiry into Cyber Crime.

[13] Kunz, M., & Wilson, P. (2004). Computer crime and computer fraud report. Submitted to the Montgomery County Criminal Justice Coordinating Commission.

[14] Miyamoto, D., Hazeyama, H., & Kadobayashi, Y. (2008). An evaluation of machine learning-based methods for detection of phishing sites. Australian Journal of Intelligent Information Processing Systems, 2, 54–63.

[15] Aaron, G., & Manning, R. (2012). APWG phishing reports. <http://www.antiphishing.org/resources/apwg-reports/>.

[16] N. Abdelhamid, A. Ayesh, and F. Thabtah, "Phishing detection based Associative Classification data mining," Expert Systems with Applications, vol. 41, pp. 5948-5959, 2014. http://dx.doi.org/10.1016/j.eswa.2014.03.019

[17] Sheng, S., Wardman, B., Warner, G., Cranor, L. F., Hong, J., & Zhang, C. (2009). An empirical analysis of phishing blacklists. In CEAS.

[18] Dede (2011). <http://blog.sucuri.net/tag/blacklisted> (accessed June 25, 2013).

[19] Afroz, S., & Greenstadt, R. (2011). PhishZoo: Detecting phishing websites by looking at them. In Proceedings of the 2011 IEEE fifth international conference on semantic computing (ICSC ’11) (pp. 368–375). Washington, DC, USA: IEEE Computer Society.

[20] Aburrous, M., Hossain, M. A., Dahal, K., & Thabtah, F. (2010a). Predicting phishing websites using classification mining techniques. In Seventh international conference on information technology; 2010 (pp. 176–181). Las Vegas, Nevada, USA: IEEE.

[21] WEKA (2011). Data Mining Software in Java. Retrieved December 15, 2010 from <http://www.cs.waikato.ac.nz/ml/weka>.

[22] Witten, I., & Frank, E. (2002). Data mining: Practical machine learning tools and techniques with Java implementations. San Francisco: Morgan Kaufmann.

[23] Song, M. (2009). Handbook of research on text and web mining technologies. Information science reference, IGI global.

[24] Pan, Y., & Ding, X. (2006). Anomaly based web phishing page detection. In ACSAC ’06: Proceedings of the 22nd annual computer security applications conference (pp. 381–392). Washington, DC: IEEE.

[25] Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.

[26] Sadeh, N., Tomasic, A., & Fette, I. (2007). Learning to detect phishing emails. In Proceedings of the 16th international conference on world wide web (pp. 649–656).

[27] Guang, X., Jason, O., Carolyn, P. R., & Lorrie, C. (2011). CANTINA+: A feature-rich machine learning framework for detecting phishing web sites. In ACM transactions on information and system security (pp. 1–28).


[28] Thabtah, F., Eljinini, M., Zamzeer, M., & Hadi, W. (2009). Naïve Bayesian based on chi square to categorize Arabic data. In Proceedings of the 11th international business information management association conference (IBIMA) conference on innovation and knowledge management in twin track economies (pp. 930–935).

[29] Sanglerdsinlapachai, N., & Rungsawang, A. (2010). Using domain top-page similarity feature in machine learning-based web. In Third international conference on knowledge discovery and data mining; 2010 (pp. 187–190). Washington, DC: IEEE.

[30] Marjan Abdeyadan, Ali Rayat Pishes (2016). Detecting Internet Phishing Attacks Using Data Mining Methods. Proceedings of the 3rd International Conference on Innovative Engineering Technologies (ICIET’2016), August 5-6, Bangkok (Thailand).
