43.v. Bharanipriya1 & v. Kamakshi Prasad2
Web mining adopts data mining techniques to automatically discover and retrieve information from Web documents and
services. In this paper we discuss the concepts of Web mining, focusing mainly on one of its categories, Web content
mining, and its various tasks. We propose a six-step Web content mining process. Various tools for Web content mining
are also discussed, and their relative merits and demerits are presented.
Keywords: Web Mining, Web Content Mining, Web, Tools, Moulding
1. INTRODUCTION
Analysing and discovering useful information on the World Wide Web poses a significant challenge to researchers in this area. Retrieving valuable information from the Web by adopting data mining techniques is called Web mining. Web mining is decomposed into the following five subtasks: 1) resource finding, 2) information selection and pre-processing, 3) generalization, 4) analysis and 5) visualization [1]. Web mining is divided into three categories: Web content mining (WCM), Web usage mining (WUM) and Web structure mining (WSM).
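The five subtasks listed above can be illustrated with a toy pipeline. The following is a minimal Python sketch under stated assumptions, not the method of [1]: pages are supplied as in-memory HTML strings instead of being crawled, a keyword filter stands in for resource finding, per-term document frequency stands in for generalization, and a plain-text bar chart stands in for visualization. All function names and the stop-word list are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical stop-word list, kept tiny for illustration.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


class TextCollector(HTMLParser):
    """Collects visible text, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def find_resources(corpus, keyword):
    """1) Resource finding: keep pages that mention the keyword (stand-in for a crawler)."""
    return [html for html in corpus if keyword.lower() in html.lower()]


def select_and_preprocess(html):
    """2) Selection and pre-processing: strip markup, lowercase, tokenize, drop stop words."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    text = " ".join(collector.parts).lower()
    return [tok for tok in re.findall(r"[a-z]+", text) if tok not in STOPWORDS]


def generalize(token_lists):
    """3) Generalization: count how many pages contain each term (document frequency)."""
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    return doc_freq


def analyze(doc_freq, min_docs=2):
    """4) Analysis: keep terms shared by at least min_docs pages, most frequent first."""
    shared = [(term, n) for term, n in doc_freq.items() if n >= min_docs]
    return sorted(shared, key=lambda item: (-item[1], item[0]))


def visualize(patterns):
    """5) Visualization: render a plain-text bar chart, one '#' per page."""
    return "\n".join(f"{term:<10} {'#' * n}" for term, n in patterns)


pages = [
    "<html><body><h1>Web mining</h1><p>Mining the web for content.</p></body></html>",
    "<html><body><p>Content mining of web pages.</p><script>var x = 1;</script></body></html>",
    "<html><body><p>Cooking recipes.</p></body></html>",
]
selected = find_resources(pages, "mining")
patterns = analyze(generalize(select_and_preprocess(p) for p in selected))
print(visualize(patterns))
```

With these sample pages, the third page is discarded at the resource-finding step, and the chart shows "content", "mining" and "web" as the terms shared by both remaining pages. A real system would replace each stage with far richer techniques; the point here is only how the output of one subtask feeds the next.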
Creating automation tasks takes a few minutes: record keyboard and mouse strokes, or use easy point-and-click wizards.
Table: Comparison of Web content mining tools (Automation Anywhere, Web Info Extractor, Web Content Extractor, Screen-Scraper and Mozenda) against four tasks: records the data, extracts structured data, extracts unstructured data, and user friendliness.
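The comparison above distinguishes tools that extract structured data from those that extract unstructured data. As a rough illustration of the structured case only, the sketch below (hypothetical names, Python standard library only) turns the rows of an HTML table into records keyed by the header row. It is an assumption about the general idea, not a description of how any of the listed tools work internally.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects each <tr> of an HTML page as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def _close_cell(self):
        # Finish the current cell, if any, and add its text to the current row.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate an omitted </td> or </th>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_records(html):
    """Structured extraction: the first row is the header, later rows become dicts."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


page = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Pen</td><td>1.50</td></tr>
  <tr><td>Notebook</td><td>3.25</td></tr>
</table>
"""
records = extract_records(page)
print(records)
```

Because `zip()` stops at the shorter sequence, cells beyond the header width are silently dropped. A practical extractor would also need to handle nested tables, `colspan`/`rowspan`, and pages whose data is not laid out as a table, while unstructured data such as free text usually calls for further text processing.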
Features:
Commonalities:
Differences:
6. CONCLUSIONS
There are many concepts related to the World Wide Web. In this paper we examined Web content mining, one of the categories of Web mining. The term Web content mining refers to a technique that encompasses a broad range of issues. We presented different views for understanding Web content mining and described six tasks of WCM. We explored some of the Web content mining tools in more detail and compared their commonalities and differences. We observed that Screen-
REFERENCES
[1]
[2]
[3]
[4] www.Mozenda.com
[5] www.screen-scraper.com
[6]
[7]