Proceedings of the Second Asia-Pacific Conference on Intelligent Agent Technology (IAT-2001), pages 294–299, Maebashi Terrsa,
Maebashi City, Japan, October 2001
PRICE WATCHER AGENT FOR E-COMMERCE
SIMON FONG
E-Netique Pte Ltd, Singapore
E-mail: [email protected]
AIXIN SUN
School of Computer Engineering, Nanyang Technological University, Singapore
E-mail: [email protected]
KIN KEONG WONG
School of Computer Engineering, Nanyang Technological University, Singapore
E-mail: [email protected]
We report an autonomous agent that retrieves competitors’ product prices over
the World Wide Web for price comparison at an e-commerce retail shop. This
price watcher model differs from the conventional price comparison services
currently available on the Internet in that it collects competitors’ price
information without the competitors’ participation or attention. It scans price
information over the Internet on a regular basis, builds up a knowledge base at
the user’s site and provides a price comparison facility for shoppers. It is an
information retrieval utility that can be used as part of a business
intelligence infrastructure. This paper summarizes the application background
as well as the technical details of the prototype’s design.
1 Introduction
The Watcher Agent proposed in this paper is an autonomous software program
that “spies” on the competitors’ prices over the web. The prices collected
from the competitors are stored in a local database. They can be used for
price comparison at the front-end of an e-commerce online shop as well as
for market research at the back-end. This technology offers online shops a
useful new feature: by showing consumers the competitors’ prices, it can
increase their confidence in buying the products and hence help improve
sales. The agent can be configured such that only the prices
higher than (or equal to) ours are displayed. A snapshot of a shopping site
with price watcher is shown in Figure 1.
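The display rule above can be illustrated with a minimal Python sketch. It assumes the collected prices are available as a simple mapping from competitor shop name to price; the function `prices_to_display`, its parameters and the shop names are hypothetical and are not taken from the described prototype.

```python
def prices_to_display(our_price, competitor_prices, only_higher_or_equal=True):
    """Return (shop, price) rows for the comparison table, sorted by price.

    competitor_prices: hypothetical dict mapping competitor shop name -> price.
    When only_higher_or_equal is True, competitors that undercut our price
    are hidden, mirroring the configurable behaviour described in the text.
    """
    rows = [
        (shop, price)
        for shop, price in competitor_prices.items()
        if not only_higher_or_equal or price >= our_price
    ]
    # Cheapest visible competitor first.
    return sorted(rows, key=lambda row: row[1])


rows = prices_to_display(199.0, {"ShopA": 189.0, "ShopB": 199.0, "ShopC": 229.0})
print(rows)  # [('ShopB', 199.0), ('ShopC', 229.0)]
```

Making the filter a flag rather than a fixed rule keeps the same stored data usable for both the shopper-facing comparison and back-end market research, where every competitor price matters.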
One of the barriers that e-commerce retailers must overcome is that most
consumers are not convinced that the price of a product offered at a retailer’s
site is the best, and it is always easy for them to surf away to other shopping
sites looking for a better offer¹. How to encourage consumers to commit to
a purchase on the spot at the current site is thus an issue to be addressed.
PriceWatcher: submitted to World Scientific on June 21, 2001
Figure 1. Snapshot of the application of Price Watcher
There are several price comparison services available on the web²,³,⁴. The
differences between our price watcher agent and most web-based price
comparison software and portals are as follows:
1. Designed for usage by individual online shops. Price watcher is
a price-monitoring tool used by individual online shops while the usual
web-based price comparison services are made publicly available for web
surfers to compare prices.
2. Neither broker nor public database is used. Most price
comparison services have a mediator, usually the web server or service
provider, and use a centralized database to maintain the price information
available to users. In our watcher agent strategy, a private and confidential
database that holds the competitors’ price information is located at the local site.
3. No participation of retailing shops is required. Some price
comparison services work by letting participating stores submit their
latest prices to the mediator. Our approach is different because there is
no need to involve the competitors.
4. Forms part of the Competitor Intelligence strategy. The price
watcher is to be implemented as a part of the competitor intelligence
strategy that includes information retrieval, filtering, analysis, and presentation.
In this paper, Section 2 covers the overall working process of the price
watcher. The product name matching and price extraction algorithms are
described in detail in Sections 3.1 and 3.2 respectively. The technical limitations
of the price watcher are discussed in Section 3.3, and finally we conclude our
work in Section 4.
[Figure 2 diagram. Information Retrieval Layer: URL Retrieval Engine (queries a search engine and returns URLs), Market Explorer and Market Monitor (fetch web pages from the WWW via HTTP requests). Compilation Layer: Price Watcher and Market Watcher. Storage Layer: databases DB_MS, DB_MI and DB_MP, holding marketing sources, prices and marketing news. Presentation Layer: Marketing Information System and Price Comparer.]
Figure 2. The architecture of Watcher Agent
2 Price Watcher Working Process
The price watcher working process consists of five steps:
1. The set of competitors’ URLs, configuration parameters (e.g. retrieval
scheduling) and product names are obtained from the database.
2. The HTML pages are downloaded using the web retrieval engine.
3. A dollar sign detector is used as a filter: only pages containing dollar
signs such as $ and S$ are processed further.
4. The product names are searched for within each page. For each possible
match, the price is extracted and stored in the local database.
5. The competitors’ prices (and our own price) are then queried and shown
in tabular form.
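Step 3 can be sketched in Python as a regular-expression filter. This is a minimal sketch under stated assumptions: the pattern `PRICE_PATTERN` and the helpers `contains_price` and `filter_pages` are hypothetical, and requiring a number after the dollar sign is stricter than the bare "$ or S$" check described in the text.

```python
import re

# Assumed pattern: an optional "S" followed by "$", an optional space and a
# number such as 1,299 or 349.00. Requiring digits after the sign is an
# assumption, not the prototype's documented rule.
PRICE_PATTERN = re.compile(r"S?\$\s?\d[\d,]*(?:\.\d{1,2})?")


def contains_price(html_text: str) -> bool:
    """Return True if the page text contains at least one dollar amount."""
    return PRICE_PATTERN.search(html_text) is not None


def filter_pages(pages: dict) -> list:
    """Keep only the URLs whose downloaded HTML passes the dollar sign filter.

    pages: hypothetical mapping of URL -> downloaded HTML text (step 2 output).
    """
    return [url for url, text in pages.items() if contains_price(text)]


pages = {
    "http://shop-a.example/printers": "<td>HP DeskJet 840C</td><td>S$ 249.00</td>",
    "http://shop-b.example/about": "<p>About our company</p>",
}
print(filter_pages(pages))  # ['http://shop-a.example/printers']
```

Running this cheap test before the more expensive product name matching of step 4 discards pages such as "About us" without parsing them.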
3 Technical Details
To monitor a web site, its contents must be downloaded according to a
schedule setting⁵. In the price watcher, only the HTML text is downloaded.
Finding the level of similarity between our product names and the names
provided on the web, and extracting the corresponding prices, are the two
main challenges. The architecture of the Watcher Agent is shown in Figure 2.
The agent is composed of two major parts: the price watcher and the market
watcher. The market watcher helps the administrator of the online shop obtain
the latest information about the competitors’ web sites; it is not covered in
this paper.
3.1 Product Name Matching
A product name can usually be divided into three parts: brand, model
number and description. For example, brand: Canon, model number:
BJC-4200SP and description: Color Bubble Jet Printer. The model number is
believed to be unique for a specific product. The brand may appear slightly
differently on different Web sites, for example Hewlett Packard and HP (for
short). This problem can be solved by letting users enter more than one
equivalent brand name. The description may differ considerably from one Web
site to another. However, this part is not critical for product name matching,
although it is useful in determining where the model number or brand can be
found. In product name matching, we allow users to allocate a weight to each
part, for example 50%, 30% and 20% for model number, brand and description
respectively. Model number and brand require an exact, case-insensitive match.
An exact match gives a similarity level of 1; otherwise the similarity level is
0. An approximate word matching algorithm⁶ is applied to compute the
similarity level of the description part. The weighted similarity of each part
is the product of its similarity level and its weight. The overall similarity
level of the whole product name is obtained by summing the weighted
similarities of the three parts. This value is then compared with a threshold
value to decide whether a match has actually been detected.
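The weighted scheme above can be sketched in Python as follows. This is an illustration under stated assumptions: the 50/30/20 weights come from the example in the text, but the threshold value of 0.8 is hypothetical, and `difflib.SequenceMatcher.ratio()` is swapped in as a stand-in for the approximate word matching algorithm⁶, which it does not reproduce. All function names are hypothetical.

```python
from difflib import SequenceMatcher

# Example weights from the text: model number 50%, brand 30%, description 20%.
WEIGHTS = {"model": 0.5, "brand": 0.3, "description": 0.2}
THRESHOLD = 0.8  # hypothetical; the paper does not state a threshold value


def exact(a: str, b: str) -> float:
    """Case-insensitive exact match: similarity level 1 or 0."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def description_similarity(a: str, b: str) -> float:
    """Stand-in for approximate word matching: difflib's ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def name_similarity(ours: dict, found: dict) -> float:
    """Weighted sum of the three part similarities.

    ours:  {"model": str, "brands": [equivalent brand names], "description": str}
    found: {"model": str, "brand": str, "description": str}
    """
    levels = {
        "model": exact(ours["model"], found["model"]),
        # The brand matches if it equals any user-supplied equivalent.
        "brand": max(exact(b, found["brand"]) for b in ours["brands"]),
        "description": description_similarity(ours["description"], found["description"]),
    }
    return sum(WEIGHTS[part] * level for part, level in levels.items())


def is_match(ours: dict, found: dict, threshold: float = THRESHOLD) -> bool:
    """Compare the overall similarity level with the threshold."""
    return name_similarity(ours, found) >= threshold


ours = {"model": "DeskJet 840C", "brands": ["HP", "Hewlett Packard"],
        "description": "Color Inkjet Printer"}
found = {"model": "deskjet 840c", "brand": "Hewlett Packard",
         "description": "Colour Inkjet Printer"}
print(is_match(ours, found))  # True
```

With these example weights, a matching model number and brand alone contribute 0.8, so the threshold decides how much the description must agree: a threshold above 0.8 forces some description similarity into every match.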
3.2 Price Extraction
The main operation of the price watcher is to extract prices from HTML
documents. HTML documents are semi-structured in nature⁷; hence extracting
information from HTML documents is significantly different from extracting
information from tables in a database. The price extraction algorithm is
based on the KPS Mining Algorithm⁸. Once a product name has been
matched and located in an HTML document, the following rules are applied to
extract the price.
• For a product name appearing in a title (i.e. <title>, <h1> - <h6>),
the price of the product is most likely to be located in the string after
the product name.
• For a product name appearing in an item list, the price is most likely to
be located in the same item, or the next one until the end of the list.
• For a product name appearing in a cell of a table, the price is most likely
to be located in the same cell, or the same row in the column-wise table,
or the same column in the row-wise table.
• For a product name appearing in a textual line, the price is most likely
to be located in the same paragraph, or the next paragraph, until the
end of the page.
• If more than one price is found, the price is assumed to be the first one
appearing after the product name.
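The last rule can be sketched in Python, assuming the structural rules have already chosen the text region (title string, list item, table cell or row, paragraph) to search. The price syntax in `PRICE_PATTERN` and the function name `first_price_after` are hypothetical; this is not the KPS-based implementation.

```python
import re

# Assumed price syntax: optional "S", "$", optional space, digits with commas and cents.
PRICE_PATTERN = re.compile(r"S?\$\s?(\d[\d,]*(?:\.\d{1,2})?)")


def first_price_after(text: str, product_name: str):
    """Return the first price that follows product_name in text, or None.

    text is the region selected by the structural rules; this function only
    applies the "first price after the product name" rule within it.
    """
    name = re.search(re.escape(product_name), text, re.IGNORECASE)
    if name is None:
        return None
    # Start scanning right after the product name, so earlier prices are ignored.
    price = PRICE_PATTERN.search(text, name.end())
    if price is None:
        return None
    return float(price.group(1).replace(",", ""))


title = "Sale! $99 cables. HP DeskJet 840C Printer S$ 1,249.50 (was $1,399)"
print(first_price_after(title, "HP DeskJet 840C Printer"))  # 1249.5
```

Both the "$99" before the name and the later "$1,399" are skipped, which is exactly what the first-price-after rule intends.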
For each HTML page retrieved by the system, a Semi-Structured Data Tree⁷
is constructed. If a model number is located in the tree, the brand and the
description are searched for within the same data node. If neither of them can
be located in the current data node, a super data string is formed from all the
data nodes that are children of the parent of the current data node. The
similarity level between the obtained product name and the defined product
name is then computed. The price of the product is first searched for in the
current data node; if no price information is found there, the search is
extended by up to three levels up the tree.
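The upward price search can be sketched in Python with a toy tree. This is a simplified sketch under stated assumptions: the `Node` class, `super_string` and `find_price` are hypothetical stand-ins for the Semi-Structured Data Tree⁷, and for brevity the sketch takes the first price in the widened region rather than enforcing the "after the product name" rule.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

PRICE_PATTERN = re.compile(r"S?\$\s?(\d[\d,]*(?:\.\d{1,2})?)")


@dataclass
class Node:
    """Hypothetical data node, e.g. a table row, cell or list item."""
    text: str = ""
    children: List["Node"] = field(default_factory=list)
    # Excluded from repr/eq to avoid infinite recursion through parent links.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def super_string(node: Node) -> str:
    """Concatenate the text of a node and all of its descendants."""
    parts = [node.text] + [super_string(child) for child in node.children]
    return " ".join(p for p in parts if p)


def find_price(node: Node, max_levels: int = 3) -> Optional[float]:
    """Search the current node first, then widen to ancestors up to max_levels up."""
    current = node
    for _ in range(max_levels + 1):
        match = PRICE_PATTERN.search(super_string(current))
        if match:
            return float(match.group(1).replace(",", ""))
        if current.parent is None:
            break
        current = current.parent
    return None


# A table row whose price sits in a sibling cell of the product name.
row = Node()
name_cell = row.add(Node("Canon BJC-4200SP Color Bubble Jet Printer"))
row.add(Node("S$ 189.00"))
print(find_price(name_cell))  # 189.0
```

Capping the climb at a few levels stops the search from widening to the whole page, where the first price found would likely belong to a different product.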
3.3 Price Watcher Limitations
One technical limitation is that the price watcher cannot distinguish Singapore dollars from US dollars, because “S$” and “$” are
always used interchangeably in Singapore. In addition, the current prototype can only deal with textual data. Another problem
is that a detected product name may not be the one to be monitored even though a high similarity level is calculated. For example, “Cartridge for HP
DeskJet 840C Printer” can easily be mistaken for “HP DeskJet 840C Printer”.
A more sophisticated algorithm is needed to resolve this problem.
4 Conclusion and Future Work
In this paper, we have reported an autonomous software program called price
watcher that collects competitors’ product prices on the web. The collected
price information can support managers’ business decision making, and it
can be used to enhance shoppers’ confidence via price comparison. The
application of price watcher technology is believed to be relatively new and
could have an impact on the way that retail shops market their goods online.
The first online shop that applies this technology would benefit most, because
the technology would place the business one step ahead of its competitors.
It is envisaged that the system can be expanded to include scanning and
analysis of other competitor information, such as news, new products and
promotions. Work can also be extended to study how this agent can be
integrated into a full business intelligence infrastructure⁵.
References
1. L. Gerald and L. Spiller, Electronic shopping: The effect of customer interfaces on traffic and sales. Communications of the ACM, 41(7), pages
81–87, 1998.
2. B. Krulwich, The BargainFinder agent: Comparison price shopping on
the Internet. In Agents, Bots, and other Internet Beasties, SAMS.NET
Publishing, pages 257–263, 1996.
3. R. B. Doorenbos, O. Etzioni and D. S. Weld, A Scalable Comparison-Shopping Agent for the World-Wide Web. In Proceedings of the First
International Conference on Autonomous Agents, pages 39–48, 1997.
4. Pricewatch for Computer Products, http://www.pricewatch.com.
5. Q. Chen, P. Chundi, U. Dayal and M. Hsu, Dynamic Software Agents for
Business Intelligence Applications. ACM Autonomous Agents’98, pages
453–455, 1998.
6. J. C. French, A. L. Powell and E. Schulman, Applications of Approximate
Word Matching in Information Retrieval. In Proceedings of the Sixth
International Conference on Information and Knowledge Management,
pages 9–15, 1997.
7. S. J. Lim and Y. K. Ng, An automated approach for retrieving hierarchical data from HTML tables. In Proceedings of the Eighth International
Conference on Information and Knowledge Management, pages 466–474,
1999.
8. T. Guan and K. F. Wong, KPS: a Web Information Mining Algorithm.
Computer Networks, 31(11–16), pages 1495–1507, 1999.