Health Website Quality: Towards Automated Analysis
Thomas Nind, University of Dundee, [email protected]
Vicki Hanson, University of Dundee
Stephen McKenna, University of Dundee
Ian Ricketts, University of Dundee
Falko Sniehotta, University of Newcastle
Zhiwei Guan, Google, Inc
Jeremy Wyatt, University of Warwick
Lorna Gibson, University of Dundee
Wendy Moncur, University of Dundee
ABSTRACT
The innovative use of information and computing technology to
deliver improved healthcare is a priority research area in the
Digital Economy programme. We have developed and tested
software that extracts features from health websites that may be
associated with quality. This is the first stage in a larger on-going
project to create a machine learning algorithm that can predict the
quality of health websites. The feature extractor achieved a mean
accuracy of 77% in identifying features in 36 manually rated
web pages. A preliminary analysis of features extracted from 8,227
web pages retrieved through the Health on the Net (HON) and Google
search engines showed significant associations between several
extracted features and a website’s presence in the HON database
of certified sites (a proxy measure for health information quality).
1. INTRODUCTION
Evaluating the quality of health-related websites can be difficult
for people. Existing evaluation tools can be difficult to use or
unreliable [1]. One way people can find high quality websites is
to use a dedicated health search engine, such as that provided via
the Health on the Net search portal (HON) [4]. The HON search
engine serves web pages from a pool of accredited health web
pages based on the user’s search terms. Alternatively, people can
find high quality health information by focusing solely on National
Health Service (NHS) sites or those recommended by their doctor.
In reality, however, many people use a general purpose search
engine, rely on intuition and heuristics to evaluate information
quality (IQ), and predominantly choose from the first few search
results [2]. One route to help these people reliably find higher
quality websites would be either to annotate or to re-order search
results based on a consistent quality assessment algorithm.
Previous work on search annotation has demonstrated that
providing information such as popularity and the presence of third
party certifications can improve users’ ability to judge website
credibility [6]. This study differs from previous work in its focus
on website surface features and in its approach to determining
which features are indicative of quality.
2. FEATURE EXTRACTOR
2.1 Features
In order to build a quality assessment algorithm, we must first
have a set of website features that can be reliably detected and that
are associated with the presence of high quality health
information. The approach used in this study was to detect as
many potential features as possible, based on the existing
credibility and information quality literature [5], and then discard
those features that were not reliably detected or that were not
found to be associated with quality.
The following features were detected:
• The presence of advertising was detected by searching web
  page HTML for entries in the ‘EasyList’, a publicly available
  list of advertising domain names and page content patterns
  (e.g. -adserver/). Overly general patterns were manually
  removed, giving 12,195 suitable patterns.
• The accessibility of web pages was evaluated via three
  features: the presence of a ‘skip link’ for screen readers, the
  proportion of content images containing an alt text
  description, and the proportion of decorative images (less
  than 5 pixels in dimensions) containing alt="".
• Most health related websites provide the means to contact the
  site’s authors. This is often done through a contact page.
  Detecting the presence of such a page was important as a
  feature in itself but also for identifying a physical address,
  telephone contact details and website feedback forms. Once
  found, contact pages were downloaded and searched for
  contact details (postcode/telephone/feedback form) using
  broad regular expressions.
• An important element of credibility is referencing to external
  sources of information. Counts were made of all external
  hyperlinks (those pointing to different sites) and of all
  internal hyperlinks (those pointing within the same site). It
  was also noted whether a reference list was included on the
  page.
• Calculating the readability of a website can be hard due to
  the difficulty of distinguishing programmatically between
  content and navigation items. In this case the Flesch-Kincaid
  readability test was applied to the longest paragraph on the
  page containing at least 70% English words.
• HON certification was confirmed by searching the page for
  the HON stamp. High accuracy is important since ‘presence
  in the HON search engine’ and ‘bearing the HON stamp’
  were used as a proxy for information quality in the
  subsequent association analysis (see below).
• Top level domain (.gov / .co.uk / .com etc) is important for
  determining the source of a site (e.g. governmental) and was
  extracted from the URL of each page analysed. The top level
  domain also indicates the country of origin in many cases,
  which may be useful for determining relevance to the reader.
• The presence of a donation button is considered to degrade
  credibility. This feature was detected by searching
  for hyperlinks containing “donate” or “donation”. Since
  donation buttons may be present as images rather than text,
  the ‘src’ attribute of images was also searched.
• Most health websites contain a privacy policy or disclaimer
  intended to limit liability in the event that readers, acting on
  the advice, suffer harm as a result of false or misleading
  information presented via the site. The presence of such a
  page was detected using regular expressions.
• The presence of a discussion forum, commenting, or a wiki
  was detected. User generated content can be unreliable and
  may dilute the quality of information presented on a website.
• Social rating systems may be useful predictors of website
  quality. For this reason, the number of Facebook Likes was
  extracted and recorded.
• Blogs are generally considered to be less credible sources of
  information than medical experts or journal articles. Their
  presence on a website may be associated with lower quality
  health information.
• It is important to distinguish between websites offering
  medical information and those selling a product. The
  presence of an online shop, ‘cart’ or ‘basket’ was detected.
2.2 Feature Extractor Accuracy
A selection of test sites was required to assess the accuracy of the
feature extractor at detecting each feature. The test sites were
identified using the most popular UK health search terms of 2012
via Google Insights for Search. The 3 top search terms were
selected from the categories: “Health Conditions”, “Ageing and
Geriatrics” and “Alternative” (see Table 1).
Four web pages were downloaded for each search term. In each case
the first result with a unique domain name was selected. This
provided 36 web pages reflecting a range of common searches.
These pages were manually assessed for the presence of each
feature. The results of this manual analysis were compared with
the output of the automated feature extractor to measure how
accurately the extractor detected each targeted feature.
Accuracy ratings are presented below (see Table 2). The Matthews
correlation coefficient (Ø) was calculated for each feature and its
statistical significance was assessed with a chi-squared test. The
coefficient provides a measure of the predictive quality of the
feature detection algorithm, taking into account true positive,
true negative, false positive and false negative predictions. All
features whose detection was not statistically significant were
discarded.
Accuracy calculations were not performed for programmatic
features such as Facebook Likes and readability, as these cannot be
manually checked but are likely to be accurate.
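The accuracy measures described above can be sketched as follows. The code assumes the standard definitions: accuracy is the fraction of pages where manual and automated results agree, Ø is the Matthews (phi) coefficient of the 2x2 table of agreements, and the chi-squared statistic of that table equals n·Ø² with one degree of freedom. The values in Table 2 appear consistent with this relation for n = 36 (for example Ø = 1 with X² = 36 for HON certification). The function names and the example counts are hypothetical and are not taken from the study.

```python
import math

# Critical value of chi-squared with 1 degree of freedom at P = 0.05.
CHI2_CRITICAL_1DF_P05 = 3.841


def accuracy(tp, tn, fp, fn):
    """Fraction of pages on which the extractor agreed with the manual rating."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_phi(tp, tn, fp, fn):
    """Matthews correlation coefficient of a 2x2 confusion table."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # conventional value when a row or column total is zero
    return (tp * tn - fp * fn) / denominator


def chi_squared(tp, tn, fp, fn):
    """Chi-squared statistic of the same table, using chi2 = n * phi^2."""
    n = tp + tn + fp + fn
    return n * matthews_phi(tp, tn, fp, fn) ** 2


# Hypothetical counts for one feature over 36 pages (illustration only).
counts = dict(tp=14, tn=17, fp=3, fn=2)
phi = matthews_phi(**counts)
chi2 = chi_squared(**counts)
print(f"accuracy={accuracy(**counts):.2f} phi={phi:.2f} chi2={chi2:.2f}",
      "significant" if chi2 > CHI2_CRITICAL_1DF_P05 else "not significant")
```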
The least accurately detected features were telephone number and
post code. These features rely on the successful detection of a
contact page followed by a variety of country specific regular
expressions. The presence of a disclaimer was also very difficult
to detect because it was often buried several links into a website.
A more rigid definition of what constitutes a disclaimer would be
useful. A balance must be struck between trying to improve
current accuracy and identifying alternative features given the
purpose of the extractor is to power a prediction algorithm.
3. FEATURE ASSOCIATION WITH HEALTH WEBSITE QUALITY
3.1 Methods
Previous researchers investigating automated quality assessment
have often used ‘expert ratings’ as ground-truth against which to
test their algorithm. This often relies on selecting a narrow topic
area where there are well established guidelines e.g. depression
[3]. Since machine learning requires large datasets, such manual
rating is not feasible. Instead, website quality ground-truth was
defined as presence in the HON search portal.
The Google Insights for Search tool was used to gather 581
popular health search terms in the same manner as described
above (see 2.2 Feature Extractor Accuracy). The Google search
API was used to perform searches using these terms with both the
HON and Google search engines. These searches resulted in
4,601 unique URLs from Google (regular quality) and 4,200
unique URLs from HON (high quality). 574 URLs in the Google
set were also present in the HON set, and so the duplicates were
discarded. This resulted in 8,227 unique URLs for processing by the
Feature Extractor. Although 8,227 unique pages were retrieved,
these came from only 2,076 separate domains, i.e. many were
pages on the same site (e.g. Wikipedia). Where multiple pages
were available for a domain, the results of the Feature Extractor
were averaged to give a single result per domain.
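The bookkeeping in this section (4,601 + 4,200 − 574 = 8,227 URLs, then one averaged result per domain) can be sketched as below. This is only an illustration under stated assumptions: overlapping URLs are kept with the HON label, a domain is identified by the URL host name, and every feature is stored as a number so that it can be averaged. The function names, feature name and URLs are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from urllib.parse import urlparse


def label_urls(google_urls, hon_urls):
    """Label each unique URL; URLs found in both sets keep the HON label."""
    hon = set(hon_urls)
    labels = {url: "HON" for url in hon}
    labels.update({url: "Google" for url in set(google_urls) - hon})
    return labels


def average_by_domain(page_features):
    """Average per-page feature values into one record per domain.

    page_features maps URL -> {feature name: numeric value}.
    """
    grouped = defaultdict(list)
    for url, values in page_features.items():
        grouped[urlparse(url).hostname].append(values)
    return {
        domain: {name: mean(row[name] for row in rows) for name in rows[0]}
        for domain, rows in grouped.items()
    }


# Hypothetical inputs, for illustration only.
labels = label_urls(
    ["http://a.example/1", "http://b.example/1"],
    ["http://b.example/1", "http://c.example/1"],
)
per_domain = average_by_domain({
    "http://a.example/1": {"privacy_policy": 1.0},
    "http://a.example/2": {"privacy_policy": 0.0},
})
print(len(labels), per_domain)
```

How a domain with pages in both result sets should be labelled is not addressed by this sketch.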
3.2 Results
All categorical features (present/not present) that were detected
with statistically significant accuracy were entered into a
chi-squared test for independence. For each feature, a 2x2
contingency table was created and a chi-squared probability
calculated. The results of this analysis are presented in Table 3.
Continuous features were analysed using a Mann-Whitney U test
(see Table 4).
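A minimal sketch of the two tests follows. It assumes the Pearson chi-squared statistic without continuity correction (the text does not say whether a correction was applied) and uses the fact that, with one degree of freedom, the P value equals erfc(sqrt(X²/2)). For the Mann-Whitney test only the U statistic is computed, by direct pair counting; obtaining its P value needs a normal approximation with tie correction or a statistics library, which is omitted here. All counts and values are hypothetical.

```python
import math


def chi_squared_2x2(present_a, absent_a, present_b, absent_b):
    """Pearson chi-squared test of independence for a 2x2 table (1 d.f.).

    Returns (statistic, p_value). No continuity correction is applied.
    """
    n = present_a + absent_a + present_b + absent_b
    row_a, row_b = present_a + absent_a, present_b + absent_b
    col_present, col_absent = present_a + present_b, absent_a + absent_b
    if 0 in (row_a, row_b, col_present, col_absent):
        raise ValueError("every row and column total must be non-zero")
    cross = present_a * absent_b - absent_a * present_b
    statistic = n * cross ** 2 / (row_a * row_b * col_present * col_absent)
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def mann_whitney_u(xs, ys):
    """U statistic for xs: each pair with x > y counts 1, each tie counts 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)


# Hypothetical example: feature present on 20 of 100 domains in one group
# and on 40 of 100 domains in the other.
statistic, p_value = chi_squared_2x2(20, 80, 40, 60)
print(f"chi2={statistic:.2f} p={p_value:.4f}")

# Hypothetical continuous feature values for two small groups.
print(mann_whitney_u([0.2, 0.5, 0.9], [0.1, 0.5, 0.4]))
```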
4. DISCUSSION AND FUTURE WORK
The association analysis shows that, when comparing web
pages retrieved through the Google search engine with those
retrieved through the HON search engine, HON sites are more
likely to have an accessibility skip link (for screen readers), alt
text for content images, references and a privacy policy.
HON sites are less likely to have user generated content (e.g.
comments), to be directly selling a product or to use alt="" for
decorative images, and they have fewer Facebook Likes.
It is surprising that HON sites are less likely to follow the
accessibility guideline of using alt="" in decorative images (1-5
pixel diameters) while they are better at providing alt text for
content images and accessibility skip links. This may be the result
of development software or the fact that it is a less well known
accessibility guideline. The feature extractor could be expanded
to look for other accessibility features such as use of longdesc, use
of frames, noframes support, tab indexes etc.
Researchers have long associated both donation links and
advertising with low quality. This study found no significant
difference between high quality (HON certified) websites and
regular websites in the presence of either feature.
Work is ongoing to identify a source of low quality health
websites to use as a third comparison group. Possible sources
include the Advertising Standards Authority, the Trading Standards
Office and phishing/malware blacklists.
The current feature extractor is the first step in being able to
present to individual searchers of health information an estimate
of the quality of website information. The next step is to
implement a machine learning algorithm that can make quality
predictions based on the training dataset described above.
5. ACKNOWLEDGEMENTS
This work is supported by a Google Research Award, RCUK
project EP/G066019/1 “SIDE: Inclusion through the Digital
Economy” and by a Wolfson Merit Research Award WM080040.
REFERENCES
1. Bernstam, E.V., Shelton, D.M., Walji, M., and Meric-Bernstam,
   F. Instruments to assess the quality of health information on the
   World Wide Web: what can our patients actually use?
   International Journal of Medical Informatics 74, 1 (2005), 13-19.
2. Eysenbach, G. and Kohler, C. How do consumers search for and
   appraise health information on the world wide web? Qualitative
   study using focus groups, usability tests, and in-depth interviews.
   BMJ 324 (2002), 573-577.
3. Griffiths, K.M., Tang, T.T., Hawking, D., and Christensen, H.
   Automated Assessment of the Quality of Depression Websites.
   Journal of Medical Internet Research 7, 5 (2005).
4. Health On the Net. HONcode: Guidelines - Operational definition
   of the HONcode principles. 2011.
   http://www.hon.ch/HONcode/Webmasters/Guidelines/guidelines.html.
5. Pornpitakp, C. The Persuasiveness of Source Credibility: A
   Critical Review of Five Decades’ Evidence. Journal of Applied
   Social Psychology, 2 (2004), 243-281.
6. Schwarz, J. and Morris, M.R. Augmenting Web Pages and Search
   Results to Support Credibility Assessment. CHI 2011: Session:
   Search & Information Seeking, (2011), 1245-1254.

Table 1. Search terms used to obtain test sites

Search Category     Health Conditions   Ageing and Geriatrics   Alternative and Natural Medicine
Search Terms Used   Cancer              Dementia                Acupuncture
                    Diabetes            Osteoporosis            Detox
                    Back Pain           Alzheimer               Aloe Vera

Table 2. Feature detection accuracy. * indicates statistical
significance, P<0.05. Matthews correlation: Ø=0 (no correlation),
Ø>0 (positive correlation), Ø<0 (negative correlation)

Feature                  Accuracy   Matthews correlation   Significance
Advertising              Qα=85%     Ø=0.68                 X²=16.76*
Accessibility Link       Qα=86%     Ø=0.71                 X²=18.28*
Contact Page             Qα=98%     Ø=0.80                 X²=23.29*
Postcode                 Qα=53%     Ø=0.10                 X²=0.38
Telephone                Qα=50%     Ø=0                    X²=0
Feedback Form            Qα=66%     Ø=0.35                 X²=4.57*
Reference List           Qα=82%     Ø=0.63                 X²=14.57*
HON certification        Qα=100%    Ø=1                    X²=36*
Donation Button          Qα=92%     Ø=0.82                 X²=24.08*
Privacy Policy           Qα=92%     Ø=0.68                 X²=16.75*
Disclaimer               Qα=63%     Ø=0.30                 X²=3.25
User generated content   Qα=60%     Ø=0.39                 X²=5.51*
Blog                     Qα=62%     Ø=0.48                 X²=8.23*
Selling a product        Qα=89%     Ø=0.78                 X²=22.05*

Table 3. Categorical features associated with presence in the
HON search portal. * indicates statistical significance, P<0.05

Feature                  Google Number (%)   HON Number (%)   P Value
Advertising              635 (34)            116 (40)         0.167
Accessibility Link       391 (22)            98 (34)          0.000*
Contact Page             1123 (68)           215 (73)         0.075
Feedback Form            205 (11)            36 (12)          0.639
Reference List           55 (3)              46 (16)          0.000*
Donation Button          232 (13)            41 (14)          0.640
Privacy Policy           921 (52)            200 (69)         0.000*
User generated content   353 (20)            34 (12)          0.001*
Blog                     137 (8)             23 (8)           0.906
Selling a product        353 (20)            34 (12)          0.001*

Table 4. Continuous features associated with presence in the
HON search portal. * indicates statistical significance, P<0.05

Feature                                       Google Median (Mean)   HON Median (Mean)   P Value
Readability                                   35.55 (33)             34.54 (33)          0.598
Proportion of decorative images with alt=""   0.5 (0.39)             0.35 (0.33)         0.001*
Proportion of content images with alt text    0.5 (0.51)             0.62 (0.59)         0.012*
Facebook Likes                                9 (12348 α)            0 (39)              0.000*
Proportion of site links external             0.14 (0.22)            0.18 (0.26)         0.779

α The mean for this variable is very high due to extreme outliers such as
youtube.com and facebook.com which have over 200,000 Likes each