Proceedings of CHI 2012 (Session: Usability Methods), May 5–10, 2012, Austin, Texas, USA

What Do Users Really Care About? A Comparison of Usability Problems Found by Users and Experts on Highly Interactive Websites

Helen Petrie
Human Computer Interaction Research Group, Department of Computer Science, University of York, York, UK YO10 3WF
[email protected]

Christopher Power
Human Computer Interaction Research Group, Department of Computer Science, University of York, York, UK YO10 3WF
[email protected]

ABSTRACT
Expert evaluation methods, such as heuristic evaluation, are still popular in spite of numerous criticisms of their effectiveness. This paper investigates the usability problems found in the evaluation of six highly interactive websites by 30 users in a task-based evaluation and 14 experts using three different expert evaluation methods. A grounded theory approach was taken to categorize 935 usability problems from the evaluation. Four major categories emerged: Physical Presentation, Content, Information Architecture and Interactivity. Each major category had between 5 and 16 sub-categories. The categories and sub-categories were then analysed for whether they were found by users only, experts only or both users and experts. This allowed us to develop an evidence-based set of 21 heuristics to assist in the development and evaluation of interactive websites.

Author Keywords
Expert evaluation; heuristic evaluation; user evaluation; usability problems; heuristics.

ACM Classification Keywords
H.5.2. [User Interfaces]: Evaluation/methodology.

General Terms
Experimentation, Human Factors.

INTRODUCTION
Expert evaluation is a logical and important component in the development of interactive systems. It makes sense to have experts identify problems with a system before exposing users to it, even if those users are only conducting an evaluation themselves. In the case of rapidly developed and deployed systems, such as many websites, expert evaluation may be the only evaluation that is undertaken before a system goes live.

There are numerous forms of expert evaluation, including Cognitive Walkthrough [15, 16], Guidelines Review [6] and Consistency Inspection [14], but the best known is Heuristic Evaluation (HE), developed by Molich and Nielsen [17, 18, 20]. HE involves asking 3–5 usability experts to work through an interactive system, checking whether any of a set of heuristics is violated and thereby producing a set of usability problems. The experts then come together, produce a consolidated set of usability problems and rate them on a four-level severity scale from "catastrophic" to "cosmetic only". HE is described in numerous HCI textbooks [6, 14, 22] and on authoritative websites such as usability.gov and UsabilityNet. Many of these sources quote the original Molich and Nielsen heuristics, as well as Shneiderman's 8 golden rules of interface design [23] and Tognazzini's basic principles for interface design [25], which can also guide an HE.

HE and other forms of expert evaluation have come in for a range of criticisms, including:
• low overlap between usability problems proposed by expert evaluations and user evaluation, as low as 10% [1, 9, 12];
• different experts or groups of experts produce different problem sets [8, 9, 12, 13];
• expert evaluations over-emphasize low severity problems at the expense of high severity problems [12].
However, the studies that have compared different evaluation methods have themselves come in for considerable criticism, particularly by Gray and Salzman [7], and particular aspects of the comparisons, such as matching problems between methods, have also been criticized [5, 10].

In this paper, we take a different approach to the issues concerning expert evaluation. The criticisms of expert evaluation methods have not deterred researchers and practitioners from using these methods. But if the overlap between problems found by experts and those reported by users is only of the order of 10%, and these methods may not identify the problems that users find most severe, then we must ask whether expert evaluations are a good use of the time and effort of the experts and the development teams. If we want to keep using these methods, what can be done to improve their effectiveness? In this paper we concentrate on this question, and examine the sets of heuristics used by experts in evaluations.

The first concern is with the continued use of the original Molich and Nielsen [17, 18, 20] heuristics to guide evaluation, and their unvalidated adaptation for specific areas of modern interactive systems, particularly the web. The Molich and Nielsen heuristics were based on sound evidence and were a sensible attempt to cut through the complexity of the interface guidelines available at the time of their writing. Molich and Nielsen [17] ran a competition on the evaluation of an interactive system and analysed 77 entries for the types of usability problems produced in them. Nielsen [19] analysed 249 usability problems from 11 different projects to validate the heuristics. Therefore it is clear that Molich and Nielsen were careful in developing the heuristics from a sound evidence base.

However, since then, interactive systems have become much more complex and diverse. In addition, the web has a particular set of conventions and methods of interaction to which users have become accustomed during regular use. These conventions and methods have become ingrained in users' mental models of how the web works, and thus will influence what they perceive as a usability problem. The web also has content and information architecture that may result in usability problems that were not typical of 1980s interfaces. Thus, the Molich and Nielsen heuristics may no longer capture the main usability problems that users have.

In response to the evolution of interactive systems to web-based systems, a number of authors have adapted the Molich and Nielsen heuristics for the web. Instone [11] produced "site usability heuristics for the web" and more recently Budd [3] produced "heuristics for modern web application development". Each of these is a light reworking of the Molich and Nielsen heuristics with examples drawn from the web. However, the question remains: are these heuristics really representative of the problems that users have with current interactive systems, in particular with web-based systems? It does not seem appropriate to take the problems that users had with interactive systems in the late 1980s and transfer them to web-based systems in the 2010s. This is particularly problematic when there is no empirical evidence that these problems are actually the ones encountered by users. Is this in fact one of the reasons that expert evaluations produce so little overlap with user evaluations – that the heuristics used by experts are not fit for purpose?

This paper explores this question by comparing the usability problems found in a large-scale evaluation of six complex, highly interactive government websites. The websites were evaluated both by potential users and by experts using three different expert evaluation methods. However, it is not the intention of this paper to compare the problems found by the three expert methods with each other, given the difficulties of comparing evaluation methods [7]. Instead, we investigate what types of usability problems users encounter that are missed by experts, and vice versa. From this analysis, we propose a current, evidence-based set of heuristics to guide developers and expert evaluators of highly interactive websites.
METHOD

Design
Six complex, highly interactive websites were each evaluated by 15 potential users using a think-aloud protocol and by three different expert evaluation methods using teams of three experts. The three expert evaluation methods used were Collaborative Heuristic Evaluation (CHE) [21], Group Expert Walkthrough (GEW) and Group Domain Expert Walkthrough (DEW) [14]. Both potential users and experts were asked to identify usability problems and rate them on a four-point scale from "catastrophic" to "cosmetic" [18].

A grounded theory approach [2, 24] using open coding was used to categorize the usability problems found in the different evaluations, to allow natural groupings of problems to emerge. Problems found by experts only, users only and both users and experts were then analyzed.

Websites
The websites used were highly interactive and transactional government websites. Several of the websites were ones where users provide information to government agencies, for example to qualify for particular benefits. Several others were informational, but required the user to find information by specifying criteria for searching and for filtering results. A number of the websites were already publicly available at the time of evaluation, while others were fully functioning prototypes, ready to launch. For reasons of confidentiality, the websites will not be named and will be referred to as websites A–F.
Method for user evaluations

Participants
30 participants took part in the study: 13 were women and 17 men. Participant ages ranged from 22 to 61 years, with a mean age of 33.3 years (standard deviation = 10.61). On average participants had used the web for 11–15 years and rated their web experience as "High" on a five-point scale. Participants reported average daily web use of between one and five hours per day. None of the participants had previously used any of the websites evaluated in the study. 16 participants were university students and 14 worked in a range of occupations. Three of the websites related to higher education and initial career plans, so these websites were evaluated by the students, as they would be target users for those websites. The other three websites were aimed at the general population, so the other participants evaluated these. 26 participants evaluated three websites each and four participants evaluated two websites each. Participants were remunerated with Amazon gift vouchers, £5 per website evaluated.

Equipment
Standard personal computers running either Windows or MacOS and a range of web browsers (Internet Explorer, Firefox) were used, according to the individual participant's preferences. The computer also ran a screen capture program (Morae for the Windows machines, ScreenFlow for the MacOS machines) that recorded the screen and the voices of the participant and researcher.

Procedure
Each session lasted 60 or 90 minutes, depending on the number of websites the participant chose to evaluate. Participants were first briefed about the study and signed an informed consent form. They then completed a brief demographic questionnaire. For each website evaluation, participants were provided with a persona, a scenario of use and the relevant information needed to complete the scenario. Participants undertook a concurrent think-aloud protocol, talking through the usability problems as they were encountered. Participants were gently prompted if they did not keep up the think-aloud commentary. Each time a problem was encountered, i.e. if the participant made some comment that indicated a problem (e.g. "I don't understand this", "I can't figure out what to do now"), the researcher asked the participant to pause briefly and rate the problem for its severity on a scale where 1 = cosmetic, 2 = minor, 3 = major and 4 = catastrophic. Participants were allowed 30 minutes to evaluate each website. This procedure was repeated for two or three websites. Participants then completed a brief post-study questionnaire, were debriefed and signed off the informed consent form.
Method for expert evaluations

Experts
14 usability experts participated: five were women, nine were men. The majority had higher education qualifications or had taken courses in HCI. The majority had over five years' experience in usability and worked as professionals in user experience, interaction or software/product design, with usability making up between half and all of their current role. 11 described themselves as "experienced" in usability, the remainder as "junior". Nearly all had conducted HEs, but only three had previously participated in GEWs and DEWs. 11 of the usability experts worked for DirectGov (the UK government's digital service, providing online access to a wide range of government services and information) or for organizations that provided usability services to DirectGov. Three of the usability experts worked for the University of York.

Six domain experts participated: four were women, two were men. Two were business analysts who occasionally provided help in relation to user issues; the others were team leaders or advisors responsible for the re-design of digital services and regularly provided information about users to development teams. Half the domain experts had less than one year's experience in the particular domain, the others had two to four years' experience. Only one domain expert had participated in a usability evaluation before this study.

Equipment
Two computers were used in each expert evaluation session: one accessed the website being evaluated and was under the control of the experts; the other was used to record problems raised by the experts. The displays of both computers were projected onto a wall so the whole group could see them clearly.

Overall Procedure
Each evaluation session was led by a facilitator and assisted by a scribe (one of the authors in each case, neither of whom participated in the evaluation). The facilitator introduced the method, briefed the experts on the procedures to be followed, including the use of a persona and a scenario of use, and provided copies of the original Molich and Nielsen heuristics and the severity rating scale. Once the introduction was complete, the evaluation started and continued for exactly one hour. Each group of usability experts evaluated all six websites, with each expert evaluating each website only once. Each group used each method twice and each website was evaluated with each method three times. This design meant that individual differences between experts and between the websites would not have undue effects on the results.

Procedure for Collaborative Heuristic Evaluation (CHE)
The collaborative version of heuristic evaluation developed by Petrie and Buykx [21] was used. Experts worked as a group, with one expert "driving" the website. Any expert could propose a potential usability problem. Experts described each usability problem so that the scribe could record it and to create consensus on the description of the problem. Experts then rated its severity privately using the four-point rating scale [18]. If an expert did not think the potential problem was a usability problem, they rated it as having a severity of zero. This allowed different experts to provide both their view of whether a potential problem was actually a usability problem and what its severity was.

Procedure for Group Usability Expert Walkthrough (GEW)
The evaluation period was split into two 30-minute periods. For the first period, one expert took on the role of the user described in the persona and worked through the scenario of use while providing a concurrent verbal protocol. The other experts could ask questions of the "user" and note usability problems. In the second period, experts worked as a group to identify usability problems by reworking the scenario of use. Severity ratings for each usability problem were reached by consensus.
Procedure for Group Domain Expert Walkthrough (DEW)
Initially, domain experts were given a 30–45 minute introduction to the principles of usability evaluation. The method was then the same as that described for GEW. The two groups of domain experts each evaluated one website in the domain of their expertise using the DEW method.

Data analysis
For each website, a unified list of usability problems from all the methods was created. A strict procedure was followed for matching problems from different methods: to be matched, problems needed to be about the same interactive element/unit of content and to describe the same type of problem for the user.

A grounded theory approach was then taken to categorizing the usability problems. This was done blind to which method had produced each problem. Two researchers used an open coding technique, repeatedly summarizing and grouping the problems, until natural and appropriate categories emerged. The grounded theory approach also resulted in grouping the initial set of categories into more abstract categories (henceforth referred to as "major categories") such as "Physical Presentation" and "Content" (see Table 1). Inter-coder reliability was then established by having a third researcher categorize a sample of 50 problems. Cohen's Kappa (K) [4] was calculated on the agreement between one of the original coders and the new coder for both major category and sub-category. Both calculations showed satisfactory levels of agreement (for major categories: K = 0.93; for sub-categories: K = 0.89).
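As an illustration of this reliability check, the sketch below computes Cohen's Kappa from its definition for two coders' category labels. The coders, labels and the small sample are hypothetical, not the study data, and the paper does not say which software was used for the calculation.

```python
# Illustrative sketch of Cohen's Kappa for two coders' nominal labels.
# The labels below are hypothetical examples, not the study data.
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's Kappa for two coders labelling the same items in the same order."""
    n = len(coder_a)
    # Observed agreement: proportion of items given the same label by both coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: probability both coders pick the same label,
    # based on each coder's own label distribution.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(coder_a) | set(coder_b))
    return (p_o - p_e) / (1 - p_e)

coder_1 = ["Interactivity", "Content", "Interactivity", "Physical Presentation", "Content"]
coder_2 = ["Interactivity", "Content", "Interactivity", "Content", "Content"]
print(round(cohens_kappa(coder_1, coder_2), 2))  # 0.67 for this toy sample
```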
RESULTS
A total of 947 distinct problems were identified. 12 were discarded as not being usability problems (e.g. a user forgot their postcode) or as being too vague for categorization. This left a pool of 935 usability problems, an average of 155.8 problems per website (standard deviation = 66.1, range 81 to 271).

Table 1 shows the emergent categorization of usability problems, for categories with five occurrences or more; this accounted for 907 problems. The four major categories that emerged from the coding were: Physical Presentation, Content, Information Architecture and Interactivity. We would have preferred to use the term "Content Architecture" to be consistent with our use of the term "Content", but as Information Architecture is a well-known term in HCI and beyond, that term was used. Interactivity was the largest category, with 16 sub-categories; the other categories had a more even breakdown, with Physical Presentation having seven sub-categories, Content having six sub-categories and Information Architecture having five sub-categories.

PHYSICAL PRESENTATION
- Page does not render properly: Navigation is longer than the footer and overlaps the footer (A046); Text on buttons does not transform gracefully when resized (E106)
- Poor, inappropriate color contrast: Dates hard to read - grey on grey background, not very clear format (A151); Interface colors are too dark, too saturated, "glowing" (C23)
- Text/interactive elements not large/clear/distinct enough: Radio button area/sensitivity is too small (D17); Date boxes too small (B50)
- Page layout unclear/confusing: The heavy red bar below number of courses separated the page in two, thought the information below was something irrelevant like an advert (C83); Text very tight up to border, difficult to read, unattractive (E4)
- Timing problems: Holding error message was too brief to read (A76); Clicking on 'XXX' button took more than one minute to load (D15)
- Key content/interactive elements, changes to these not noticed: Did not notice the second set of input requirements relating to the password (D5); Did not see list expansion (F5)
- "Look and feel" not consistent: Page looks different - buttons have moved, text is smaller (A282); Table unexpectedly transposed - University was row, now column (F103)

CONTENT
- Too much content: Overly wordy, but nothing to assist the user (A037); Number of results is too large for user to work with effectively ("off-putting", "overwhelming") (F64)
- Content not clear enough: More plain English needed on Welcome Page (E65); Disclaimer not clear - which relevant organizations? (B48)
- Content not detailed enough: The information provided was very sparse (D2); Lack of information about syllabus/curriculum (F32)
- Content inappropriate or not relevant: Insensitive explanation of terminal illness (A144); Information on overview page does not seem relevant to task (F50)
- Terms not defined: Acronyms f/t and f/d not clear, leaves user to determine meaning (F48); "Includes flexible start dates" - what does this mean? (C56)
- Duplicated or contradictory content: Inconsistent information about sending personal items (A262); Agreement statistics and number of respondents columns don't agree (F104)

INFORMATION ARCHITECTURE
- Content not in appropriate order: Why is all this information presented before the registration? (E76); Check list at wrong end of the process (B116)
- Not enough structure to the content: A lot of information - expecting a step-by-step process (A78); The results get lost in all the other text (D70)
- Structure not clear enough: Table in pseudo-alphabetic order, unclear (F111); After completing the questionnaire, expected to go straight to the reports, not back to the beginning (D86)
- Headings/titles unclear/confusing: Page is not actually university details - misleading title (F95); Help - but this is not help, this is further information (B161)
- Purpose of the structures not clear: What are these boxes on the side for? (A64); Are the colors significant (block colors behind the groups of services)? (A255)

INTERACTIVITY
- Lack of information on how to proceed and why things are happening: No guidance on how to use the university search (F121); Confusing that it is recovering the death certificate rather than me entering information from the death certificate (B27)
- Labels/instructions/icons on interactive elements not clear: Unclear what "special notes" checkbox will do (A187); Asterisks here appear to mean incomplete, not the usual mandatory (B42)
- Duplication/excessive effort required by user: Have to provide details again, even though already provided them (A190); "Please access urls shown before submitting info" - expected to do quite a lot of work (copy, paste, look up info) (B149)
- Input and input formats unclear: Postcode input not at all obvious (D55); I would prefer to enter months via names rather than numbers (B10)
- Lack of feedback on user actions and system progress: Not clear whether the system has saved results (D67); The search returned no results, provided no guidance as to why this might be (C3)
- Sequence of interaction illogical: Why is the exit button at the top? (B25); Activation code retrieval out of sequence (E82)
- Options not logical/complete: What if I do not have UK qualifications - no options (D117); Why isn't niece/nephew in the list of relatives? (B41)
- Too many options: Far too many options, when the main goal is to view a report (D83); Bank account options - too many options (A165)
- Interaction not as expected: Breadcrumb trail literal, not the structure of the site (F108); Tabbing is illogical (skips) (B3)
- Interactive functionality expected is missing: No way to sort short list (F65); Surprised that there was not a de-select option, considering the number of checkboxes that were pre-selected (C2)
- Links lead to external sites/are PDFs without warning: Unclear if links on this page will lead to external sites (F12); Was given a PDF doc, did not indicate this and gave no other options (B123)
- Interactive and non-interactive elements not clearly identified: Why isn't Contact Us a link in the text? (E6); Arrow in table header not clearly indicated as selectable for sort functionality (F55)
- Interactive elements not grouped clearly/logically: Next button is a long way away from the text I am to read (A74); Radio button for correction of error way at bottom (B98)
- Security issues not highlighted: No information about how personal data is treated (A263); User unclear who will get and use the information (B36)
- Problems with choosing and validating passwords: Why is password choice so restrictive? (A122); Password case-sensitive with no indication that this is the case (F72)
- Error messages unhelpful: Error message does not indicate which bit of information is wrong (B19); Unhelpful error message "Form percentage must be equal to 100" (E34)

Table 1. Categorization of usability problems into major categories and sub-categories, with example problems (Axx–Fxx refer to the six websites and problem codes).

Category | Users only | Experts only | Both users and experts | Total
Physical Presentation | 13.4% (67) | 11.2% (31) | 8.5% (11) | 12.0% (109)
Content | 17.0% (85) | 22.7% (63) | 21.7% (28) | 19.4% (176)
Information Architecture | 8.6% (43) | 10.5% (29) | 8.5% (11) | 9.2% (83)
Interactivity | 61.1% (306) | 55.6% (154) | 61.2% (79) | 59.4% (539)
Total | 501 | 277 | 129 | 907

Table 2. Usability problems identified by users only, experts only and both users and experts (% and number).

Table 2 shows the distribution of problems into the major categories, for problems found by users only, experts only and both users and experts. The distribution of problems found only by users and only by experts was very similar, with no significant difference between them (chi-square = 7.45, df = 6, n.s.). Nor was there a significant difference between these distributions and the distribution of problems found by both users and experts (chi-square = 5.439, df = 3, n.s.). However, the distribution of problems into the sub-categories was significantly different for problems found by users only and problems found by experts only for three of the four major categories: Physical Presentation (chi-square = 14.18, df = 6, p < 0.05), Content (chi-square = 11.78, df = 5, p < 0.05), Information Architecture (chi-square = 1.38, df = 4, n.s.) and Interactivity (chi-square = 41.50, df = 16, p < 0.001).
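The paper does not report how these chi-square tests were computed. The sketch below shows one standard way to run such a test on the Table 2 counts using SciPy; an omnibus contingency test over the three groups happens to reproduce the reported chi-square = 7.45 with df = 6, but this is an illustration rather than the authors' analysis script.

```python
# Chi-square test over the Table 2 counts (illustration only).
from scipy.stats import chi2_contingency

# Rows: users only, experts only, both users and experts.
# Columns: Physical Presentation, Content, Information Architecture, Interactivity.
counts = [
    [67, 85, 43, 306],
    [31, 63, 29, 154],
    [11, 28, 11, 79],
]

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")
# Prints approximately: chi-square = 7.45, df = 6, p = 0.281 (not significant).
```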
To explore where the differences in the distribution of problems across the sub-categories lay, the problems found by users only and those found by experts only were analysed for each sub-category. As users found nearly twice as many problems as experts (501 vs 277, a ratio of 1.81 : 1), the ratio of user-only problems to expert-only problems was calculated for each sub-category. Table 3 shows those sub-categories with the most extreme ratios in favor of users only. Thus users are far more concerned with timing problems, security issues and confusing page layout.

Sub-category | Ratio users only : experts only
Timing problems | 9 : 1
Security issues not highlighted | 8 : 1
Page layout unclear/confusing | 5 : 1
Interactive functionality expected is missing | 4.8 : 1
Input and input formats unclear | 4 : 1
Links lead to external sites/are PDFs without warning | 4 : 1
Poor color contrast | 4 : 1
Interaction not as expected | 3.7 : 1

Table 3. Usability problems that were reported more frequently by users only than by experts only (expected ratio from the overall totals: 1.81 : 1).
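The sketch below illustrates the ratio calculation behind Tables 3 and 4. The per-sub-category counts used here are hypothetical, not the study data; only the overall 501 : 277 baseline is taken from the results above.

```python
# Per-sub-category ratio of user-only to expert-only problems, compared with the
# overall baseline of 501 : 277 (about 1.81 : 1). Counts are hypothetical examples.
overall_ratio = 501 / 277  # ~1.81

subcategory_counts = {          # (users only, experts only)
    "Timing problems": (9, 1),
    "'Look and feel' not consistent": (3, 6),
}

for name, (users_only, experts_only) in subcategory_counts.items():
    ratio = users_only / experts_only   # real data would need a guard for zero counts
    side = "users" if ratio > overall_ratio else "experts"
    print(f"{name}: {ratio:.1f} : 1 (reported relatively more often by {side} only)")
```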
Table 4 shows those sub-categories with the most extreme ratios in favor of experts only (to make this easier to interpret, the ratio of experts-only to users-only problems is used in this table, against an overall ratio of 0.55 : 1). Thus experts are more concerned with the consistency of the "look and feel" of a website, with content not being clear enough for users to understand, and with the saliency of key content and interactive elements on the page and of changes to these content and interactive elements.

Sub-category | Ratio experts only : users only
"Look and feel" not consistent | 2 : 1
Content not clear enough | 1.4 : 1
Key content/interactive elements, changes to these not noticed | 1.2 : 1
Headings/titles unclear/confusing | 1 : 1
Purpose of the structures not clear | 1 : 1
Terms not defined | 1 : 1

Table 4. Usability problems that were reported more frequently by experts only than by users only (expected ratio from the overall totals: 0.55 : 1).

Table 5 shows the sub-categories with the highest proportion of problems found by both users and experts. Thus both users and experts are concerned about having too many options in interaction, problems with choosing and validating passwords, and interactive elements and their associated labels and text not being grouped together clearly and logically.

Sub-category | % of problems in sub-category found by both
Too many options | 40.0% (2/5)
Problems with choosing and validating passwords | 30.0% (6/20)
Interactive elements not grouped clearly/logically | 27.3% (6/22)
Interactive and non-interactive elements not clearly identified | 26.3% (5/19)
Labels/instructions/icons on interactive elements not clear | 24.0% (18/75)
Lack of feedback on user actions and system progress | 24.0% (3/14)
Content not detailed enough | 22.8% (13/57)
Content inappropriate or not relevant | 21.2% (7/33)

Table 5. Usability problems that were reported by both experts and users.

DISCUSSION AND CONCLUSIONS
This paper has presented an analysis of 935 usability problems found in the evaluation of six complex, highly interactive websites, using both user evaluation and three different expert evaluation methods. As has been found previously [1, 9, 12], the overlap between problems found by users and by experts was relatively low, in this case only 14.2%. However, in this paper the aim was not to compare usability evaluation methods, but to look at the types of problems encountered by users but missed by experts, and vice versa, in order to propose a new evidence-based set of heuristics to guide both developers and expert evaluators of highly interactive websites.

The first step was to categorize all the usability problems using a grounded theory approach, to allow the categories to emerge from the data. This resulted in four major categories: Physical Presentation, Content, Information Architecture and Interactivity. These, of course, are four major themes of discussion about the web and interactive systems, so the categories are not surprising in themselves, but it is interesting that they emerged from the problem data. Within each major category, a number of sub-categories emerged, with Interactivity having the largest number of sub-categories. Again, this is perhaps not surprising, as interactivity is the newest aspect of the design of websites, as websites move towards Web 2.0 and become more interactive than the informational websites typical of the 1990s and early 2000s. Thus web developers may be less familiar and confident about how to produce these aspects of websites.

The next step was to analyze those problems that users were likely to encounter and experts were likely to miss, and vice versa. This revealed some unexpected results. Users were much more likely than experts to encounter security issues, input format problems and poor color contrast, all problem areas that we expected experts to be monitoring for carefully. However, it was less surprising that experts were more likely than users to find problems with the consistency of the "look and feel" of the website, unexplained terminology and the saliency of key content and elements, as these are areas that experts ought to be monitoring for carefully.

Finally, on the basis of the analysis of usability problems, we can propose a new set of heuristics for developing and evaluating current highly interactive websites. For these heuristics we looked both at the severity of problems for users and at their frequency. Severe problems clearly need to be identified early if possible; but frequent problems, even if not severe, should also be addressed, as the cumulative effect of many problems may be highly detrimental to users. Therefore, usability problem sub-categories with median severity ratings from users of 2.0 or higher were identified, as well as sub-categories with a problem frequency of 10 or more instances from users or from both users and experts. These sub-categories are shown in Table 6, now turned into positive heuristics for developers. 18 sub-categories were identified by each criterion, but only 12 sub-categories were identified by both criteria. Five sub-categories were identified on the basis of the median severity of problems alone, and six sub-categories on the basis of the frequency of problems alone. This information is also summarized in Table 6.
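The sketch below illustrates this selection rule with made-up numbers (the severity ratings and frequencies shown are not the study data): a sub-category is retained if its median user severity rating is 2.0 or higher, or if it occurs 10 or more times among problems found by users or by both users and experts.

```python
# Selection rule for the sub-categories behind the heuristics in Table 6
# (illustrative data only).
from statistics import median

# Sub-category -> (user severity ratings, frequency from users or users+experts).
subcategories = {
    "Timing problems": ([2, 3, 2, 1], 8),
    "Too many options": ([1, 2, 1], 12),
    "Terms not defined": ([1, 1, 2], 6),
}

selected = [
    name
    for name, (severities, frequency) in subcategories.items()
    if median(severities) >= 2.0 or frequency >= 10
]
print(selected)  # ['Timing problems', 'Too many options']
```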
PHYSICAL PRESENTATION
1. Make text and interactive elements large and clear enough. Default and typically rendered sizes of text and interactive elements should be large enough to be easy to read and manipulate. [Rationale: frequency of problem for users; frequency: 18 times]
2. Make page layout clear. Make sure that the layout of information on the page is clear, easy to read and reflects the organization of the material. [Rationale: frequency; frequency: 18 times]
3. Avoid short time-outs and display times. Provide time-outs that are long enough for users to complete the task comfortably, and if information is displayed for a limited time, make sure it is long enough for users to read comfortably. [Rationale: severity of problem for users; median severity rating = 2.25]
4. Make key content and elements, and changes to them, salient. Make sure the key content and interactive elements are clearly visible on the page and that changes to the page are clearly indicated. [Rationale: frequency; frequency: 14 times]

CONTENT
5. Provide relevant and appropriate content. Ensure that content is relevant to users' tasks and that it is appropriately and respectfully worded. [Rationale: frequency and severity; median severity rating = 2.0; frequency: 21 times]
6. Provide sufficient but not excessive content. Provide sufficient content (including Help) so that users can complete their task, but not so much content that they are overwhelmed. [Rationale: frequency and severity; median severity rating (sufficient) = 2.47; median severity rating (excessive) = 2.0; frequency: 47 times]
7. Provide clear terms and abbreviations, avoid jargon. Define all complex terms and jargon and explain abbreviations. [Rationale: frequency and severity; median severity rating = 2.0; frequency: 19 times]

INFORMATION ARCHITECTURE
8. Provide clear, well-organized information structures. Provide clear information structures that organize the content on the page and help users complete their task. [Rationale: frequency and severity; median severity rating = 2.0; frequency: 30 times]

INTERACTIVITY
9. How and why. Provide users with clear explanations of how the interactivity works and why things are happening. [Rationale: frequency and severity; median severity rating = 2.2; frequency: 51 times]
10. Clear labels and instructions. Provide clear labels and instructions for all interactive elements. Follow web conventions for labels and instructions (e.g. use of an asterisk for mandatory elements). [Rationale: frequency and severity; median severity rating = 2.0; frequency: 48 times]
11. Avoid duplication/excessive effort by users. Do not ask users to provide the same information more than once and do not ask for excessive effort when the result could be achieved more efficiently by the system. [Rationale: frequency and severity; median severity rating = 2.0; frequency: 14 times]
12. Make input formats clear and easy. Make clear in advance what format of information is required from users. Use input formats that are easy for users, such as words for months rather than numbers. [Rationale: frequency; frequency: 18 times]
13. Provide feedback on user actions and system progress. Provide feedback to users on their actions and, if a system process will take time, on its progress. [Rationale: severity; median severity rating = 2.0]
14. Make the sequence of interaction logical. Make the sequence of interaction logical for users (e.g. users who are native speakers of European languages typically work down a page from top left to bottom right, so provide the Next button at the bottom right). [Rationale: frequency and severity; median severity rating = 2.0; frequency: 11 times]
15. Provide a logical and complete set of options. Ensure that any set of options includes all the options users might need and that the set of options will be logical to users. [Rationale: frequency and severity; median severity rating = 2.0; frequency: 57 times]
16. Follow conventions for interaction. Unless there is a very particular reason not to, follow web and logical conventions in the interaction (e.g. follow a logical tab order between interactive elements). [Rationale: frequency and severity; median severity rating = 2.0; frequency: 46 times]
17. Provide the interactive functionality users will need and expect. Provide all the interactive functionality that users will need to complete their task and that they would expect in the situation (e.g. is a search needed or provided?). [Rationale: frequency and severity; median severity rating = 2.0; frequency: 73 times]
18. Indicate if links go to an external site or to another webpage. If a link goes to another website or opens a different type of resource (e.g. a PDF document), indicate this in advance. [Rationale: severity; median severity rating = 2.0]
19. Interactive and non-interactive elements should be clearly distinguished. Elements which are interactive should be clearly indicated as such, and elements which are not interactive should not look interactive. [Rationale: frequency; frequency: 14 times]
20. Group interactive elements clearly and logically. Group interactive elements and the labels and text associated with them in ways that make their functions clear. [Rationale: frequency; frequency: 15 times]
21. Provide informative error messages and error recovery. Provide error messages that explain the problem in the users' language and provide ways to recover from errors. [Rationale: frequency and severity; median severity rating = 2.0; frequency: 15 times]

Table 6. New evidence-based heuristics for designing and evaluating highly interactive websites, with the rationale for including each (frequency of the problem for users and/or median severity rating from users).

From these analyses, we have proposed a set of heuristics for the development and evaluation of highly interactive websites that is evidence-based, using both the severity and the frequency of problems encountered by users. Unfortunately, this yields a rather lengthy set of 21 heuristics (covering 23 sub-categories of our emergent categorization), grouped into the four major categories: Physical Presentation has four heuristics, Content three heuristics, Information Architecture only one heuristic and Interactivity 13 heuristics.

We compared these new heuristics with Molich and Nielsen's heuristics [17, 18, 20], probably the best known and most widely used of the available heuristic sets. 9 of the 21 (42.9%) new heuristics do not feature in those heuristics: new heuristics #1, #3, #5, #6, #15, #17, #18, #19 and #20. Further, five of the new heuristics share aspects of more than one Molich and Nielsen heuristic: heuristics #4, #9, #10, #11 and #12. Finally, seven of the new heuristics map onto one Molich and Nielsen heuristic; however, this mapping is not one-to-one, as a number of the new heuristics share a Molich and Nielsen heuristic in common. For example, the Molich and Nielsen heuristic "Match between system and real world" specifically says that designers should "Follow real-world conventions, making information appear in a natural and logical order". The new heuristics "Make page layout clear" and "Make the sequence of interaction logical" are more precise, addressing different aspects of the Molich and Nielsen heuristic as it applies to current websites.

This demonstrates that the new heuristics are both different in coverage from Molich and Nielsen's and different in their organization. This is not a criticism of Molich and Nielsen's work, but reflects the very different nature of current highly interactive websites in comparison with the interfaces of the 1980s from which Molich and Nielsen drew their heuristics.

Indeed, the overlap with Molich and Nielsen's heuristics may actually be less obvious for a practicing web developer or evaluator. We found that when working with our large corpus of problems we could map backwards from current problems to Molich and Nielsen's heuristics. However, without this corpus, when we have been conducting evaluations of individual websites, it has been very difficult to map forwards from the Molich and Nielsen heuristics to problems encountered with current websites. For example, the new heuristic "Avoid duplication/excessive effort by users" is equivalent to aspects of both the Molich and Nielsen heuristics "Aesthetic and minimalist design" and "Recognition rather than recall", but it is not clear that evaluators recognize the kinds of problems grouped under the new heuristic as being exemplars of the two original Molich and Nielsen heuristics.

Further work is needed to establish whether these heuristics are indeed more effective in the development and evaluation of websites, beyond the fact that they are developed from a large corpus of problems. Our own future work will involve using the new set of heuristics in expert evaluations of a further set of highly interactive websites and comparing the results with user evaluations of the same websites. We would predict that using the new heuristics should guide evaluators to the problems that users encounter and yield a higher overlap in problems between user and expert evaluation, thus improving the effectiveness of the expert evaluation. However, it is important that independent researchers also conduct evaluations using these heuristics, in both development and evaluation contexts, and we welcome such studies.

If our evaluation study is successful, further work will explore the generalizability of the heuristics to more diverse websites and other interactive systems. An important question to address is whether it is possible to have a general set of heuristics that captures the main usability problems of all interactive systems, or whether the scope of interactive systems is now so broad that different heuristics are needed for different categories of interactive system. Certainly, the new set of heuristics is already large, and adding more heuristics to cope with a wider range of interactive systems may defeat the purpose of a set of heuristics that is relatively easy to remember and use.

In conclusion, we believe that an evidence-based set of heuristics for highly interactive websites is a useful tool for both developers producing websites and experts evaluating them. In particular, use of these heuristics should improve the effectiveness of expert evaluation of websites.

ACKNOWLEDGMENTS
The authors thank Lucy Buykx, Andre Freire, John Precious and David Swallow for their assistance with the collection and coding of the data for this paper. They also thank all the participants and experts who took part, and DirectGov (www.direct.gov.uk) for funding the work.
REFERENCES
1. Batra, S. and Bishu, R.R. (2007). Web usability and evaluation: issues and concerns. In N. Aykin (Ed.), Usability and Internationalization, Part I, HCII 2007 (LNCS 4559). Berlin: Springer-Verlag.
2. Bryant, T. and Charmaz, K. (2007). The Sage Handbook of Grounded Theory. London: Sage.
3. Budd, A. (2007). Heuristics for modern web application development. Blogography, January 17. Available at: www.andybudd.com/archives/2007/01/heuristics_for_modern_web_application_development/
4. Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
5. Cockton, G. and Woolrych, A. (2001). Understanding inspection methods: lessons from an assessment of heuristic evaluation. In A. Blandford, J. Vanderdonckt and P.D. Gray (Eds.), People and Computers XV. Berlin: Springer-Verlag.
6. Dix, A., Finlay, J., Abowd, G.D. and Beale, R. (2004). Human-computer interaction (3rd edition). Harlow, UK: Pearson Prentice Hall.
7. Gray, W.D. and Salzman, M. (1998). Damaged merchandise? A review of experiments that compare usability evaluation methods. Human-Computer Interaction, 13(3), 203–261.
8. Hertzum, M. and Jacobsen, N.E. (2001). The evaluator effect: a chilling fact about usability evaluation methods. International Journal of Human-Computer Interaction, 13(4), 421–443.
9. Hertzum, M., Jacobsen, N.E. and Molich, R. (2002). Usability inspections by groups of specialists: perceived agreement in spite of disparate observations. Ext. Abstracts CHI 2002, ACM Press (2002), 662–663.
10. Hornbaek, K. (2011). Dogmas in the assessment of usability evaluation methods. Behavior and Information Technology, 29(1), 97–111.
11. Instone, K. (1997). Site usability heuristics for the web. Web Review, October. Available at: http://instone.org/heuristics
12. Jeffries, R., Miller, J.R., Wharton, C. and Uyeda, K. (1991). User interface evaluation in the real world: a comparison of four techniques. Proc. CHI 1991, ACM Press (1991), 119–124.
13. Koutsabasis, P., Spyrou, T. and Darzentas, J. (2009). Evaluating usability evaluation methods: criteria, method and a case study. In J. Jacko (Ed.), Human-Computer Interaction, Part I, HCII 2007 (LNCS 4550). Berlin: Springer-Verlag.
14. Lazar, J., Feng, J.H. and Hochheiser, H. (2010). Research methods in human-computer interaction. Chichester, UK: Wiley.
15. Lewis, C., Polson, P., Wharton, C. and Rieman, J. (1990). Testing a walkthrough methodology for theory-based design of walk-up-and-use interfaces. Proc. CHI 1990, ACM Press (1990), 235–242.
16. Lewis, C. and Wharton, C. (1997). Cognitive walkthroughs. In M. Helander, T.K. Landauer and P. Prabhu (Eds.), Handbook of human-computer interaction (2nd edition). Amsterdam: Elsevier.
17. Molich, R. and Nielsen, J. (1990). Improving a human-computer dialogue. Communications of the ACM, 33(3), 338–348.
18. Nielsen, J. (1993). Usability engineering. San Diego, CA: Morgan Kaufmann.
19. Nielsen, J. (1994). Enhancing the explanatory power of usability heuristics. Proc. CHI 1994, ACM Press (1994), 152–158.
20. Nielsen, J. and Molich, R. (1990). Heuristic evaluation of user interfaces. Proc. CHI 1990, ACM Press (1990), 249–256.
21. Petrie, H. and Buykx, L. (2010). Collaborative Heuristic Evaluation: improving the effectiveness of heuristic evaluation. Proceedings of UPA 2010 International Conference. Omnipress. Available at: http://upa.omnibooksonline.com/index.htm
22. Rogers, Y., Sharp, H. and Preece, J. (2011). Interaction design: beyond human-computer interaction. Chichester, UK: Wiley.
23. Shneiderman, B. and Plaisant, C. (2005). Designing the user interface (4th edition). Boston, MA: Addison Wesley.
24. Strauss, A. and Corbin, J. (1997). Grounded theory in practice. London: Sage.
25. Tognazzini, B. (2003). First principles of interaction design. Available at: http://asktog.com/basics/firstPrinciples.htm