Structuring Dimensions for Collaborative Systems Evaluation
PEDRO ANTUNES, University of Lisbon
VALERIA HERSKOVIC, SERGIO F. OCHOA, and JOSE A. PINO, University of Chile, Santiago
Collaborative systems evaluation is always necessary to determine the impact a solution will have on
the individuals, groups, and the organization. Several methods of evaluation have been proposed. These
methods comprise a variety of approaches with various goals. Thus, the need for a strategy to select the
most appropriate method for a specific case is clear. This research work presents a detailed framework to
evaluate collaborative systems according to given variables and performance levels. The proposal assumes
that evaluation is an evolving process during the system lifecycle. Therefore, the framework, illustrated with
two examples, is complemented with a collection of guidelines to evaluate collaborative systems according
to product development status.
Categories and Subject Descriptors: H.5.3 [Information Interfaces and Presentation]: Group and Organization Interfaces—Evaluation/methodology, Theory and models; K.6.1 [Management of Computing
and Information Systems]: Project and People Management—Life cycle, Systems development
General Terms: Measurement, Human Factors, Management
Additional Key Words and Phrases: Collaborative systems evaluation, human-computer interaction, interaction assessment, evaluation dimensions, evaluation guidelines
ACM Reference Format:
Antunes, P., Herskovic, V., Ochoa, S. F., and Pino, J. A. 2012. Structuring dimensions for collaborative systems
evaluation. ACM Comput. Surv. 44, 2, Article 8 (February 2012), 28 pages.
DOI = 10.1145/2089125.2089128 http://doi.acm.org/10.1145/2089125.2089128
1. INTRODUCTION
The evaluation of collaborative systems is an important issue in the field of Computer
Supported Cooperative Work (CSCW). Appropriate evaluation justifies investments,
appraises stakeholders’ satisfaction, or redirects systems development to successful
requirements matching. Several specific evaluation methods have been proposed [Herskovic et al. 2007] beyond those intended for Information Systems in general. However,
many collaborative systems seem to be poorly evaluated. A study of 45 articles from
eight years of the CSCW conference revealed that almost one third of the presented
collaborative systems were not evaluated in a formal way [Pinelle and Gutwin 2000].
Even when evaluations are done, many of them seem to be performed in an ad hoc way,
depending on the researchers' interests or the practical adequacy for a specific setting [Inkpen et al. 2004; Greenberg and Buxton 2008]. This shows a need for a strategy
that helps choose suitable collaborative systems evaluation methods.
This article was partially supported by the Portuguese Foundation for Science and Technology
(PTDC/EIA/102875/2008), Conicyt PhD scholarship, Fondecyt (Chile) Grants No. 11060467 and 1080352,
and LACCIR Project No. R0308LAC004.
Authors’ addresses: P. Antunes, Department of Informatics, University of Lisbon; email:
[email protected];
V. Herskovic, S.F. Ochoa, and J. A. Pino, Computer Science Department, University of Chile; emails:
{vherskov, sochoa, jpino}@dcc.uchile.cl.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted
without fee provided that copies are not made or distributed for profit or commercial advantage and that
copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for
components of this work owned by others than ACM must be honored. Abstracting with credit is permitted.
To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this
work in other works requires prior specific permission and/or a fee. Permissions may be requested from
Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212)
869-0481, or
[email protected].
© 2012 ACM 0360-0300/2012/02-ART8 $10.00
DOI 10.1145/2089125.2089128 http://doi.acm.org/10.1145/2089125.2089128
This article proposes a framework to evaluate a collaborative system under development or procurement, as well as a set of guidelines to select the appropriate evaluation
techniques. We understand evaluation as an evolving process that is in some way
associated with the conception, design, construction, and deployment activities of a
system development. The guidelines also address the case of a collaborative system
being purchased by an organization.
We consider two main structuring dimensions in order to frame the various contingencies of the evaluation process. One of these dimensions defines the set of relevant
evaluation variables, and the other concerns the levels of human performance under
evaluation. The considered evaluation variables are realism, generalization, precision,
system detail, system scope, and invested time. The adopted levels of human performance consider role-based, rule-based, and knowledge-based tasks. The approach is
generally applicable to all types of collaborative systems.
Section 2 analyzes the major problems associated with collaborative systems evaluation. Section 3 discusses the related work. In particular, it describes, categorizes, and
compares several well-known evaluation methods. Section 4 describes the proposed
framework for evaluation. Section 5 presents the collection of guidelines for evaluation. Section 6 contains two case studies of collaborative systems evaluation. Finally,
Section 7 presents the conclusions and further work.
2. STUDYING COLLABORATIVE SYSTEMS EVALUATION
2.1. Why Is Collaborative Systems Evaluation So Difficult?
The success of a collaborative system depends on multiple factors, including group characteristics and dynamics, social and organizational context in which it is inserted, and
positive and negative effects of technology on the group’s tasks and processes. Therefore, evaluation should attempt to measure several effects on multiple, interdependent
stakeholders and in various domains. What distinguishes collaborative systems from other information systems is indeed the need to evaluate their impact with an eclectic approach.
Ideally, a single collaborative systems evaluation method should cover the individual,
group, and organizational domains, assessing whether or not the system is successful
at the combination of those realms. Unfortunately, no such single method is currently
available, and may never be. The fundamental reason is related to the granularity and time scale of the information obtained in these three domains [Newell 1990].
—The information pertaining to the individual is usually gathered at the cognitive
level, focusing on events occurring on a timeframe in the order of a few minutes or
even seconds.
—Group information is gathered at the interaction/communication level, addressing
activities occurring in the range of several minutes and hours.
—The information regarding organizational impact concerns much longer timeframes,
usually in the order of days, months, and even years.
Moreover, the results of an evaluation should be weighted by the degree of certainty
in them, which depends on the maturity of what is being evaluated. At the inception
phase, the product to be evaluated may be just a concept or a collection of design ideas,
so the results have a high degree of uncertainty. When the development reaches full
deployment, the product may then be tested in much more far-reaching and systematic
ways, providing evaluators with an increased degree of certainty and relatively precise
results.
The dependence between product development and evaluation is noticeable in the
star model illustrated in Figure 1 [Hix and Hartson 1993]. Evaluation is a central
aspect of a broad collection of activities aiming to develop a product, but it has to compete with the other activities for attention, relevance, and critical resources, such as people, time and money.
Fig. 1. The dependence between product development and evaluation.
2.2. Why Evaluate and How to Evaluate?
McGrath [1984] characterized the purpose of conducting an evaluation as addressing
three main goals: precision, generalizability, and realism. The first goal concerns the
precision of the data obtained by the instrument being used. This goal is inherently
linked with the capability to control the dependent and independent variables, the subjects, and the experiment. Laboratory experiments are usually selected to accomplish
this high level of control.
Generalizability concerns the extent to which the obtained results may be applied
to a population. High generalizability goals usually imply adopting large-scale inquiries and surveys, whereas interviewing a small audience yields low generalizability.
Realism addresses how closely the obtained results represent real-world conditions, considering the work setting, the population of users, the tasks, stimuli, time stress, the absence of observers, and so on. Laboratory experiments have been criticized for
providing low realism, especially with collaborative systems, whereas field studies have
been considered to score high on realism but low on precision.
Overall, the ideal evaluation should maximize the three goals; for instance, using
multiple evaluation methods and triangulating the obtained results. Nevertheless,
McGrath [1984] states this sort of evaluation would be very costly and difficult to carry out, and may ultimately have to be considered utopian. McGrath then identifies the major compromise strategies adopted to overcome the costs of an ideal evaluation:
—field strategies to make direct observations of realistic work.
—experimental strategies based on artificial experimental settings to study specific
activities with high precision.
—respondent strategies to obtain evidence by sampling a large and representative
population.
—theoretical strategies that use theory to identify the specific variables of interest.
2.3. What to Evaluate?
Pinsonneault and Kraemer [1989] defined one of the pioneering collaborative systems
evaluation frameworks addressing the practical aspects related to the exact object
under evaluation. The framework adopts an input-process-output view to conceptualize
the relationship between the technology support and other factors related to the group,
group behavior, and work context.
—Contextual variables are the factors that influence group behavior. They belong to five major categories: personal, situational, group structure, task characteristics, and technology characteristics (e.g., anonymity and type of communication).
—Group process, as defined by the characteristics of the group interaction, including
decisional characteristics, communicational characteristics, and interpersonal characteristics.
—Outcomes of the group process as affected by the technology support, including task-related outcomes and group-related outcomes.
This framework has been highly influential, especially because it created a common
foundation for comparing multiple collaborative systems experiments. Also, the distinction between group process and outcomes highlights two quite different evaluation
dimensions commonly found in the literature, the former usually addressing questions
of meaning (e.g., ethnography [Hughes et al. 1994] and groupware walkthrough [Pinelle
and Gutwin 2002]), and the latter addressing questions of cause and effect (e.g., value
creation [Briggs et al. 2004]). Other collaborative systems evaluation frameworks, such
as the ones proposed by Hollingshead and McGrath [1995] and Fjermestad and Hiltz
[1999], are based on this framework.
Regarding more recent evaluation frameworks, Neale et al. [2004] proposed a simplified evaluation framework, basically consisting of two categories. One encompasses
the contextual variables already mentioned. The other category concerns the level of
work coupling attained by the work group, which combines technology characteristics
with group process characteristics. Along with this proposition, Neale et al. [2004] also
recommend blending the different types of evaluation. Araujo et al. [2002] also proposed a simplified framework based on four dimensions: group context (which seems
consensual in every framework), system usability, level of collaboration (similar to the
level of work coupling), and cultural impact. Cultural impact is seen as influencing the
other dimensions, thus introducing a feedback loop in the input-process-output view.
2.4. When to Evaluate?
The timing of the evaluation is inherently associated with the development process.
It is common to distinguish between the preliminary and final development stages
[DeSanctis et al. 1994; Guy 2005]. The preliminary stage affords what has been designated formative evaluation [Scriven 1967], which mainly serves to provide feedback
to the designers about the viability of design ideas, usability problems, perceived satisfaction with the technology, possible focal points for innovation, and alternative solutions, as well as feedback about the development process itself. The final stage affords what is sometimes designated summative evaluation, which provides complete and definitive information about the developed product and its impact on the users, the group, and the
organization.
3. RELATED WORK
This section begins by describing the way we built a relevant corpus of papers to
be analyzed, and also the literature review method used to classify the evaluation
strategies. These retrieved strategies were split into two subsets: 1) evaluation methods
that are presented in Section 3.2, and 2) evaluation frameworks that are described in
Section 3.3.
3.1. Literature Review Methodology
We began our search for articles in the literature concerning collaborative systems
evaluation by exploring various ways to get a large initial corpus of papers. The main
technique to obtain papers was to search pertinent search engines, such as Google Scholar and the ACM Digital Library, using combinations of keywords related to CSCW and evaluation (e.g., groupware evaluation, collaborative systems assessment). The proceedings from several relevant conferences and workshops in the area (e.g., CSCW, ECSCW, CHI, WETICE) were reviewed
to find additional papers to add to the corpus. Then, we examined references of already found relevant papers, and searched through Google Scholar for papers citing those we had found. Each paper was carefully reviewed in order to determine
whether it merited inclusion in the set of preselected articles. The large set thus built
was reduced by filtering out papers that did not present a distinctive evaluation
proposal.
The initial analysis of the corpus of papers identified several types of proposals
to evaluate collaborative systems. Some papers presented ad hoc techniques or tools
(e.g., questionnaires) defined specifically to evaluate a particular application. Such
papers were not considered in our analysis because we were interested in finding
evaluation methods with a clear and reusable evaluation strategy. Papers reporting just
evaluation tools (i.e., single instruments intended to measure system variables) were
also removed from the main corpus when they did not include an evaluation process.
Once we filtered out the tools and nonreusable evaluation proposals, we analyzed
the remaining contributions and realized those proposals could be classified in two
categories: evaluation methods and evaluation frameworks.
We define evaluation methods as procedures used to apply evaluation tools with a
specific goal. For example, the Perceived Value evaluation method [Antunes and Costa
2003] uses evaluation tools such as questionnaires and checklists with the goal of determining the organizational impact of meetingware. We define evaluation frameworks as
macro-strategies used to organize the evaluation process. Several evaluation methods
and tools may be included in an evaluation framework.
After classifying the articles into these two categories, the analysis of the contributions focused on the evaluation methods category. This subset was then expanded to include seminal evaluation methods that have been adapted to the collaboration context.
The careful analysis of these selected papers led us to define a set of questions that can be applied to each method in order to classify it more precisely:
(1) purpose of evaluation (why), (2) evaluation tools being used (how), (3) outcomes of
the evaluation (what), and (4) moment in which the evaluation is conducted (when).
Section 3.2 presents this classification, which is complemented with a narrative summary of the procedures adopted by each method.
Moreover, we classified the evaluation methods by publication date, which served
to build an understanding of their emergence and subsequent life. This classification
allowed us to construct the timeline presented in Appendix A. The timeline analysis
shows some identifiable patterns.
Table I. Characterization of Evaluation Methods

| Method | Why               | How                          | What                                    | When      |
| GHE    | Precision         | Software analysis, checklist | Effectiveness, efficiency, satisfaction | Summative |
| GWA    | Precision         | Software analysis            | Effectiveness, efficiency, satisfaction | Formative |
| CUA    | Precision         | Software analysis            | Effectiveness, efficiency, satisfaction | Formative |
| GOT    | Realism           | Observation, checklist       | Effectiveness, efficiency, satisfaction | Summative |
| HPM    | Precision         | Interaction analysis         | Group performance                       | Formative |
| QDE    | Realism           | Observation                  | Redesign                                | Summative |
| PAN    | Generalizability  | Formal analysis              | Efficiency                              | Formative |
| PVA    | Realism           | Questionnaire, checklist     | Organizational impact                   | Formative |
| SBE    | Realism/Precision | Interviews                   | Organizational contributions            | Formative |
| COS    | Realism           | Interviews, observation      | Redesign                                | Formative |
| TTM    | Generalizability  | Interviews, observation      | Predicted actual use                    | Formative |
| KMA    | Generalizability  | Software analysis, checklist | Knowledge circulation                   | Formative |
—The adaptation of single-user evaluation methods, developed in the Human-Computer Interaction field, to the specific context of collaborative systems. This
has occurred, for instance, with walkthroughs (structured walkthroughs, cognitive
walkthroughs, groupware walkthroughs), heuristic evaluation (heuristic evaluation,
heuristic evaluation based on the mechanics of collaboration) and scenario-based
evaluation.
—The assimilation of perspectives, methods, and techniques from other fields beyond
technology development. The clearest example is ethnography (observational studies,
quick-and-dirty ethnography, workplace studies), but cognitive sciences also seem to
have had an impact (KLM, cognitive walkthroughs, computational GOMS).
—The increasing complexity of the evaluation context. Most early methods (e.g., structured walkthroughs, KLM, discount methods) seem to focus on very specific variables measured under controlled conditions, while some of the later methods seem
to consider broader contextual issues (e.g., multi-faceted evaluation, perceived value,
evaluating collaboration in co-located environments, lifecycle based approach).
Finally we also analyzed the proposals concerning evaluation frameworks. Section 3.3 presents the most representative ones.
3.2. Sample of Evaluation Methods
This section presents a sample of collaborative systems evaluation methods. Table I
presents a summarized characterization of the selected evaluation methods, describing
the purpose of the evaluation (why), the evaluation tools being used in each method
(how), the outcomes of the evaluation (what), and the moment in which evaluation is
conducted (when). Then, we present a brief description of the steps involved in each
evaluation method.
Groupware Heuristic Evaluation (GHE) [Baker et al. 2002]. GHE is based on eight
groupware heuristics, which act as a checklist of characteristics a collaborative system
should have. Evaluators who are experts in these heuristics examine the interface, recording
each problem they encounter, the violated heuristic, a severity rating, and optionally,
a solution to the problem. The problems are then filtered, classified, and consolidated
into a list, which is used to improve the application.
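To make the recording step concrete, the following sketch shows one possible way to structure the evaluators' problem reports and consolidate them into a single list; the field names, the severity scale, and the example heuristic label are illustrative assumptions rather than part of GHE itself.

```python
# Illustrative record of problems found during a groupware heuristic
# evaluation (GHE); the field names simply mirror the items listed above.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class HeuristicProblem:
    description: str            # problem observed in the interface
    violated_heuristic: str     # which groupware heuristic it violates
    severity: int               # e.g., 0 (cosmetic) .. 4 (catastrophic); assumed scale
    suggested_solution: Optional[str] = None

def consolidate(reports: List[List[HeuristicProblem]]) -> List[HeuristicProblem]:
    """Merge the individual evaluators' lists, drop duplicates by description,
    and order the consolidated list by decreasing severity."""
    unique = {p.description: p for report in reports for p in report}
    return sorted(unique.values(), key=lambda p: p.severity, reverse=True)

# Example: one evaluator's report (hypothetical problem and heuristic name).
report = [HeuristicProblem(
    description="No indication of who else is editing the shared document",
    violated_heuristic="Awareness of others' activities (placeholder name)",
    severity=3,
    suggested_solution="Show telepointers or an active-user list")]
```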
Groupware Walkthrough (GWA) [Pinelle and Gutwin 2002]. A scenario is a description of an activity or set of tasks, which includes the users, their knowledge, the
intended outcome, and circumstances surrounding it. Evaluators construct scenarios
by observing users and identifying episodes of collaboration. Each evaluator, taking the
role of all users or one in particular, walks through the tasks in a laboratory setting,
recording each problem they encounter. A meeting is then conducted to analyze the
results of the evaluation.
Collaboration Usability Analysis (CUA) [Pinelle et al. 2003]. Evaluators map collaborative actions to a set of collaboration mechanisms, or fine-grained representations of
basic collaborative actions, which may be related to elements in the user interface.
The resulting diagrams capture details about task components, a notion of the flow
through them, and the task distribution.
Groupware Observational User Testing (GOT) [Gutwin and Greenberg 2000]. GOT
involves evaluators observing how users perform particular tasks supported by a system in a laboratory setting. Evaluators either monitor users having problems with a
task, or ask users to think aloud about what they are doing to gain insight into their
work. Evaluators focus on collaboration and analyze users’ work through predefined
criteria, such as the mechanics of collaboration.
Human-Performance Models (HPM) [Antunes et al. 2006]. Evaluators first decompose the physical interface into several shared workspaces. Then, they define critical
scenarios focused on the collaborative actions for the shared workspaces. Finally, evaluators compare group performance in the critical scenarios to predict execution times.
“Quick-and-dirty” Ethnography (QDE) [Hughes et al. 1994]. Evaluators do brief
ethnographic workplace studies to provide a general sense of the setting for designers.
QDE suggests the deficiencies of a system, supplying designers with the key issues
that bear on acceptability and usability, thus allowing existing and future systems to
be improved.
Performance Analysis (PAN) [Baeza-Yates and Pino 1997; Baeza-Yates and Pino
2006]. The application to be studied is modeled as a task to be performed by a number
of people in a number of stages, and the concepts of result quality, time, and total
amount of work done are defined. The evaluators must define a way to compute the
quality (e.g., group recall in a collaborative retrieval task), and maximize the quality
vs. work done, either analytically or experimentally.
Perceived Value (PVA) [Antunes and Costa 2003]. PVA begins with developers identifying relevant components for system evaluation. Then, users and developers negotiate
the relevant system attributes to be evaluated by users. After the users have worked
with the system, they fill out an evaluation map by noting whether the components
support the attributes or not. Using these ratings, a metric representing the PV is
calculated.
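The PV metric itself is defined in Antunes and Costa [2003] and is not reproduced here; the sketch below only illustrates the general shape of the computation, using a simple supported-pairs ratio as a stand-in metric and hypothetical component and attribute names.

```python
# Hypothetical aggregation of a PVA-style evaluation map.  Each user marks
# whether a system component supports a negotiated attribute (True/False);
# the ratio of supported pairs is used here as a stand-in for the PV metric.
from typing import Dict, List, Tuple

EvaluationMap = Dict[Tuple[str, str], bool]   # (component, attribute) -> supported?

def perceived_value(maps: List[EvaluationMap]) -> float:
    """Fraction of all (component, attribute) judgements marked as supported,
    pooled over the users who filled out an evaluation map."""
    judgements = [v for m in maps for v in m.values()]
    return sum(judgements) / len(judgements) if judgements else 0.0

# Example with two users and two (component, attribute) pairs each.
user1 = {("shared editor", "awareness"): True, ("chat", "persistence"): False}
user2 = {("shared editor", "awareness"): True, ("chat", "persistence"): True}
print(perceived_value([user1, user2]))   # 0.75
```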
Scenario-Based Evaluation (SBE) [Haynes et al. 2004]. SBE uses field evaluation.
Evaluators perform semi-structured interviews with users to discover scenarios, or
detailed descriptions of activities, and claims about them. Then, focus groups validate
these findings. The frequency and percentage of positive claims help quantify the
organizational contributions of the system, and the positive and negative claims about
existing and envisioned features provide information to aid in redesign.
Cooperation Scenarios (COS) [Stiemerling and Cremers 1998]. Evaluators conduct
field studies, semi-structured interviews, and workplace visits. They thus identify scenarios, cooperative behavior, users involved and their roles, and the relevant context. For each role involved in the cooperative activity, evaluators analyze the new
design to see how the task changes and who benefits from the new technology. Then,
the prototype is presented as a scenario in a workshop with users to discover design
flaws.
Knowledge Management Approach (KMA) [Vizcaíno et al. 2005]. Evaluation using KMA measures whether the system helps users detect knowledge flows and
disseminate, store, and reuse knowledge. The knowledge circulation process comprises five phases (knowledge creation, accumulation, sharing, utilization, and internalization), which are also the areas evaluated by this approach. The evaluation is
performed by answering questions associated with each area.
Technology Transition Model (TTM) [Briggs et al. 1998]. TTM predicts the actual
system use as a function of the intent to use the system, the value that users attribute
to it, how frequently it will be used, and the perceived cost of transition. This model
proposes that users weigh all factors affecting the perceived value of a system, producing an overall value corresponding to their perception of the usefulness of the system.
Users’ opinions are obtained by interviews, archival analysis, and observations. These
opinions are the basis to predict actual use of the system. The collaborative application
can thus be evaluated to increase the speed of its acceptance, while reducing the risk
of technology transition.
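The exact functional form used by TTM is not given above, so the following sketch only illustrates the idea of weighing such factors into an overall score; the factor names are taken from the description, while the weights, scales, and linear combination are assumptions made for illustration, not the published model.

```python
# Illustrative (not the published TTM) combination of user-reported factors
# into an overall score used to anticipate actual system use.
from typing import Dict, List

# Hypothetical weights; in TTM the users themselves weigh the factors.
WEIGHTS = {"perceived_value": 0.4, "frequency_of_use": 0.3,
           "intent_to_use": 0.2, "transition_cost": -0.1}

def overall_score(ratings: Dict[str, float]) -> float:
    """Weighted sum of one user's factor ratings (assumed on a 0-1 scale)."""
    return sum(WEIGHTS[f] * ratings.get(f, 0.0) for f in WEIGHTS)

def predicted_use(all_ratings: List[Dict[str, float]]) -> float:
    """Average overall score across the interviewed users."""
    return sum(overall_score(r) for r in all_ratings) / len(all_ratings)
```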
3.3. Evaluation Frameworks
This section presents existing macro-strategies for performing evaluation. Several
frameworks adopt an input-process-output view [Pinsonneault and Kraemer 1989;
Ross et al. 1995; Damianos et al. 1999; Araujo et al. 2002; Huang 2005], while others
include evaluation in the software development cycle [Hix and Hartson 1993; Baecker
et al. 1995; Veld et al. 2003; Huang 2005].
The star model [Hix and Hartson 1993] proposes evaluation as the central phase in
the software development cycle. This means evaluation should be conducted after every
development step. Baecker et al. [1995] regard development as an iterative process
of design, implementation, and evolution, and apply appropriate evaluation methods
after each development phase. The concept design is evaluated through interviews,
the functional design through usability tests, the prototype through heuristics, the
delivered system through usability tests, and finally, the system evolution is evaluated
through interviews and questionnaires.
Huang [2005] proposes a lifecycle strategy. An evaluation plan is defined before starting development, considering five domains: context, content, process, stakeholders, and
success factors. The plan is improved at each cycle after analyzing the evaluation results. The E-MAGINE framework [Veld et al. 2003] has a similar structure: first, a
meeting and an interview are conducted to establish the evaluation goals and group profile. This information guides the selection of the evaluation methods and tools to be used.
Damianos et al. [1999] present a framework based on Pinsonneault and Kraemer’s
proposal [1989]. The framework has four levels: requirement, capability, service, and
technology. Appropriate methods should be selected at each level to conduct the evaluation. At the requirement level, evaluation concerns the overall system quality. At
the capability level, evaluation addresses the system capabilities. At the service level,
evaluation is focused on performance and cost. Finally, the technology level concerns
benchmarking technical issues.
The PETRA strategy combines the perspective of the evaluator and the perspective
of the users, or participants [Ross et al. 1995]. In this way, it aims to achieve a balance
between theoretical and practical methods.
The CSCW Lab proposes four dimensions to consider when evaluating collaborative systems: group context, usability, collaboration, and cultural impact [Araujo
et al. 2002]. Each dimension is a step of the evaluation process, which consists of
characterizing the group and work context, measuring usability strengths and weaknesses and collaboration capabilities, and studying the impact of the application over
time.
Fig. 2. Variables adopted for the evaluation framework.
4. COLLABORATIVE SYSTEMS EVALUATION FRAMEWORK
4.1. Variables
Section 2.1 introduced the need to choose variables to assess a collaborative system
under development. We should, then, characterize our framework according to a set of
variables providing insights on the evaluation methods to be applied.
A starting point is McGrath’s evaluation goals mentioned in Section 2.2. These goals
are fundamental to laying out the evaluation methodology. For our evaluation framework, it thus seems appropriate to choose variables associated with these goals; if the evaluation methods change in succeeding evaluations, these variables will reflect the new evaluation methodology, as illustrated in Figure 2. Precision, generalization, and realism are our first three variables to describe the evaluation method. Precision focuses
on the accuracy of the measuring tools, generalization concerns the extent (in terms of
population) to which the method must be applied, and realism refers to whether the
evaluation will use real settings or not.
It is important to incorporate the level of system detail (depth) as one of the dimensions characterizing the evaluation activities. This dimension concerns the granularity
of the evaluation. Evaluation methods with a high level of system detail (e.g., mouse
movements of a user) will provide more specific and accurate information to improve
the system under review.
Another dimension we would like to incorporate in the evaluation framework is the
scope (breadth) of the system being evaluated. An evaluation having a large value
for this variable would mean the system being evaluated has many functionalities
and components being assessed. This variable complements the detail dimension. The
breadth dimension can help identify the scope of a system that could be covered with
a particular evaluation method. We note that while the first three variables in our
framework consider theoretical issues, the system detail and scope concern the product
development state.
Finally, an invested time variable describes the time used by the evaluators to carry
out the work. This variable may not be completely independent from other variables,
notably, detail and scope (since, e.g., a coarse-grained evaluation narrowed to a few functionalities will probably require little invested time). However, from a more practical
standpoint, it is an important variable to distinguish the efficient evaluation methods
from those which are not. Therefore, invested time is included in the framework.
Other variables could be considered for our framework; however, after analyzing
several of these variables—such as evaluation cost/effort, feedback richness, or required
expertise—we realized they could be inferred in some way by relating the results of
the proposed dimensions. Moreover, the selected variables seem adequate to analyze
evaluation methods, as shown in following sections.
Figure 2 shows a radar-graph representation of the evaluation variables. A specific
method is represented by a dot in each of the axes (variables). Each axis has a scale
from 0 (or minimum value) in the origin to a certain maximum value. These dots may
be joined to show a certain evaluation shape. Note that a numeric value for the area within a shape does not make much sense, since the scales are not the same
for each variable. However, a light evaluation procedure will probably have low values
for most or all variables, whereas a heavy one will probably score high in several
evaluation variables.
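As a concrete illustration of this representation, the following sketch draws such a radar graph for one hypothetical method; the six axis labels come from the framework, whereas the 0-5 scale and the example scores are invented for illustration only.

```python
# Minimal radar-graph sketch of the six framework variables.
# The 0-5 scale and the example scores are illustrative assumptions.
import math
import matplotlib.pyplot as plt

variables = ["Realism", "Generalization", "Precision",
             "System detail", "System scope", "Invested time"]
scores = [4, 2, 3, 5, 1, 2]          # hypothetical profile of one method

# One axis per variable, evenly spaced around the circle; close the polygon.
angles = [2 * math.pi * i / len(variables) for i in range(len(variables))]
angles += angles[:1]
scores += scores[:1]

ax = plt.subplot(projection="polar")
ax.plot(angles, scores, marker="o")   # dots on each axis, joined into a shape
ax.fill(angles, scores, alpha=0.2)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(variables)
ax.set_yticks(range(0, 6))
plt.show()
```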
4.2. Performance Levels
Reason [2008] proposed a three-layered model of human performance in organizational
contexts by extending a proposal by Rasmussen and Jensen [1974]. We will apply this
model to the specific context of collaborative systems evaluation.
The model categorizes human performance according to two dimensions: situation
and situation control. According to the situation dimension, the organizational activities may be classified as (1) routine, when the activities are well known by the
performers and accomplished in an almost unconscious way; (2) planned, when the activities have been previously analyzed by the organization and thus there are available
plans and procedures to guide the performers in accomplishing the intended goals; and
(3) novel, when the way to achieve the intended goals is unknown to the organization
and thus human performance must include problem analysis and decision-making activities. Shared workspaces, workflow systems, and group support systems are good examples of collaborative systems technology supporting the routine, planned, and novel situations, respectively.
The other dimension concerns the level of control the performers may exert while
accomplishing the set goals. The control may be mechanical, when a human action is
performed according to a predefined sequence imposed by the technology. The control
may be human, when the technology does not impose any predefined action sequence.
Finally the control may be mixed, when it opportunistically flows between humans
and the technology. These two dimensions serve to lay down the following performance levels, as illustrated in Figure 3; a minimal mapping of these dimensions to the three levels is sketched in code after the list below.
Fig. 3. Performance levels, adapted from Reason [2008].
—Role-based performance encompasses routine tasks performed with mechanical control
at the individual level. Any group activity at this level is basically considered as a
collection of independent activities.
—Rule-based performance concerns tasks accomplished with some latitude of decision
from humans but within the constraints of a specific plan imposed by technology. Unlike the previous level, the group activities are perceived as a collection of coordinated
activities.
—Knowledge-based performance concerns interdependent tasks performed by humans
in the scope of group and organizational goals.
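As announced above, the following minimal sketch restates the mapping between the two dimensions and the three performance levels; only the combinations explicitly named in the text are encoded, and treating the remaining combinations as rule-based is an assumption of this sketch, not part of the model.

```python
# Sketch of the situation/control dimensions and the performance levels they
# delineate (after Reason [2008]).  Off-diagonal combinations default to
# rule-based, which is an assumption of this sketch.
from enum import Enum

class Situation(Enum):
    ROUTINE = 1
    PLANNED = 2
    NOVEL = 3

class Control(Enum):
    MECHANICAL = 1
    MIXED = 2
    HUMAN = 3

def performance_level(situation: Situation, control: Control) -> str:
    if situation is Situation.ROUTINE and control is Control.MECHANICAL:
        return "role-based"        # individual, routine tasks under mechanical control
    if situation is Situation.NOVEL and control is Control.HUMAN:
        return "knowledge-based"   # interdependent tasks, full human latitude
    return "rule-based"            # planned activities, mixed control (assumed default)
```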
This model highlights the increasing sophistication of human activity, in which
simple (from the perspective of the organization) individual roles are complemented
with more complex coordinated activities and supplemented by even more complex
knowledge-based and information-rich activities. The group becomes more important
than the individual. We will use this model to delineate three distinct collaborative
systems evaluation scenarios.
4.3. Evaluation Scenarios
Our evaluation scenarios follow the three-layer view previously mentioned.
Role-based scenario. The evaluation data is gathered at the individuals’ cognitive
level, focusing on events occurring during a timeframe in the order of minutes or
even seconds. The most adequate evaluation methods to employ in this scenario adopt
laboratory settings and considerable instrumentation (e.g., key logging). To gather the
data, the evaluators must accurately specify the roles and activities, and the subjects
must act exactly according to the instructions, under strict mechanical control. In
these circumstances the system detail is high (e.g., keystrokes and mouse movements)
but the system scope is low (e.g., roles associated to some particular functions). This
scenario also trades off realism for higher precision and generalizability. The time
invested in this type of evaluation tends to be low and mostly used in the preparation
of the experiment. The various trade-offs associated with this evaluation scenario are illustrated in Figure 4.
Fig. 4. Role-based evaluation.
Rule-based scenario. The evaluation data now concerns several subjects who must
coordinate themselves to accomplish a set of tasks. The relevant events occur over several minutes and hours, instead of minutes or less. The system details being considered
have large granularity (e.g., exchanged messages instead of keystrokes). The system
scope also increases to include more functions. The evaluation methods employed in
this scenario may still adopt laboratory settings although using less instrumentation.
This scenario also represents trading off realism in favor of precision and generalizability. As with the role-based scenario, the evaluators must plan the subjects’ activities in
advance; however, the subjects should be given more autonomy since control concerns
the coordination level and not individual actions. The time invested in this type of
evaluation is higher than in the previous case, since the data gathering takes more
time and the data analysis is less straightforward (e.g., requiring debriefing by the
participants). The trade-offs associated with this evaluation scenario are illustrated in Figure 5.
Fig. 5. Rule-based scenario.
Knowledge-based scenario. The evaluation is focused mostly on the organizational
impact, and thus concerns much longer timeframes, usually on the order of days,
months, and even years, since the technology assimilation and the perception of value
to the organization may take a long time to emerge and stabilize. The evaluation
scenario is also considerably different when compared to the other scenarios, involving for instance, knowledge management, creativity, and decision-making abilities.
Considering these main goals, it is understandable that the system detail has coarse
granularity, favoring broad issues such as perceived utility or value to business. The
system scope may be wider for exactly the same reason. The evaluators may not be able to specify the roles and activities beforehand, since the subjects have significant latitude for
decision, which leads to open situations beyond the control of the evaluators. Considering the focus on knowledge, the trade-off is usually to reduce the precision and
generalizability in favor of realism. All these differences imply the laboratory setting is
not the most appropriate for the knowledge-based scenario, and instead favor more naturalistic, qualitative settings. Two examples of such evaluation methods employed in this
scenario are case studies and ethnographic studies. These techniques need significant
time to gather data in the field, and also time to transcribe, code, and analyze the
obtained data. The trade-offs associated with this evaluation scenario are illustrated in Figure 6.
Fig. 6. Knowledge-based scenario.
4.4. Discussion
Analysis of the scenarios described above highlights interesting issues to ponder when
planning a collaborative system evaluation. Regarding the ensemble
of variables, the rule-based scenario seems to be the most balanced in the adopted
trade-offs. By contrast, the role-based and knowledge-based scenarios show a clear tendency toward the extremes. The role-based scenario emphasizes detail, precision,
generalization, and time at the cost of scope and realism. On the opposite side, the
knowledge-based scenario shows a clear emphasis in scope and realism at the expense
of detail, precision, generalization, and time. These differences highlight the so-called
instrumentalist and inter-subjectivist strategies, which have been quite influential in
the CSCW field [Pidd 1996; Neale et al. 2004; Guy 2005]. The instrumentalist strategy
is mostly focused on accumulating knowledge through experimentation, whereas the
inter-subjectivist strategy is concerned with interpreting the influences of the technology on individuals, groups, and the organization.
Analysis of some individual variables may also give additional insights about the
collaborative system evaluation. One such variable is invested time, which is distinct for
the three discussed scenarios. From a very pragmatic perspective, the selection of the
evaluation scenarios could be based on the time one is willing to invest in the evaluation process. Such considerations would lead to a preference for the rule-based and role-based scenarios and a devaluation of the knowledge-based scenario. Nevertheless,
this approach may not be feasible due to lack of system detail, for example, whenever
evaluating design ideas. This approach also has some negative implications, such as
emphasizing details of little importance to the organization.
System detail and scope are also related to the strategy adopted in developing the system. For instance, a breadth-first strategy indicates a strong initial focus on broad functionality, which would mandate an evaluation starting with a
knowledge-based scenario that later continues with role-based and rule-based scenarios. Conversely, a depth-first strategy indicates a strong preference for
fully developing a small functional set, which would mandate an evaluation starting
with a role-based scenario that then proceeds with rule-based and knowledge-based
scenarios.
Fig. 7. Evaluation lifecycle.
Figure 7 provides an overview of these evaluation issues. The two dotted lines show
the limits suggested by the three evaluation scenarios. The arrows show the possible
directions of the evaluation strategy and their basic assumptions.
The arrows in Figure 7 indicate possible evaluation processes adopted according to
various biases. The arrow’s starting point indicates which type of evaluation should be
done first, while the end point suggests where to finish the evaluation process.
This graphical representation also allows reasoning about collaborative systems evaluation along other dimensions. For instance, the specific control and situation characteristics of one particular system may determine the effort involved in evaluation. Consider a database under evaluation that only supports mechanical control. Then, we may reckon the lower dotted line shown in Figure 7 corresponds to the most adequate evaluation.
An instrumentalist strategy should be adopted, assessing for instance the database
usability. In the case of a workflow system, where control is mixed between the system
and the users, we may consider the evaluation should be extended beyond the instrumentalist strategy, for example, contemplating the conformity of the system with
organizational procedures and rules.
5. EVALUATION GUIDELINES
This section presents a set of guidelines for selecting the techniques and instruments used in
collaborative systems evaluation. Figure 8 shows the evaluation methods which were
presented in Section 3.2, organized by considering the role, rule, and knowledge-based
categories.
The knowledge-based evaluation emphasizes variables pertaining more to the organization and group than to the individual performance. Examples of metrics which can
be delivered by these methods include interaction, participation, satisfaction, consensus, usefulness, and cost reductions.
In contrast, the role-based evaluation stresses the importance of individual
performance. Metrics that can be obtained using these methods are efficiency and
usability.
Fig. 8. Classification of evaluation methods.
The rule-based evaluation may be seen as being in the middle of the extremes. Some
metrics may include the organizational goals, for example, conformance to regulations,
while others may concern group performance, such as productivity.
Some methods may span two categories depending on the nature of the instruments involved in each method; for example, Cooperation Scenarios (COS) is located in the area between rule- and knowledge-based evaluation methods because it has elements belonging to both categories. This classification allows evaluators to
choose an appropriate method for their particular evaluation scenario.
We have also developed guidelines for selecting an evaluation method depending on the development status of the product being assessed. We consider which of the following
stages the product is in: conception (during analysis and design), implementation (during coding and software refinement), production (the product is already being used),
reengineering (the product is being structurally redesigned), or procurement (the product is going to be acquired by the organization). Figure 9 presents a summary of these guidelines.
Fig. 9. Summary of guidelines for selection of evaluation methods.
The rationale behind these recommendations is closely related to evaluation activities embedded in a typical software process. Validating the proposal of a collaborative system is mandatory during product conception or implementation phases. This
validation typically involves a knowledge-based method intended to assess product
usefulness for the organization. Further evaluation is usually justified if the results
from this initial assessment are satisfactory, but the product requires some improvements. Following the same line of reasoning, rule-based methods should be applied
before role-based. If we want to evaluate already-implemented products (i.e., products in production, reengineering, or procurement stage), the most suitable evaluation
method will depend on what triggered the evaluation process, for example, refinement,
redesign, or acquisition of a product. All guidelines and the rationale behind each are
described in the following.
If the product to be evaluated is in the conception stage, then the evaluation should
be oriented towards obtaining coarse-grain information to help understand the role
of the tool within the organization, the users’ expectations and needs, the business
case, and the work context. This information, usually obtained from knowledge-based
evaluation methods, may be very useful to specify or refine the user and software
requirements, to establish the system scope, to identify product/business risks, and
to validate a product design. When performing this assessment, evaluators should adopt an inter-subjectivist view of the collected data, considering qualitative and interactive
ways to obtain data, and using various activities such as field studies, focus groups,
and meetings. Rule-based and role-based methods do not provide a clear benefit in this
stage because they require, at least, having a prototype of the system.
If the product is in the implementation stage, a knowledge-based evaluation method
is recommended, because it will serve to understand if the product can address organizational goals. This evaluation also provides coarse-grain information concerning the
issues/components requiring improvements. This type of evaluation is optional if the
product was already evaluated with a knowledge-based method during the conception
phase. However, the implemented product could differ from its design; therefore, assuming the implemented system is still aligned with the organization's needs and users'
expectations could be a mistake.
When the available time and budget allow additional evaluation actions, the process
may be complemented with a rule-based evaluation method, which would provide information necessary to adjust the product to the actual working scenario. For example,
adjustments to concrete business processes may be identified this way. Optionally, a
role-based evaluation method could also be used to fine-tune the product to the users.
In case of rule/role-based assessments, the evaluation setting may be configured to
assess the users’ activities in a controlled or mixed environment, which may utilize
laboratory settings. The evaluators may also adopt a more experimental view of the
collected data.
An evaluation may also serve to determine the current impact a system in production
has on the organization's business operations. Therefore, the first recommended evaluation action
considers diagnosing the current situation using precise information obtained from
the actual production system. A role-based evaluation method may then be used to
gather such information.
As with the previous case, if the available time and budget allow additional evaluation actions, rule-based and knowledge-based methods could subsequently be applied.
The aim could be identifying concrete performance issues and improving organizational behavior. Rule-based methods will provide performance diagnosis information
and knowledge-based methods will contribute to identifying the impact of the legacy
system at an organizational level.
Organizations often decide to reengineer a legacy system. The main purpose is
to change the organizational behavior by extending the system support. The existing
system may be used to guide this reengineering. In such a case it is recommended
to start with a rule-based evaluation to avoid anchoring the evaluation on too fine-grained or too coarse-grained information. This type of evaluation helps identify particular improvement areas, which should be addressed in the reengineering process.
Nevertheless, a subsequent knowledge-based evaluation may determine the impact of
the reengineered product on the organizational strategy. If the reengineering process
involves significant changes to the system's functions, user interfaces, or interaction
paradigms, a role-based evaluation may also be recommended. It allows focusing the
evaluation on particular components and also getting fine-grain and accurate information to perform the reengineering.
Often an evaluation action occurs when procuring a product. In such cases, the evaluation should start with a knowledge-based method, in order to understand whether the
system functionality matches the organizational needs. If the evaluators
must also assess the system support of the organizational context and specific business
processes, then the recommendation is to perform a rule-based evaluation, which will
identify strengths and weaknesses of the product as support of particular activities in
the organization.
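Condensing the stage-by-stage recommendations above (and Figure 9) into a single selection routine, a sketch could look as follows; the stage names mirror the text, while the single extra_budget flag is a deliberate simplification of the time, budget, and product-specific conditions just discussed.

```python
# Sketch of the stage-based guidelines: which evaluation scenarios to apply,
# in order, for a product in a given development stage.  The extra_budget
# flag stands in for the time/budget and product-specific conditions.
from typing import List

def recommended_evaluations(stage: str, extra_budget: bool = False) -> List[str]:
    if stage == "conception":
        return ["knowledge-based"]                  # coarse-grain, inter-subjectivist view
    if stage == "implementation":
        plan = ["knowledge-based"]                  # optional if already done at conception
        if extra_budget:
            plan += ["rule-based", "role-based"]    # adjust to work scenario, then fine-tune
        return plan
    if stage == "production":
        plan = ["role-based"]                       # diagnose with precise information
        if extra_budget:
            plan += ["rule-based", "knowledge-based"]
        return plan
    if stage == "reengineering":
        plan = ["rule-based", "knowledge-based"]    # start rule-based, then assess org impact
        if extra_budget:                            # e.g., functions/UI change significantly
            plan.append("role-based")
        return plan
    if stage == "procurement":
        plan = ["knowledge-based"]                  # does functionality match org needs?
        if extra_budget:                            # also assess support of business processes
            plan.append("rule-based")
        return plan
    raise ValueError(f"unknown stage: {stage}")
```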
Besides these generic recommendations, the evaluators should also ponder the specific characteristics of the product under evaluation, namely the control and situation
dimensions, which impact the evaluation scenario. The knowledge-based evaluation
is naturally most adequate for products giving latitude of decision to the users and
supporting interaction, collaboration, and decision-making.
The evaluators should also take risk into account. The risk-averse evaluator will set
up a complete evaluation process by considering a combination of the three evaluation
types, starting with knowledge-based and finishing with role-based scenarios. The risk-taking evaluator will probably concentrate the evaluation only on the knowledge-based
issues. The payoff of this high-risk approach is streamlining the evaluation efforts
while focusing upon the issues that may have highest impact on the organization. The
associated risk is the potential lack of quality of the outcomes.
6. THE COLLABORATIVE SYSTEM INCREMENTAL EVALUATION PROCESS
This section describes two case studies of collaborative systems evaluation. The first
one involves the evaluation of a requirements inspection tool for a governmental agency
[Antunes et al. 2006; Ferreira et al. 2009]. The second one shows the evaluation processes of a mobile-shared workspace supporting construction inspection activities for
a private construction company [Ochoa et al. 2008].
6.1. Evaluation of a Collaborative Software-Requirements Inspection Tool
Software-requirements inspection is a well-known software engineering task. It engages a group of reviewers in the process of evaluating how well a software product
under development accomplishes a set of previously established requirements. In a
very simplified view, the tool under evaluation requests a group of software reviewers to synchronously complete a matrix with their perceived correlations between
software requirements and specifications (from totally irrelevant to highly relevant).
This matrix allows the reviewers to identify areas where software development has
been underachieving and also to define priorities for further developing technical
specifications.
This tool has been subject to two formal evaluation procedures, the first one being a
knowledge-based evaluation and the second, a role-based evaluation. The next sections
briefly describe these two procedures; then we present a discussion about the overall
evaluation process.
6.1.1. Knowledge-Based Evaluation. From a goal-oriented perspective, the major goal
is to obtain a matrix of correlations expressing the reviewers’ perspectives, expectations, and worries about the software under development. The selection of correlations
is necessarily a qualitative task in which the reviewers must agree upon the most
appropriate link between what is being implemented and how the implementation corresponds to the reviewers’ expectations. This task is naturally complex because there
are several reviewers involved who may have different perspectives about the software
application, interpretations of what is involved in application development, hidden
agendas, etc. The tool supports the negotiation and reconciliation of these conflicting
views.
Taking these problems into consideration, the initial evaluation step was focused on
assessing the value brought by the tool to the reviewers, not only in assessing the software development but also in resolving their conflicting views in a productive and
satisfactory way. This initial evaluation step therefore focused on knowledge-based
issues. The adopted evaluation method was based on Cooperation Scenarios (COS)
[Stiemerling and Cremers 1998] using scenario-based workshops to elicit design flaws.
The evaluation procedure was set up as follows. The tool was evaluated in two pilot
experiments involving two reviewers each. All of the reviewers were knowledgeable in
software development, project management, requirements negotiation with outsourcing organizations, and software analysis and design.
The pilot experiments were accomplished in the reviewers’ workplace, which was
a governmental agency responsible for the national pension system. The participants’
task was to assess a project concerning the introduction of a new formula for computing
pensions in the future. The specific goal set for the pilot experiments was to construct
a matrix correlating a list of user requirements with a list of technical requirements,
so that priorities could be set early in the project. The lists of user and technical
requirements were specified at the beginning of the pilot experiments with help and
approval from one of the most experienced participants. The evaluation itself was thus
focused on negotiating and completing the correlations. The matrix under evaluation
had 8 × 24 = 192 potential correlations to evaluate.
Each pilot experiment started with a brief tutorial about the tool, which took approximately 15 minutes. Then, a pair of reviewers used the tool until a consensus was
obtained. During the experiment, whenever necessary, additional help about the tool
was provided to the reviewers.
Afterwards, we asked the reviewers to complete a questionnaire with open questions about the tool’s most positive and negative aspects, as well as closed questions
concerning the tool's functionality and usability. The results are displayed in Table II.
Table II. Results from the Questionnaire
(Items scored from 1, lowest, to 5, highest)
Functionality:
    Convenience (available functions and their appropriateness)
    Accuracy (reflecting the users' opinions)
    Agreement (with the inspection method)
Usability:
    Comprehension (understanding the tool)
    Learning (how to use the tool)
    Operability (effort controlling the inspection)
Regarding functional issues, the obtained results indicate that the tool was convenient to use and accurate with respect to the evaluators' view of the project. We also obtained positive indications about the consensus mechanism built into the tool, the reviewers' understanding of the other participants' overall positions, the ease of finding agreements, and the simplicity of revising their own opinions. Additionally, the reviewers agreed that the
Concerning usability issues, the obtained results indicated that the participants could understand the working logic behind the tool and easily learned to deal with its functionality, as well as with the negotiation process. However, it was pointed out that the tool would be difficult for inexperienced users to use. Another drawback was poor performance, since the tool spent too much time synchronizing data. The participants also raised several minor functional and user interface issues, for example, the absence of graphical information and the difficulty of
Another interesting outcome from this evaluation was evidence that the tool provided
learning opportunities. Participants obtained new insights about negotiating software
requirements. These two pilot experiments thus gave very rich indications about the
value of this tool to the organization and to the group, as well as potential areas for
improving the tool. The adopted evaluation approach also proved adequate to elicit
knowledge-based design flaws and come up with design recommendations.
6.1.2. Role-Based Evaluation. The second formal evaluation procedure was aimed at
evaluating in detail the user interaction with the tool. It was therefore a role-based
evaluation.
The user interaction with the tool was centered on the notion of shared workspace.
Shared workspaces are becoming ubiquitous, allowing users to share information and
to organize activities in very flexible and dynamic ways, usually relying on a simple
graphical metaphor. This evaluation procedure thus aimed at optimizing the shared
workspace use, assuming such optimization would increase the evaluators’ already
positive opinion about the tool.
The adopted evaluation approach was to analytically devise different options for
shared workspace use and predict their performance. The method applied well-known
human information processing models to measure the shared workspace performance
and to draw conclusions about several design options. The adopted model was the
Keystroke-Level Model (KLM) [Card et al. 1980]. KLM is relatively simple to use and
has been successfully applied to evaluate single-user applications, although it had to be
adapted to the collaborative systems context for this evaluation [Ferreira et al. 2009].
Based on KLM, each user interaction may be converted into a sequence of mental and
motor operators, whose individual execution times have been empirically established
and validated by psychological experiments. This way we could find out which sequence
of operators would minimize the execution time of a particular shared workspace
implementation.
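To make the mechanics concrete, the sketch below (Python) adds up nominal KLM operator times for sequences like those in Table III and weights alternative interaction paths by their probability of occurrence. The operator times are the commonly cited nominal values from the KLM literature and may differ from the calibration used in the actual evaluation, so the resulting numbers are illustrative rather than a reproduction of the article's figures.

    # Illustrative KLM calculator. Operator times below are commonly cited nominal
    # values; they may differ from the calibration used in the article's evaluation.
    KLM_TIMES = {
        "M": 1.35,  # mental preparation
        "P": 1.10,  # pointing with a mouse/stylus
        "K": 0.28,  # key or button press
    }

    def execution_time(sequence: str) -> float:
        """Sum the operator times of a KLM sequence such as 'MMPKKPKK'."""
        return sum(KLM_TIMES[op] for op in sequence)

    def expected_time(cases):
        """Probability-weighted execution time over alternative interaction paths,
        e.g., the no-scroll (75%) and scroll (25%) conditions of a design."""
        return sum(prob * execution_time(seq) for prob, seq in cases)

    if __name__ == "__main__":
        no_scroll = "MMPKKPKK"        # operator sequence taken from Table III
        scroll = "MPKPKMMPKKPKK"      # operator sequence taken from Table III
        print(round(execution_time(no_scroll), 2))   # roughly 6.0 s with these nominal times
        print(round(expected_time([(0.75, no_scroll), (0.25, scroll)]), 2))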
Table III. Results of KLM Evaluation of Shared Workspace
Design conditions: a) 3 users and b) 6 users, each with a no-scroll case (75% probability) and a scroll case (25% probability).
Design A: no scroll, 5 s. (MMPKKPKK); scroll, 8.6 s. (MPKPKMMPKKPKK).
Design B: no scroll, 9.8 s.; scroll, 11.3 s. (MPKKMPKKPKKMMPKKPKK).
Note: Operators: M—Mental; P—Pointer; K—Key.
We modeled three low-level functions associated with the shared workspace usage:
locating correlations, selecting correlations, and negotiating correlation values. Several
alternative designs for these functionalities were analytically evaluated. The adopted
approach offered a common criterion, based on execution time, to compare the various
implementations and find out which implementation would offer the best performance.
In Table III we show the obtained results, highlighting that Design A has better overall
performance than Design B.
6.1.3. Discussion. Overall, these evaluations allowed us to obtain several insights about the tool. The initial experiments were mostly focused on broader organizational and group issues, such as positive/negative effects, convenience, and respect for the participants' opinions. Although the obtained results were characterized by low precision and generalizability, they were very insightful for further development and contributed to understanding the value attributed to the tool by the organization. The final experiments addressed fine-grained details of the tool's usage and allowed us to experiment with alternative functionality and, ultimately, adopt the functionality that would offer the best performance. These latter results were characterized by high precision and generalizability, although they had low realism.
In both cases the time invested in the evaluation was low, for different reasons: in the first case because we adopted a pilot study approach, and in the second because we adopted an analytic approach. The system detail was quite different between the two evaluations: very low in the first case (positive/negative aspects) and very high in the second (keystrokes). Conversely, the system scope was high in the first case (the whole application) and very low in the second (a few functions).
6.2. Evaluation of an Application to Support Construction Inspection Activities
Construction projects typically involve a main contractor, which in turn outsources several parts of the whole project, such as electrical facilities, gas/water/communication
networks, painting, and architecture. The companies in charge of these activities usually work concurrently and they need to be coordinated because the work they are
doing is highly interrelated. In fact, the project progress rate and the product quality
increase when all these actors appropriately coordinate among themselves.
The main contractor is usually a manager responsible for the coordination process.
The inspection activities play a key role in this process. The goal of these activities
is to diagnose the status of the construction project elements and to determine the
need to approve, reject, or modify the built elements based on the diagnosis. Each
inspection is carried out by one or more inspectors using paper-based blueprints. These
inspectors work alone (doing independent tasks) or form an inspection team (when their
examinations are interrelated). The inspection process requires that these inspectors
be on the move recording the contingency issues (problems identified by one inspector)
related to particular components of the project.
Periodically, the main contractor informs the subcontracted companies about the list
of contingency issues they have to address. The process required to deal with these
issues may involve the work of more than one subcontractor, and of course, at least one
additional inspection.
In order to support the inspection activities and help coordinate the problem-solving process, a mobile shared workspace named COIN (COnstruction INspector) was developed. This collaborative system manages construction projects composed of sets of digital blueprints, which can store annotations made with a stylus on a Tablet PC. The system also supports mobile collaboration among the users and data sharing (file transfer and data synchronization) between two mobile computing devices.
Two types of evaluations were applied to this tool: knowledge- and rule-based evaluations. The following two sections describe the evaluation processes; a third section
presents a discussion of the obtained results.
6.2.1. Knowledge-Based Evaluation. During the first stage of the project, a Scenario-Based Evaluation (SBE) strategy [Haynes et al. 2004] was used to identify the scenarios and requirements involved in the construction inspection process. Two formal evaluations were done using this strategy: the first during the software conception phase and the second during the design phase. Each one involved two steps: (1) individual interviews with construction inspectors and (2) a focus group to validate the interview results.
Three experienced construction inspectors participated in the evaluation during the conception phase. Each interview was about one hour long. The participants had to characterize the work scenarios to be supported, and also to specify and prioritize the functionalities required to carry out the inspection process. The results showed consensus on the types and features of the scenarios to be supported by the tool. However, there was no consensus on the functionalities the tool should provide to the inspectors. After the interviews, the results were written up and given back to the inspectors.
A week after that, a focus group was held in order to reach an agreed set of functionalities to implement in the software tool. The focus group was about three hours long, and most of the participants changed their perception of which functionalities were most relevant to support the collaborative inspection process. A consensus was obtained after that session. The most important functionalities related to collaboration were the following: (1) transparent communication among inspectors, (2) selective visualization of digital annotations, (3) annotation filtering by several criteria, (4) unattended and on-demand annotation synchronization (between two inspectors), and (5) awareness of users' availability and location (functionalities (2) to (4) are sketched below).
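The sketch below (Python) illustrates, under invented field names, how functionalities (2) to (4) might be realized: annotations carry simple metadata, can be filtered by several criteria for selective visualization, and two inspectors' annotation sets can be merged during synchronization, with the most recently modified copy winning. It is a hypothetical sketch, not COIN's actual design.

    # Hypothetical sketch of annotation filtering and synchronization; field names
    # and the merge policy are invented for illustration and are not COIN's design.
    from dataclasses import dataclass

    @dataclass
    class Annotation:
        ann_id: str        # unique identifier of the annotation
        blueprint: str     # blueprint the annotation belongs to
        inspector: str     # inspector who created it
        category: str      # e.g., "electrical", "plumbing"
        timestamp: float   # creation or last-modification time
        text: str          # the stylus note, transcribed

    def filter_annotations(annotations, **criteria):
        """Selective visualization: keep annotations matching all given criteria,
        e.g., filter_annotations(anns, inspector="A", category="electrical")."""
        return [a for a in annotations
                if all(getattr(a, field) == value for field, value in criteria.items())]

    def synchronize(local, remote):
        """On-demand synchronization between two devices: union of both annotation
        sets, keeping the most recently modified copy when the same id appears twice."""
        merged = {a.ann_id: a for a in local}
        for a in remote:
            if a.ann_id not in merged or a.timestamp > merged[a.ann_id].timestamp:
                merged[a.ann_id] = a
        return list(merged.values())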
During the COIN design process, a preliminary prototype was used to validate the development team's proposals for dealing with the requirements identified in the previous phase. Once again, an SBE strategy was used. Before the individual interviews, the inspectors received a training session lasting about 30 minutes. After that, each one explored the prototype features for about 45 minutes. Finally, a one-hour interview was conducted with each inspector. The main goal of the interview was to identify positive, negative, and missing issues in the tool, and to determine whether the functionalities included in the prototype were enough to support a collaborative inspection process. The results consisted of a long list of specific and detailed comments, with some degree of agreement among the inspectors' opinions. As in the previous evaluation process, these issues were written up and given back to the participants.
Table IV. Results of the Inspection Process
Experience      Labels Found    Inspection Elapsed Time    Annotations Review Elapsed Time    Total Elapsed Time
With COIN       37              23 minutes                 6 minutes                          29 minutes
Without COIN    38              35 minutes                 9 minutes                          44 minutes
A week later we held the focus group session, where the COIN prototype was reviewed again, as were the inspectors' comments. The session's main goal was to classify the inspectors' comments into the following three categories: (1) critical (it must be included in the tool), (2) recommended (it is a good idea to include it), and (3) optional (it could be included if there is enough time). The focus group took about three and a half hours, and it identified 12 critical, 17 recommended, and 8 optional issues. The developers attended the session (as observers) to get the requirements directly from the source.
The effort of carrying out the second evaluation was at least double that of the first; however, the result was highly accurate, detailed, and valuable, which allowed us to adjust the proposed components to deal with the inspectors' comments. The development team members recognized that these comments were key to improving the match between the prototype functionality and the inspectors' needs. However, it is important to acknowledge that the opinions of three inspectors are not enough to determine the inspection requirements of a construction company. A larger number of participants implies not only more general and validated results, but also a larger evaluation effort.
6.2.2. Rule-Based Evaluation. Once the first version of COIN was delivered, an empirical evaluation was conducted with the tool at the Computer Science Department of the University of Chile. A variant of the Cooperation Scenarios (COS) evaluation method [Stiemerling and Cremers 1998] was used in this case. The experience involved an area of approximately 2000 m², distributed over two floors. This area mainly included offices, meeting rooms, laboratories, and public spaces. Forty labels simulating contingency issues were attached to the physical infrastructure and electrical facilities.
Two civil engineers, who participated in the previous evaluation process, conducted
the reviewing process. They first used COIN running on a Tablet PC to carry out
the inspection, and then repeated the process using physical blueprints. In both cases
an observer followed the activities of each inspector in order to verify the coherence
between the inspectors’ opinions and the empirical observation. In addition, these
observers recorded the time involved in particular tasks of the inspection process.
The engineers agreed beforehand on a common strategy to conduct both inspection processes. The strategy consisted of performing two tasks in sequence: gathering the contingency issues and determining the coherence between the inspectors' annotations.
During the first evaluation round, the inspectors identified the contingency issues
and created the corresponding annotations using COIN. Subsequently, they met to
review each annotation, and they decided the reviews were consistent. Afterwards,
the labels simulating contingency issues were changed and relocated, to reproduce the
experimental conditions of the first experience. The inspection process was repeated,
but now using blueprints.
Finally, the engineers were interviewed to assess their impressions of using the tool to support inspection processes. The observers provided information about the duration of several activities involved in the inspection process, such as contingency gathering and the integration of annotations. The idea was to establish a sound baseline against which to compare the whole inspection process with and without COIN.
Table IV shows the results of the inspections using the tool and the blueprints, respectively. These results indicate an improvement in elapsed times when COIN is used. During the interview, both inspectors indicated that they preferred to use the collaborative application, for several reasons. (1) Digital maps are easier to use than paper-based blueprints. (2) Writing annotations on the screen of a Tablet PC is more comfortable than writing them on a blueprint placed on a wall. (3) User mobility improves when COIN is used. (4) Reviewing annotations is faster when using the tool because both Tablet PCs can be put together, so the distance between the annotations being compared is small (which eases the process).
Table V. Results of Coordination Activities
Activity                                With COIN       Without COIN
Time for retrieving blueprints          < 2 minutes     Go to the main contractor's office
Time to integrate annotations           < 1 minute      1 hour (*)
Time for reporting annotations          < 2 minutes     Go to the main contractor's office
Tasks creation elapsed time             35 minutes      40 minutes (*)
Contingencies report creation time      < 2 minutes     1–2 hours (*)
Note: (*) Estimations done by the inspectors.
Although the use of COIN shows positive results, they do not represent a great improvement over the current inspection process. The most important advantages of using COIN are related to the coordination process. Table V shows several interesting improvements in terms of coordination activities. For example, when COIN is used, digital blueprints can be retrieved from the main contractor's server through a Web service, accessed via Ethernet or a cellular network. This operation took less than two minutes and avoided the trip to the main contractor's office, which was required in the paper-based case.
Moreover, the process of integrating the inspectors' annotations took less than a minute when the collaborative system was utilized. By contrast, the integration could take about one hour for a paper-based inspection. The time needed to report the annotations to the main contractor is also considerably reduced with the use of the system. The time spent creating the tasks related to the annotations is similar in both cases. However, the time to create the contingencies report is considerably reduced when COIN is used.
This evaluation process gives us useful preliminary information for understanding the possible impact of the tool in the construction inspection scenario. However, a larger number of observations is required to obtain a more accurate diagnosis of the tool's impact on a real construction company.
6.2.3. Discussion. In the first two evaluations (i.e., when SBE was used) just three inspectors were involved because of the effort required to carry out these evaluations. The evaluation effort (mainly time) in SBE grows considerably with each additional participant. Clearly, this is a method that provides a high degree of realism when it is applied to a large number of participants; however, it also requires a long invested time. The reward for that work is an agreed set of specific and detailed (positive, negative, and missing) issues, which must be considered during the development of a collaborative supporting tool. Having these issues available is highly important in determining how well the product under development matches the users' needs.
The second evaluation process (i.e., when COS was used) provides an interesting strategy to obtain a diagnosis of the tool's usefulness and its impact on the process in which it is used. The feedback is detailed and precise; however, it requires a large number of participants to generalize the results, which implies an increase in the evaluation effort. Such effort could be reduced by using agents running in the background and recording the time involved in the various tasks, so that the observer would no longer be required.
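A background agent of that kind could be as simple as the following hedged sketch (Python), which timestamps the start and end of each named task and appends the durations to a log file; the task names and log format are invented for the example and do not describe an existing component of COIN.

    # Hypothetical sketch of a background timing "agent": it records how long each
    # named task takes and appends the result to a log, replacing the human observer.
    import time
    from contextlib import contextmanager

    LOG_FILE = "task_times.log"  # invented log location

    @contextmanager
    def timed_task(name: str, log_path: str = LOG_FILE):
        """Measure the wall-clock duration of a task and append it to the log."""
        start = time.time()
        try:
            yield
        finally:
            elapsed = time.time() - start
            with open(log_path, "a") as log:
                log.write(f"{name}\t{elapsed:.1f} s\n")

    # Example usage: wrap the activities the observers used to time by hand.
    if __name__ == "__main__":
        with timed_task("contingency gathering"):
            time.sleep(0.1)  # stands in for the real inspection work
        with timed_task("annotation integration"):
            time.sleep(0.1)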
Finally, the main limitation of the COS evaluation method could be the level of
realism of the obtained results. If the testing scenario (laboratory) is similar to the
real scenario, then the results will be representative. Otherwise, the evaluation effort
could be meaningless. If COS is going to be used for an evaluation process, then it is
important to consider the cost of having a testing scenario similar to the real one.
7. CONCLUSIONS AND FUTURE WORK
The second section of this article starts with a tough question: why is collaborative
systems evaluation so difficult? As we have thoroughly discussed, there is no single
culprit. Indeed, the difficulties are practical (e.g., dealing with many subjects and groups), theoretical (e.g., addressing different cognitive levels, specifying satisfying criteria), and methodological (e.g., dependence of the evaluation on the development process).
The various evaluation methods reviewed in this article, and the timeline showing their emergence, corroborate these complexities. Many of these methods are not competing for the same goal; instead, they complement the whole framework necessary to evaluate collaborative systems.
The evaluator's task, then, is to define the necessary trade-offs and select a set of satisfactory evaluation methods. This article tries to ease that task.
To accomplish this goal, we started by identifying the set of variables which may
be necessary to build a comprehensive evaluation framework. Such a framework must
deliver a balanced albeit concise combination of variables addressing the practical,
theoretical, and methodological issues that make collaborative systems evaluation so
difficult. We defined six variables: generalization, precision, realism, system detail,
system scope, and invested time.
The generalization, precision, and realism variables fundamentally concern theoretical issues regarding how satisfying the evaluation results may be to the evaluator. The system detail and system scope variables concern methodological issues associated with the product development strategy. The invested time variable concerns the very practical issue of assessing the amount of time available to the evaluator to conduct the evaluation.
Yet these six variables still constitute quite a complex evaluation framework. We
must ease the evaluator’s decision-making task. Thus, we have also considered three
performance levels: role-based, rule-based, and knowledge-based performance. These
levels of performance lay out the relative importance attributed to each one of the six
variables previously described. For instance, the role-based level assigns high importance to the generalization, precision, and system detail variables, and low importance
to realism, invested time, and system scope.
Overall, the performance levels define three distinct evaluation scenarios aiming to reduce the number of choices considered by the evaluator without significantly compromising the comprehensiveness of the evaluation process. Given the evaluation scenarios, we then discussed which evaluation lifecycle, that is, which combination of scenarios, could be adopted by the evaluator. The discussion is essentially based on two criteria: the bias for invested time and the product development approach. Considering the bias for invested time, the issue is to recommend the evaluation lifecycle and corresponding scenarios that are cost-effective with respect to the time spent doing the evaluation. The product development criterion, on the other hand, is concerned with aligning the evaluation with the development cycle, which may follow a depth-first or a breadth-first approach.
Thus this approach leads the evaluator towards a fairly straightforward decision-making process that considers the product being developed, the development lifecycle,
and the time available to evaluate the product.
Finally we also relate the existing evaluation methods to the evaluation scenarios
just mentioned, thus easing the definition of the concrete evaluation plan. The article
also describes two case studies illustrating the use of the evaluation framework and
showing how the three evaluation scenarios complement each other towards assessing
prototypes at various levels of granularity.
The main contributions of this article are twofold. The most important one offers decision-making support to evaluators wishing to disentangle the inherent complexity of collaborative systems evaluation. The proposed approach covers the whole endeavor, ranging from the selection of evaluation variables and the definition of satisfying criteria to the adoption of an evaluation lifecycle. The second contribution lays out a foundation for classifying evaluation methods. Evaluation methods seem to emerge in a very ad hoc way and cover quite distinct goals regarding why, how, what, and when to evaluate. This situation makes it difficult to classify them in a comprehensive way. We have proposed a classification highlighting their major distinctions. We hope this classification will be helpful to future research and practice in the CSCW area.
APPENDIX
Appendix A. Timeline of Evaluation Methods
1978  [Yourdon 1978]  Structured walkthroughs
1980  [Card et al. 1980]  Keystroke-Level Model (KLM)
1987  [Suchman 1987]  Ethnomethodological studies
1989  [Nielsen 1989]  Discount usability engineering
1990  [Nielsen and Molich 1990]  Heuristic evaluation
1991  [Tang 1991]  Observational studies
1991  [Bias 1991]  Interface walkthroughs
1992  [Polson et al. 1992]  Cognitive walkthroughs
1992  [Rowley and Rhoades 1992]  Cognitive jogthrough
1993  [Urquijo et al. 1993]  Breakdown analysis
1994  [Wharton et al. 1994]  Cognitive walkthroughs
1994  [Twidale et al. 1994]  Situated evaluation
1994  [Nielsen 1994]  Usability inspection
1994  [Nielsen 1994]  Heuristic evaluation
1994  [Ereback and Höök 1994]  Cognitive walkthrough
1994  [Bias 1994]  Pluralistic usability walkthrough
1994  [Hughes et al. 1994a]  Quick-and-dirty ethnography
1994  [Hughes et al. 1994b]  Evaluative ethnography
1995  [Plowman et al. 1995]  Workplace studies
1996  [Gutwin et al. 1996]  Usability studies
1996  [Van Der Veer et al. 1996]  Groupware task analysis
1997  [Baeza-Yates and Pino 1997]  Formal evaluation of collaborative work
1998  [Stiemerling and Cremers 1998]  Cooperation scenarios
1998  [Ruhleder and Jordan 1998]  Video-based interaction analysis
1998  [Briggs et al. 1998]  Technology Transition Model
1999  [Neale and Carroll 1999]  Multi-faceted evaluation for complex, distributed activities
1999  [Gutwin and Greenberg 1999]  Evaluation of workspace awareness
2000  [Gutwin and Greenberg 2000]  Mechanics of collaboration
2000  [Carroll 2000]  Scenario-based design
2000  [Van Der Veer 2000]  Task-based groupware design
2001  [Steves et al. 2001]  Usage evaluation
2001  [Baker et al. 2001]  Heuristic evaluation based on the mechanics of collaboration
2001  [Sonnenwald et al. 2001]  Innovation diffusion theory
2002  [Baker et al. 2002]  Groupware heuristic evaluation
2002  [Cockton and Woolrych 2002]  Discount methods
2002  [Pinelle and Gutwin 2002]  Groupware walkthrough
2003  [Pinelle et al. 2003]  Collaboration usability analysis
2003  [Antunes and Costa 2003]  Perceived value
2004  [Haynes et al. 2004]  Scenario-based evaluation
2004  [Convertino et al. 2004]  Activity awareness
2004  [Humphries et al. 2004]  Laboratory simulation methods
2004  [Inkpen et al. 2004]  Evaluating collaboration in co-located environments
2004  [Kieras and Santoro 2004]  Computational GOMS
2004  [Briggs et al. 2004]  Satisfaction Attainment Theory
2005  [Vizcaíno et al. 2005]  Knowledge management approach
2006  [Baeza-Yates and Pino 2006]  Performance analysis
2006  [Antunes et al. 2006]  Human performance models
2008  [Pinelle and Gutwin 2008]  Tabletop collaboration usability analysis
REFERENCES
ANTUNES, P. AND COSTA, C. 2003. Perceived value: A low-cost approach to evaluate meetingware. In Proceedings
of CRIWG’03. Lecture Notes in Computer Science, vol. 2806, 109–125.
ANTUNES, P., FERREIRA, A., AND PINO, J. 2006. Analyzing shared workspace design with human-performance
models. In Proceedings of CRIWG’06. Lecture Notes in Computer Science, vol. 4154, 62–77.
ANTUNES, P., RAMIRES, J., AND RESPÍCIO, A. 2006. Addressing the conflicting dimension of groupware: A case
study in software requirements validation. Comput. Informatics 25, 523–546.
ARAUJO, R., SANTORO, F., AND BORGES, M. 2002. The CSCW lab for groupware evaluation. In Proceedings of
CRIWG’02. Lecture Notes in Computer Science, vol. 2440, 222–231.
BAECKER, R. M., GRUDIN, J., BUXTON, W., AND GREENBERG, S., EDS. 1995. Human-computer Interaction: Toward
the Year 2000. Morgan Kaufmann, San Francisco, CA, 1995.
BAEZA-YATES, R. AND PINO, J. 1997. A first step to formally evaluate collaborative work. In Proceedings of the
ACM International Conference on Supporting GroupWork (GROUP ‘97). 55–60.
BAEZA-YATES, R. AND PINO, J. 2006. Towards formal evaluation of collaborative work and its application to
information retrieval. Info. Res. 11, 4.
BAKER, K., GREENBERG, S., AND GUTWIN, C. 2001. Heuristic evaluation of groupware based on the mechanics
of collaboration. In Proceedings of the 8th IFIP International Conference on Engineering For HumanComputer interaction. Lecture Notes in Computer Science, vol. 2254, 123–140.
BAKER, K., GREENBERG, S., AND GUTWIN, C. 2002. Empirical development of a heuristic evaluation methodology for shared workspace groupware. In Proceedings of the ACM Conference on Computer Supported
Cooperative Work. 96–105.
BIAS, R. 1991. Interface-walkthroughs: Efficient collaborative testing. IEEE Softw. 8, 5, 94–95.
BIAS, R. 1994. The pluralistic usability walkthrough: coordinated empathies. Usability Inspection Methods.
J. Nielsen and R. Mack, Eds., John Wiley & Sons, New York, 63–76.
BRIGGS, R., ADKINS, M., MITTLEMAN, D., KRUSE, J., MILLER, S., AND NUNAMAKER, J. 1998. A technology transition
model derived from field investigation of GSS use aboard the U.S.S. CORONADO. J. Manage. Info. Syst.
15, 3, 151–195.
BRIGGS, R., QURESHI, S., AND REINIG, B. 2004. Satisfaction attainment theory as a model for value creation. In
Proceedings of the 37th Annual Hawaii International Conference on Systems Sciences, IEEE Computer
Society Press.
CARD, S., MORAN, T., AND NEWELL, A. 1980. The keystroke-level model for user performance time with interactive systems. Comm. ACM 23, 7, 396–410.
CARROLL, J. 2000. Making use: Scenario-Based Design of Human-Computer Interactions. The MIT Press,
Cambridge, MA.
COCKTON, G. AND WOOLRYCH, A. 2002. Sale must end: should discount methods be cleared off HCI’s shelves?
Interactions 9, 5, 13–18.
CONVERTINO, G., NEALE, D., HOBBY, L., CARROLL, J., AND ROSSON, M. 2004. A laboratory method for studying
activity awareness. In Proceedings of the 3rd Nordic Conference on Human-Computer Interaction. 313–
322.
DAMIANOS, L., HIRSCHMAN, L., KOZIEROK, R., KURTZ, J., GREENBERG, A., WALLS, K., LASKOWSKI, S., AND SCHOLTZ, J.
1999. Evaluation for collaborative systems. ACM Comput. Surv. 31, 2, 15.
DESANCTIS, G., SNYDER, J., AND POOLE, M. 1994. The meaning of the interface: a functional and holistic evaluation of a meeting software system. Decis. Supp. Syst. 11, 319–335.
EREBACK, A. AND HÖÖK, K. 1994. Using cognitive walkthrough for evaluating a CSCW application. In Proceedings of the Conference Companion on Human Factors in Computing Systems, 91–92.
FERREIRA, A., ANTUNES, P., AND PINO, J. 2009. Evaluating shared workspace performance using human information processing models. Info. Res. 14, 1, 388.
FJERMESTAD, J. AND HILTZ, S. 1999. An assessment of group support systems experimental research: methodology and results. J. Manag. Info. Syst. 15, 3, 7–149.
GREENBERG, S. AND BUXTON, B. 2008. Usability evaluation considered harmful (some of the time). In
Proceedings of the 26th Annual SIGCHI Conference on Human Factors in Computing Systems,
111–120.
GUTWIN, C. AND GREENBERG, S. 1999. The effects of workspace awareness support on the usability of real-time
distributed groupware. ACM Trans. Comput.-Human Interact. 6, 3, 243–281.
GUTWIN, C. AND GREENBERG, S. 2000. The mechanics of collaboration: Developing low cost usability evaluation methods for shared workspaces. In Proceedings of the IEEE International Workshops on Enabling
Technologies Infrastructures for Collaborative Enterprises, 98–103.
GUTWIN, C., ROSEMAN, M., AND GREENBERG, S. 1996. A usability study of awareness widgets in a shared
workspace groupware system. In Proceedings of the ACM Conference on Computer Supported Cooperative
Work, 258–267.
GUY, E. 2005. “. . .real, concrete facts about what works. . .”: integrating evaluation and design through
patterns. In Proceedings of the International ACM SIGGROUP Conference on Supporting Group Work,
99–108.
HAYNES, S., PURAO, S., AND SKATTEBO, A. 2004. Situating evaluation in scenarios of use. In Proceedings of the
ACM Conference on Computer Supported Cooperative Work, 92–101.
HERSKOVIC, V., PINO, J. A., OCHOA, S. F., AND ANTUNES, P. 2007. Evaluation methods for groupware systems. In
Proceedings of CRIWG, Lecture Notes in Computer Science, vol. 4715, 328–336.
HIX, D. AND HARTSON, H. R. 1993. Developing User Interfaces: Ensuring Usability Through Product and
Process. John Wiley & Sons, Inc., New York, NY.
HUANG, J. P. H. 2005. A conceptual framework for understanding collaborative systems evaluation. In Proceedings of the 14th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprise. 215–220.
HUGHES, J., KING, V., RODDEN, T., AND ANDERSEN, H. 1994a. Moving out from the control room: ethnography
in system design. In Proceedings of the ACM Conference on Computer Supported Cooperative Work.
429–439.
HUGHES, J., SHARROCK, W., RODDEN, T., O’BRIEN, J., ROUNCEFIELD, M., AND CALVEY, D. 1994b. Field Studies and
CSCW. Lancaster University, Lancaster, U.K.
HUMPHRIES, W., NEALE, D., MCCRICKARD, D., AND CARROLL, J. 2004. Laboratory simulation methods for studying
complex collaborative tasks. In Proceedings of the 48th Annual Meeting Human Factors and Ergonomics
Society, 2451–2455.
INKPEN, K., MANDRYK, R., DIMICCO, J., AND SCOTT, S. 2004. Methodology for evaluating collaboration behaviour
in co-located environments. In Proceedings of the ACM Conference on Computer Supported Cooperative
Work.
KIERAS, D. AND SANTORO, T. 2004. Computational GOMS modeling of a complex team task: lessons learned. In
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 97–104.
MCGRATH, J. 1984. Groups: Interaction and Performance. Prentice-Hall, Englewood Cliffs, NJ.
NEALE, D. AND CARROLL, J. 1999. Multi-faceted evaluation for complex, distributed activities. In Proceedings
of the Conference on Computer Support For Collaborative Learning. 425–433.
NEALE, D., CARROLL, J., AND ROSSON, M. 2004. Evaluating computer-supported cooperative work: models and
frameworks. In Proceedings of the ACM Conference on Computer Supported Cooperative Work, 112–121.
NEWELL, A. 1990. Unified Theories of Cognition. Harvard University Press, Cambridge, MA.
NIELSEN, J. 1989. Usability engineering at a discount. In Designing and Using Human-Computer Interfaces
and Knowledge Based Systems, G. Salvendy and M. Smith, Eds., Elsevier Science Publishers, Amsterdam, 394–401.
NIELSEN, J. 1994. Usability inspection methods. In Proceedings of the Conference on Human Factors in
Computing Systems, 413–414.
NIELSEN, J. AND MOLICH, R. 1990. Heuristic evaluation of user interfaces. In Proceedings of the ACM SIGCHI
Conference on Human Factors in Computing Systems, 249–256.
OCHOA, S., PINO, J., BRAVO, G., DUJOVNE, N., AND NEYEM, A. 2008. Mobile shared workspaces to support construction inspection activities. In Collaborative Decision Making: Perspectives and Challenges, P. Zarate,
J. Belaud, G. Camileri, and F. Ravat, Eds., IOS Press, Amsterdam, 211–220.
PIDD, M. 1996. Tools for Thinking. J. Wiley & Sons, Chichester.
PINELLE, D. AND GUTWIN, C. 2000. A review of groupware evaluations. In Proceedings of the 9th IEEE WETICE
Infrastructure for Collaborative Enterprises, 86–91.
PINELLE, D. AND GUTWIN, C. 2002. Groupware walkthrough: adding context to groupware usability evaluation.
In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 455–462.
PINELLE, D. AND GUTWIN, C. 2008. Evaluating teamwork support in tabletop groupware applications using
collaboration usability analysis. Pers. Ubiq. Comput. 12, 3, 237–254.
PINELLE, D., GUTWIN, C., AND GREENBERG, S. 2003. Task analysis for groupware usability evaluation: modeling
shared-workspace tasks with the mechanics of collaboration. ACM Trans. Comput.-Human Interact. 10,
4, 281–311.
PINSONNEAULT, A. AND KRAEMER, K. 1989. The impact of technological support on groups: an assessment of the
empirical research. Decis. Supp. Syst. 5, 3, 197–216.
PLOWMAN, L., ROGERS, Y., AND RAMAGE, M. 1995. What are workplace studies for? In Proceedings of the 4th
European Conference on Computer-Supported Cooperative Work, 309–324.
POLSON, P. G., LEWIS, C., RIEMAN, J., AND WHARTON, C. 1992. Cognitive walkthroughs: a method for theorybased evaluation of user interfaces. Int. J. Man-Mach. Stud. 36, 5, 741–773.
RASMUSSEN, J. AND JENSEN, A. 1974. Mental procedures in real-life tasks: A case study of electronic troubleshooting. Ergonomics 17, 293–307.
REASON, J. 2008. The Human Contribution: Unsafe Acts, Accidents and Heroic Recoveries. Ashgate, Surrey,
UK.
ROSS, S., RAMAGE, M., AND ROGERS, Y. 1995. PETRA: participatory evaluation through redesign and analysis.
Interact. Comput. 7, 4, 335–360.
ROWLEY, D. AND RHOADES, D. 1992. The cognitive jogthrough: a fast-paced user interface evaluation procedure.
In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 389–395.
RUHLEDER, K. AND JORDAN, B. 1998. Video-based interaction analysis (VBIA) in distributed settings: a tool for
analyzing multiple-site, technology-supported interactions. In Proceedings of the Participatory Design
Conference, 195–196.
SCRIVEN, M. 1967. The methodology of evaluation. In Perspectives of Curriculum Evaluation, R. Tyler,
R. Gagne, and M. Scriven, Eds., Rand McNally, Chicago, 39–83.
SONNENWALD, D., MAGLAUGHLIN, K., AND WHITTON, M. 2001. Using innovation diffusion theory to guide collaboration technology evaluation: work in progress. In Proceedings of the IEEE International Workshop on
Enabling Technologies, 114–119.
STEVES, M., MORSE, E., GUTWIN, C., AND GREENBERG, S. 2001. A comparison of usage evaluation and inspection methods for assessing groupware usability. In Proceedings of the International ACM SIGGROUP
Conference on Supporting Group Work, 125–134.
STIEMERLING, O. AND CREMERS, A. 1998. The use of cooperation scenarios in the design and evaluation of a
CSCW system. IEEE Trans. Softw. Eng. 24, 12, 1171–1181.
SUCHMAN, L. 1987. Plans and Situated Actions: The Problem of Human-Machine Communication. Cambridge
University Press, Cambridge, U.K.
TANG, J. 1991. Findings from observational studies of collaborative work. Intern. J. Man-Machine Stud. 34,
2, 143–160.
TWIDALE, M., RANDALL, D., AND BENTLEY, R. 1994. Situated evaluation for cooperative systems. In Proceedings
of the ACM Conference on Computer Supported Cooperative Work, 441–452.
URQUIJO, S., SCRIVENER, S., AND PALMEN, H. 1993. The use of breakdown analysis in synchronous CSCW
system design. In Proceedings of the 3rd European Conference on Computer Supported Cooperative
Work—ECSCW 93, 289–302.
VAN DER VEER, G. 2000. Task based groupware design: putting theory into practice. In Proceedings of the
Symposium on Designing Interactive Systems. 326–337.
VAN DER VEER, G., LENTING, B., AND BERGEVOET, B. 1996. GTA: Groupware task analysis—modeling complexity.
Acta Psychologica 91, 3, 297–322.
VELD, M. A. A. H. I. T., ANDRIESSEN, J. H. E., AND VERBURG, R. M. 2003. E-MAGINE: The development of an
evaluation method to assess groupware applications. In Proceedings of the 12th International Workshop
on Enabling Technologies: Infrastructure for Collaborative Enterprises, 153–158.
VIZCAÍNO, A., MARTINEZ, M., ARANDA, G., AND PIATTINI, M. 2005. Evaluating collaborative applications from a
knowledge management approach. In Proceedings of the 14th IEEE International Workshops on Enabling
Technologies: Infrastructure for Collaborative Enterprise (WETICE’05), 221–225.
WHARTON, C., RIEMAN, J., LEWIS, C., AND POLSON, P. 1994. The cognitive walkthrough method: a practitioner's
guide. In Usability Inspection Methods, J. Nielsen and R. Mack, Eds., John Wiley & Sons, New York,
105–140.
YOURDON, E. 1978. Structured Walkthroughs. Yourdon Inc, New York.
Received February 2010; revised June 2010; accepted August 2010