How Shall We Assess This?

Janet Carter
Computing Laboratory, University of Kent, Canterbury, CT2 7NF, UK
[email protected]

John English
School of Comp and Math Sciences, University of Brighton, Brighton, BN2 4GJ, UK
[email protected]

Kirsti Ala-Mutka
Inst Software Systems, Tampere University of Technology, Finland
[email protected]

Martin Dick
School of CS and SE, Monash University, Victoria 3145, Australia
[email protected]

William Fone
School of Computing, Staffordshire University, Stafford, ST18 0DG, UK
[email protected]

Ursula Fuller
Computing Laboratory, University of Kent, Canterbury, CT2 7NF, UK
[email protected]

Judy Sheard
School of CS and SE, Monash University, Victoria 3145, Australia
[email protected]
ABSTRACT
Increased class sizes are forcing academics to reconsider approaches to setting and marking assessments for their students. Distributed and distance learning are creating some of the biggest changes. Some educators are embracing new technologies but others are more wary of what they do not know. In order to address this issue it is first necessary to investigate the types of assessment currently in use and the perceptions that are held by academics with and without experience of the new technologies that are becoming available.

In this paper we present the findings of an international survey of Computer Science academics teaching a variety of topics within the discipline. The findings are split into two sections: a snapshot of current assessment practices and an analysis of respondents' perceptions of Computer Aided Assessment (CAA). Academics' opinions about the advantages and disadvantages of CAA are split in line with level of experience of using such techniques. Those with no experience of CAA suggest that it cannot be used to test higher-order learning outcomes and that the quality of the immediate feedback is poor; these negative opinions diminish as experience is gained.

Categories and Subject Descriptors
K.3.2 [Computers and Education]: Computer and Information Science Education – computer science education

General Terms
Human Factors.

Keywords
Assessment, Computer Aided Assessment, Plagiarism.

1 INTRODUCTION
With increasing class sizes in educational establishments worldwide, the practice of assessment is becoming a problematic issue; increased numbers make it more difficult to assess student attainment. If assessments are graded manually, educators must either set fewer assessment tasks or resign themselves to a greatly increased marking load. In order to cope with increasing student numbers, automated assessment is becoming increasingly important in many courses. The number of papers related to the topic that have been presented at ITiCSE conferences in recent years [e.g. 21, 30, 31, 41, 49, 65] reflects this increasing interest.

Automated assessment can save time and human resources but its adoption must be pedagogically sound. It is a widely held belief that on-line teaching and learning will be the savior of the educational system. Current research suggests that students initially prefer to be taught by a human, finding a machine too impersonal and a disincentive to learning, but that once the initial stages are completed a machine is an acceptable teacher [54]. There are, however, some important issues to consider: How do you tell that the person taking an on-line examination is the person they should be? How do you tell that they aren't receiving help? There are projects investigating the effective use of such techniques [3, 60, 74], but they are still in their infancy.

1.1 Assessment Principles
All assessments should follow sound educational principles, and the most widely adopted epistemology within the CS arena appears to be that of constructivism. Constructivist principles of educational development suggest that:
• Students are active participants in the process of their own learning
• All learning takes place within a context – usually the classroom – where shared meanings and understandings can be created
• Students require time to reflect upon the work that they are doing
• Students require the space to be allowed to make mistakes and to learn from these mistakes
Ben-Ari [5] notes that learning should be active, not passive; students are being called upon to build mental models of abstract conceptions of how computers work, the nature of variables in programming, and so on. Computer Science in particular is a deeply practical subject, and providing as much opportunity for practical work as possible will help to develop students' understandings of the principles behind the subject, as long as this work is undertaken in concert with human assistance to overcome misconceptions and refine mental models. The downside, for educators teaching large numbers of students, is that each piece of practical work needs to be marked.

1.2 What is CAA?
In this paper, Computer Aided Assessment (CAA) is defined as any activity in which computers are involved in the assessment process as more than just an information storage or delivery medium. Computers can be used in various ways to support students' learning and course processes. They can provide numerical marking and feedback in both textual and visual formats. In distance and web-based education CAA is often a natural extension to the course. It can also offer many possibilities for campus-based education, especially when large classes are involved. It also provides an easy mechanism for statistical analysis of assessment results at a later date.

Computer Science as a subject area is well positioned to benefit from automated assessment. Those who teach the subject can often use their expertise to develop systems that help to reduce their workload without compromising student learning.

1.3 Cheating and Plagiarism Issues
The issue of cheating and plagiarism is an increasing problem for academics. A recent UK survey suggests that the incidence is actually much higher than many academics realize [20]. Whilst interviewing all students to ensure that they can reproduce the work they submit [38] may be practical when students produce only a few pieces of work, it does not scale to situations where students produce regular weekly solutions.

On-line plagiarism detectors such as JPLAG [50] and MOSS [61] already rely upon electronically submitted work, and it is logical to consider automated marking of such submissions. Another potential benefit of automated assessment techniques is that they can assist in avoiding plagiarism by presenting students with randomly chosen problem sets or individualized exercises. Although students can collaborate on solving the problems they have been set, they are no longer able to submit verbatim copies of solutions obtained by others.
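As a simple illustration of the individualized-exercise idea just described, the sketch below derives per-student question parameters from a student identifier. The identifiers, parameter names and question wording are invented for illustration; the point is only that seeding a generator from the student ID gives every student different values while letting the marker regenerate them later.

```python
import hashlib
import random

def personalised_parameters(student_id: str, assignment: str, low: int = 2, high: int = 9) -> dict:
    """Derive question parameters deterministically from a student id.

    Each student sees different values, but the same student always sees
    the same ones, so the marker can regenerate them when grading."""
    seed = int(hashlib.sha256(f"{assignment}:{student_id}".encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return {"a": rng.randint(low, high), "b": rng.randint(low, high)}

def render_question(student_id: str) -> str:
    p = personalised_parameters(student_id, "week3-loops")
    return (f"Write a function that returns the sum of the first n multiples of "
            f"{p['a']} that are not divisible by {p['b']}.")

if __name__ == "__main__":
    # Two (hypothetical) students receive structurally identical but
    # numerically different questions, so verbatim copying is detectable.
    print(render_question("s1234567"))
    print(render_question("s7654321"))
```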
2 A SNAPSHOT OF CURRENT ASSESSMENT PRACTICES
In order to obtain the perceptions and experiences of academics, a web-based survey was created (see Appendix). The working group participants then advertised the survey as widely as possible to CS academics.

The responses are not necessarily representative of all CS academics; they are necessarily skewed by the means of advertising and the nationalities of the authors. Within the UK the survey was advertised via the LTSN-ICS (Learning and Teaching Support Network for Information and Computer Sciences) mailing list and the UK CAA mailing list. US responses were solicited via the SIGCSE mailing list.

In Finland the questionnaire was advertised via the Virtual University Network, on the IT-PEDA University Network and on the Finnish Society for Computer Science newsgroup. Australian responses were gathered from members of five of the six schools of the Faculty of IT at Monash University along with members of their Computing Education Research group, paper presenters from the ACE (Australasian Computing Education) 2003 conference, and users of the Faculty of IT intranet at Queensland University of Technology in Brisbane. It was also advertised multi-nationally on the KIT e-learning mailing list.

Statistical analysis of the numerical aspects of the data, including Mann-Whitney U tests and Kruskal-Wallis tests, has been performed, where appropriate, at a 5% significance level (p < 0.05).
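For readers unfamiliar with these non-parametric tests, the sketch below shows the kind of comparison reported throughout Section 2, run on small invented samples of Likert-coded responses (1 = "not a problem" up to 4 = "major problem"); the data and group labels are illustrative only, not the survey data itself.

```python
from scipy.stats import mannwhitneyu, kruskal

# Hypothetical plagiarism ratings from two groups of respondents.
uses_electronic_submission = [1, 2, 2, 3, 1, 2, 4, 2]
manual_submission_only     = [1, 1, 2, 2, 3, 1, 2, 1]

u_stat, p = mannwhitneyu(uses_electronic_submission, manual_submission_only,
                         alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p:.3f}")  # "significant" here means p < 0.05

# Three CAA-experience groups (not at all / a little / a lot), again invented.
h_stat, p = kruskal([1, 2, 1, 2], [2, 2, 3, 1], [3, 2, 4, 3])
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p:.3f}")
```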
2.1 The Respondents
The responses to the questionnaire are not analyzed by gender or geographical area. Table 1 does, however, provide a breakdown of responses by country of origin, gender and topic taught; this merely provides a context for the conclusions that are drawn. Unsurprisingly, the overwhelming majority of responses originate from the countries represented by the authors. 35% of replies are from females, and 25% are from those who teach programming.

2.2 Assessment Regimes
A wide variety of assessment techniques are used by CS academics, many of whom use more than one technique per course. Table 2 shows the overall proportion of respondents using different types of assessment; the sum is greater than 100% as it is often the case that assessments of different types are set within the same course. An unusual result emerged: breaking the figures down by topic taught reveals that five respondents assess some aspect of their programming course by means of an essay.

Table 3 shows a breakdown of submission mechanisms and marking techniques for each type of assessment. It is notable that practical work is the only kind of assessment task for which a majority of academics use electronic submission.
Table 1: Nationality, gender and topics taught
(Breakdown of the 122 respondents (79 male, 43 female) by country of origin (Australia, Finland, the UK, the USA and elsewhere), gender and topic taught; the largest single group, 31 respondents, teach programming.)

2.3 Taxonomy of CS Assessment
In order to understand what is going on in assessment, it is necessary to know:
• The subject matter being assessed
• The level of study of the learner (first year, honors year, masters etc)
• The instrument being used (essay, closed test, practical work and so on)
• The cognitive attainment levels that the students are expected to achieve
Our survey, therefore, asked respondents about cognitive attainment, ideally in a way that was simultaneously short, comprehensible and comprehensive. This is not easy.

A number of taxonomies of competence in the cognitive domain have been produced over the past fifty years. By far the best known is that of Bloom et al [10]; a committee that examined a large bank of assessment instruments produced it in the USA in the 1950s. More recently, the Structure of the Observed Learning Outcomes (SOLO) taxonomy and the reflective thinking measurement model have been proposed, and a revised, two-dimensional version of Bloom has been developed. Chan et al [15] report that these taxonomies are closely related and that each could complement the weaknesses of others. The levels in the first three of these are presented in Table 4.

Bloom's taxonomy is by far the most widely used of those described here; in fact some universities and colleges require course designers to check that they are assessing at every one of Bloom's levels. The lowest level, knowledge, is also known as recall or remembering and is concerned with the reproduction of previously learned materials. Comprehension is also referred to as understanding and assesses the ability to explain and summarize materials. Application is concerned with using learned material in new situations. Together, knowledge, comprehension and application are considered to be the lower levels of the cognitive domain and the ones most frequently tested using multiple-choice questions.

Analysis, synthesis and evaluation form the upper levels of the cognitive domain. Bloom placed synthesis and evaluation on a par and there has been much subsequent debate about their ordering. Anderson and Krathwohl's revision of Bloom's taxonomy places synthesis at the highest level [2] and it is generally held that it, uniquely, cannot be assessed through multiple-choice and short answer questions [36].
Table 2: Use of different assessment types
Assessment type          Proportion
Essay                    25%
Other written exercise   55%
Practical work           74%
Open-book exam           16%
Closed-book exam         57%
In-class test            31%
Presentation             34%
Other                    17%
Table 3: Marking and submission techniques by assessment type
(Numbers of respondents, for each assessment type in Table 2, using each submission mechanism (manual or electronic) and each marking technique (manual; part manual, part electronic; electronic; peer assessment; interview).)
Table 4: Taken from Chan et al [15]
Bloom            SOLO                Reflective Thinking
Knowledge        Prestructural       Habitual Action
Comprehension    Unistructural       Thoughtful action
Application      Multistructural     Reflection of content
Analysis         Relational          Reflection of process
Synthesis        Extended Abstract   Critical reflection
Evaluation
Table 5: Cognitive level by assessment style (%)
(For each assessment type in Table 2, the percentage of its users who assess remembering, understanding, application, problem solving and evaluation with it, together with the number of assessment types used and the mean number of cognitive levels per type.)
Respondents indicated the cognitive levels they assess with each type of assessment used. Table 2 shows the incidence of each assessment type across all courses, whilst Table 5 breaks these down by cognitive level. A large number of assessments test 'problem solving', i.e. a combination of analysis and synthesis, so we use this term rather than analysis and synthesis separately.

There are few surprises in Table 5; for example, it shows heavy use of closed-book examinations to test remembering and a concentration on the assessment of evaluation through essays and presentations. In a CS context, the heavy emphasis on problem solving in closed-book examinations and in-class tests may be a plagiarism avoidance technique.
2.4 Plagiarism Issues
Respondents were asked to indicate the level of problem that they experience with plagiarism for each type of assessment task listed in the survey. The problem level was measured on a four-point scale ranging from not a problem, through minor problem and moderate problem, to major problem. Respondents could also indicate that they didn't know. Table 6 details the results of the survey. The mode and quartiles are also shown to indicate the problem level of plagiarism for each kind of assessment task. As can be seen, essays, other written exercises and practical work are the assessment types that provoke the highest levels of concern from respondents with regard to levels of plagiarism. This is worthy of further investigation; the survey data has been analyzed to determine which, if any, assessment factors influence respondents' perceptions of plagiarism as a problem. (Assessment task 'other' was not specified in the survey, so no further analysis was performed.)
• Exercise Type. The four types of exercise (fixed set of questions, choice of questions, individually tailored questions and group work) were examined using the Mann-Whitney U test for each kind of assessment task to determine whether or not there were any statistically significant differences in the level of plagiarism between those respondents who used the exercise type and those who did not. The only statistically significant difference was found for those respondents setting practical work using a fixed set of questions. Examination of the median and mode, however, revealed no difference between the groups (median and mode were minor problem for both groups). This indicates that the difference is probably not practically significant. Overall, the type of exercise has little impact on the perceived level of plagiarism.
• Submission Mechanism. The impact of the mechanism of submission (manual or electronic) for each kind of assessment was examined using a Mann-Whitney U test. For most types of assessment, there were no statistically significant differences in the level of plagiarism between manual and electronic submission. For closed-book examinations and in-class tests, the use of electronic submission did result in a small increase in the median for perceived problem level of plagiarism, but did not affect the mode. Table 7 shows the differences in median. This indicates that for these two kinds of assessment task, electronic submission can cause a small increase in the perceived level of plagiarism amongst students. This may be due to the increased ease of copying an assessment task that is in an electronic format. Regardless of the reason for the increase, the actual level of increase does not raise large concerns. Overall, the submission mechanism has no major effect upon the perceived problem level of plagiarism amongst the respondents.
• Aspects of Learning Covered. Aspects of learning were examined in two different ways, firstly by comparing each type of assessment task with the respondent's perception of a problem with plagiarism against whether the assessment was attempting to cover one of the aspects of learning (remembering, understanding, application, problem solving, evaluation). A Mann-Whitney U test found that only one of the forty options was statistically significant: when a closed-book examination was assessing understanding. In this case the mode and median for both groups were found to be identical; this indicates that, on a practical level, this aspect of learning/assessment type has no effect upon perceived levels of plagiarism. Secondly, the data was recoded to indicate the highest aspect of learning for each kind of assessment. This was then examined with a Kruskal-Wallis test to compare the perceived level of plagiarism for the different levels of the aspect of learning. No statistically significant differences were found for any of the kinds of assessment. Overall, the aspects of learning covered have no real effect upon the perceived level of plagiarism.
• Marking Techniques Used. Respondents reported the use of a variety of marking techniques: manual, part manual/part electronic, fully electronic, peer assessed, interview. These were then compared with the perceived level of plagiarism problem for each type of assessment in two different ways. Firstly, each type of assessment was compared using a Mann-Whitney U test for users and non-users of the marking technique. This found that only one marking technique/assessment type combination had statistically significant differences; this was the use of part manual/part electronic marking with a closed-book examination. In this case, the median increased from not a problem to minor problem. The modal responses for those who used part manual/part electronic marking and those who did not were the same, however (not a problem). This indicates that while such a combination may cause some slight problems with plagiarism, they are not particularly worrying. The second method by which marking technique was examined involved recoding the data for manual, part manual/part electronic and fully electronic marking for each assessment type into one variable recording the extent of electronic marking. A Kruskal-Wallis test showed no statistically significant differences between the three levels of electronic marking for any of the types of assessment except for closed-book examination, when looking at the perceived problem level with plagiarism. The median for part manual/part electronic was midway between not a problem and minor problem, as compared to not a problem for the other two groups. The modal response for all three groups was 'not a problem', which, combined with the small size of the increase in median, indicates that the extent of electronic marking has little effect on plagiarism in closed-book examinations. Overall, it can be concluded that the marking techniques used for an assessment task will have little if any effect on the perceived level of plagiarism.
Table 6: Plagiarism problems for kind of assessment task
Plagiarism level for ...   N    Mode    25th percentile   50th percentile   75th percentile
Essay                      37   Minor   Not               Minor             Moderate
Other written exercise     66   Minor   Minor             Minor             Moderate
Practical work             88   Minor   Not               Minor             Moderate
Closed-book exam           70   Not     Not               Not               Not
Open-book exam             25   Not     Not               Not               Minor
In-class test              41   Not     Not               Not               Minor
Presentation               37   Not     Not               Not               Minor
Table 7: Impact of Electronic Submission on Plagiarism
                 Closed-Book Exam                 In-Class Test
                 Doesn't Use     Uses             Doesn't Use     Uses
Median           Not a problem   Minor problem    Not a problem   Midway between not a problem and minor problem
Mode             Not a problem   Not a problem    Not a problem   Not a problem

• Usage of CAA. The level of problem with plagiarism respondents had found with the different kinds of assessment task was compared with the usage of CAA to determine if there was any relationship between plagiarism problems and CAA. Usage of CAA was measured on two scales, firstly whether they had used CAA at all or not. A Mann-Whitney U test found that there were no statistically significant differences between the two groups for any of the assessment tasks. The second scale measured usage of CAA on three levels (not at all, used a little, used a lot). A Kruskal-Wallis test found no statistically significant differences between the three groups. This indicates that the usage of CAA has no noticeable effect on the perceptions of respondents about the level of plagiarism for assessment tasks.
• Security of CAA. Kruskal-Wallis tests found that there was no statistically significant difference in perceptions of the security of CAA amongst respondents for any of the following factors:
  • Level of teaching experience of the respondent
  • Usage of CAA at all
  • Usage of CAA at varying levels (not at all, used a little, used a lot)
  • Extent of electronic marking
  • Course level taught
• Teaching Experience and Course Level Taught. Neither of these factors showed any statistically significant differences using a Kruskal-Wallis test for any type of assessment task when examining the perceived level of problem with plagiarism.

2.5 Perceptions of CAA
The respondents' perceptions of CAA are shown in Table 8. This shows that there is strong agreement with the proposition that CAA reduces marking time and improves the immediacy of feedback to students, and some agreement that CAA offers greater objectivity in marking and that it allows students to work at their own pace and more flexibly. There is also some disagreement with the propositions that CAA has fewer security risks than manual assessment, that it is more time-consuming than manual assessment, that it improves the quality of feedback to students, and that it disadvantages special-needs students. Most respondents were broadly neutral on whether it is possible to test higher-order learning using CAA and on whether the use of CAA makes students more anxious.

Some interesting differences emerge when comparing the attitudes of those who have not used CAA with those who have, either a little or a lot. Table 9 shows the proportions of respondents having experience of using Learning Environments and CAA.

The respondents as a whole were broadly neutral on the question of using CAA to test higher-order learning, but when the data is correlated against the respondents' experience of CAA the picture is quite different. Respondents with more experience of CAA tend to believe more strongly that it could be used to test higher-order learning, but those without experience are more skeptical. This would appear to reflect the widespread myth that CAA can only be used for multiple-choice tests; many of those who have little experience with CAA have used commercial learning environments such as Blackboard [9] and WebCT [77], which only support multiple-choice tests, and this reinforces the acceptance of the myth. Figure 10 shows a more detailed breakdown of the responses to this question.

The questionnaire also included space for comments on the perceived advantages and disadvantages of CAA. Analysis of these comments shows that the primary perceived advantages of CAA are:
• The time saved in marking, particularly with large groups
• The immediacy of feedback, although not necessarily the quality of feedback (however, one respondent noted that 'poor feedback from CAA is preferable to detailed manual feedback which arrives weeks after the work was completed')
• The objectivity and consistency of marking
• That CAA frees students to work at their own pace

These comments broadly agree with the numerical findings presented in Table 8. Some respondents also noted that the speed of marking allows more frequent assessments, both formative and summative, to be conducted; again, this is particularly true for large groups of students. Several respondents commented that having student submissions available in electronic form allows post-hoc analysis of submissions to be performed, although this is of course true of any form of electronic submission system whether the submissions are marked electronically or not.

A Kruskal-Wallis test shows significant statistical differences in the views of respondents on objectivity, flexibility and the immediacy and quality of feedback, with those having more experience with CAA consistently believing more strongly that CAA had significant benefits in each of these areas. Table 11 shows the median and mode for each category of respondent in each of these areas.

A number of other drawbacks were widely reported. Many respondents were concerned about reliance upon technology, both in terms of downtime and of system failures that might lose or corrupt data. Issues of security and plagiarism also caused widespread concern, particularly in distance learning environments where it is difficult, or impossible, to verify the identity of the person undertaking the assessment. This is to some extent offset by the recognition that some CAA systems allow questions to be individually tailored, either by presenting different students with different sets of questions or by parameterizing questions with different values. (One of the authors has used this approach for in-class tests, where each student is given the same questions but with different values in each question. On one occasion a student submitted a set of completely incorrect answers, but further investigation showed that the answers would have been correct if he had been given the same set of questions as the student next to him, and because of the demonstrable plagiarism he failed not only the test but the entire course.)
Table 8: Opinions regarding aspects of CAA (Strongly disagree / Disagree / Neutral / Agree / Strongly agree / Total)
CAA has fewer security risks than manual assessment               5%   32%   40%   15%    7%   114
CAA is more time-consuming than manual assessment                10%   35%   28%   21%    5%   113
CAA reduces marking time                                          2%    4%   12%   50%   32%   114
It is possible to test higher-order learning using CAA            4%   27%   33%   33%    3%   114
CAA offers greater objectivity in marking                         3%   19%   25%   45%    9%   113
CAA allows students to work at their own pace and more flexibly   1%   10%   26%   45%   18%   114
The use of CAA makes students more anxious                        2%   25%   54%   18%    2%   112
CAA improves the immediacy of feedback to students                0%    4%    5%   64%   26%   114
CAA improves the quality of feedback to students                  5%   29%   42%   19%    5%   111
CAA disadvantages special-needs students                          4%   28%   52%   13%    2%   113
Table 9: Actual CAA experience
Do you have any experience (past or present) of using online learning environments?   No: 36%   Yes: 64%                          Total: 116
Have you ever used computer-aided assessment?                                          No: 38%   A little: 41%   A lot: 21%        Total: 117
Is computer-aided assessment used within your department?                              No: 26%   A little: 59%   A lot: 15%        Total: 117

Figure 10: Higher order learning
(Bar chart of responses, from strongly disagree to strongly agree, to the proposition that it is possible to test higher-order learning using CAA, shown as a proportion (%) within each level of CAA experience: No / Yes, a little / Yes, a lot.)
Table 11: Medians and modes (median/mode) for perceptions with statistically significant group differences
Used CAA?      N    Can test higher-order learning   CAA offers greater objectivity   More flexible for students   Immediacy of feedback   Quality of feedback
No             43   disagree/disagree                disagree/neutral                 disagree/disagree            neutral/neutral         strongly disagree-disagree/strongly disagree
Yes, a little  48   disagree/neutral                 neutral/neutral                  neutral/neutral              neutral/neutral         disagree/disagree
Yes, a lot     23   neutral/neutral                  neutral/neutral                  agree/agree                  agree/agree             neutral/neutral
3 COMPUTER AIDED ASSESSMENT
Many academics claim to be interested in the issues surrounding the adoption of CAA, and the experiences related here encapsulate many aspects of their expressed concerns and experiences. In this section we present two stories that exemplify some issues of concern to many.

Andrew Solomon from the University of Technology in Sydney [70] is typical of many academics in having over 250 first year students to teach each year. One of the things he is expected to teach them is basic competence with Unix. In previous years the students were provided with a book and sample questions in preparation for the manually marked examination at the end of the course. Marking these examinations consumed a disproportionate portion of the staff time devoted to the subject.

This year Andrew wrote a program to semi-randomly generate Unix-oriented tasks for the students and then determine whether or not the students had accomplished them. This means that the students receive their mark immediately on completion of the examination. It also means that staff time, which was previously devoted to marking, is now spent on teaching.
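The generate-then-verify idea behind such a tool can be sketched as follows; the commands, file names and marking scheme here are invented for illustration and are not Andrew's actual program. Each student receives a randomly parameterized filesystem task, and the checker later inspects the student's working area to see whether it was accomplished.

```python
import random
from pathlib import Path

WORDS = ["report", "notes", "draft", "summary"]

def generate_task(seed: int) -> dict:
    """Semi-randomly generate a small Unix-style task for one student."""
    rng = random.Random(seed)
    directory = f"unit{rng.randint(1, 9)}"
    filename = f"{rng.choice(WORDS)}{rng.randint(10, 99)}.txt"
    return {
        "instruction": f"Create a directory called {directory} and, inside it, "
                       f"an empty file called {filename} (e.g. with mkdir and touch).",
        "expected_path": Path(directory) / filename,
    }

def accomplished(task: dict, home: Path) -> bool:
    """Check whether the expected file now exists in the student's area."""
    return (home / task["expected_path"]).is_file()

if __name__ == "__main__":
    task = generate_task(seed=20031234)          # e.g. derived from a student number
    print(task["instruction"])
    print("Completed:", accomplished(task, home=Path("/tmp/demo_student")))
```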
The problems Andrew experienced with this include:
• The expense of generating questions
• The restricted type of questions that can be electronically marked
• The rather unforgiving marking of which a computer is capable

This was a new scheme and Andrew did not want his students to be disadvantaged by the new system, so he decided to evaluate this new assessment regime. Andrew asked two experts to interview 20 students and attempt to assess their general Unix competence, without too close reference to the curriculum. He compared the experts' opinions of the students' abilities with the marks they attained in the examination. The experts felt that the students all had approximately the same level of competence, but the tests appeared to discriminate between the students. The distinction detected by the tests is likely to be no more than an indicator of the amount of time students devoted to studying for the examination rather than a difference in general understanding; this is common in non-CAA examinations also. The experiment was time consuming and stressful, but ultimately successful.

John English from the University of Brighton provides another example. He regularly uses online practical programming examinations for first-year students [31]. They are conducted under normal closed-book examination conditions; however, many of the questions involve writing program fragments that are marked electronically after the end of the examination. The marking is performed by embedding the answers into a test program, which is then compiled and run against several sets of test data. All code is initially marked electronically, and any answers which fail to compile or which fail to process any of the sets of test data successfully are flagged for manual inspection. In many cases, manual inspection reveals that the answers submitted are 'nearly right', with simple errors such as missing semicolons causing the compilation to fail. Marks are awarded manually for such answers. Because of its simplicity, this CAA tool cannot distinguish between 'nearly correct' and 'wrong', and so it does not by itself differentiate between the students who did not obtain completely correct solutions.
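A minimal sketch of this flag-for-inspection style of marking is given below, assuming (hypothetically) that submissions are Python files named after student ids and each defines an add() function; the system described above embeds fragments into a compiled test program, so this is an analogy rather than a reimplementation.

```python
import importlib.util
from pathlib import Path

# Hypothetical test data for one question: (arguments, expected result).
TEST_CASES = [((2, 3), 5), ((0, 0), 0), ((-1, 4), 3)]

def load_submission(path: Path):
    """Import a student's file so its add() function can be called."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)        # may raise if the code is broken
    return module

def mark(path: Path) -> dict:
    """Run every test case; any error or wrong answer flags the submission."""
    result = {"student": path.stem, "passed": 0, "flag_for_inspection": False}
    try:
        module = load_submission(path)
        for args, expected in TEST_CASES:
            if module.add(*args) == expected:
                result["passed"] += 1
            else:
                result["flag_for_inspection"] = True   # wrong answer: a human decides
    except Exception:
        result["flag_for_inspection"] = True           # would not run: perhaps 'nearly right'
    return result

if __name__ == "__main__":
    submissions = Path("submissions")
    if submissions.is_dir():
        for submission in sorted(submissions.glob("*.py")):
            print(mark(submission))
```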
3.1 Pros and Cons of CAA
Using CAA can provide an increased amount, quality and variety of feedback for students as well as providing new assessment tools for educators. The benefits include speed, consistency, and availability 24 hours per day. Such tools can support a variety of different learning habits and needs amongst the students. Well-written CAA tools can allow teachers more time for advanced support, assessment and education design tasks rather than basic support. Digitally formatted assessments can also offer possibilities for reuse and resource sharing between teachers.

There are, however, disadvantages with the use of CAA. The set-up phase for a new system or new assessments requires more time and resources than the development of traditional assessments. Despite careful design work there can be errors or technical problems that prevent the assessment system from working as expected. This, especially on courses that use computer-based systems heavily, can demotivate students or even hinder their learning.

3.2 Types of CAA in CS Education
This section presents a summary of existing CAA approaches used within CS education, describing their connections to Bloom's taxonomy [10]. Respondents report that they mostly use electronic marking to test for higher cognitive levels of learning, i.e. problem solving and evaluation. The survey did not contain specific questions about the types of CAA respondents used, but the large number of answers from programming teachers setting practical work suggests that a large number of respondents use at least partially automatic marking when assessing programming tasks.
• Multiple-choice questions. These can be used to assess many cognitive levels. Whilst poorly constructed multiple-choice questions assess nothing but logic in answering, well-designed questions are a good tool for assessing knowledge and comprehension [56], or even higher cognitive levels [28]. Multiple-choice questions are often offered by learning environment software, such as WebCT and Blackboard [77, 9]. The questions can take several forms: visual, multiple choice, combination, gap filling [22, 35, 51, 63]. Generating different permutations of question sets or different values for question variables means that tests can be individualized to prevent cheating and memorizing, and also to increase complexity [32, 71]. In addition, some tools have been developed to assess the comprehension of a specific topic, e.g. pointers in C++ [53].
• Textual answers. Textual assessment often requires students to write short answers or essays. Again, dependent upon the skill and experience of the question designer, these can be used to test both lower and higher order learning skills. There are several approaches to supporting automatic assessment of free text answers. Some learning environments and question systems base the assessment on either direct text comparison or regular expressions, and these support short answer questions (a small illustrative sketch of the regular-expression approach follows this list). There are, however, many more sophisticated approaches to the assessment of textual content [12, 16, 59].
• Programming assignments. Practical programming is the most common way of evaluating students' application and problem-solving skills. When aiming to assess lower level application skills, CAA can be helpful in assessing a student's ability to produce correctly functioning classes or functions, without the need to produce a whole program [3, 11, 65]. CAA can also be used to test the analysis level, for example by providing students with a program containing bugs and requiring them to correct it into a program that compiles and functions correctly [31]. There are also several integrated systems and tools for assessing complete programming assignments. The most common metrics are program functionality, complexity, style and efficiency [41, 74]. Some systems also provide additional metrics, such as test data coverage [57] and programming skill [35]. In addition to integrated tools there also exist independent, aspect-specific tools, for example for dynamic memory management or coding conventions in C++ [1], or for spreadsheet and database testing [67].
• Visual answers. Visualizations can be used for assessing both understanding and application, e.g. with data structures and algorithms [52]. Translating programs into flowcharts can be used to assess a student's understanding of program flow [11]. It is also possible to assess the problem-solving level by using CAA for design diagrams [43].
• Peer assessment. With large numbers of students, it would be very difficult to arrange organized peer evaluation without the aid of computers. With electronic submission it is possible to automatically randomize peer assessors, to anonymize answers, to give and store the marks and to compare their consistency; peer assessment strategies can be greatly developed. Some approaches to computer assisted peer assessment have already been introduced [34, 76]. Peer assessment can also be utilized for assessing students' evaluation skills, by comparing their assessment to their teacher's assessment or to each other's [23].
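The regular-expression style of short-answer marking mentioned in the 'Textual answers' item above can be sketched as follows; the questions, acceptable-answer patterns and one-mark-per-question scheme are invented for illustration.

```python
import re

# Each (hypothetical) short-answer question pairs a prompt with a pattern that
# captures the acceptable phrasings of the answer.
QUESTIONS = [
    ("What data structure gives O(1) average-time lookup by key?",
     re.compile(r"\b(hash\s*(table|map)|dictionary)\b", re.IGNORECASE)),
    ("Which traversal of a binary search tree visits keys in sorted order?",
     re.compile(r"\bin-?\s*order\b", re.IGNORECASE)),
]

def mark_short_answers(answers: list[str]) -> int:
    """Award one mark per answer that matches the question's pattern."""
    score = 0
    for (_prompt, pattern), answer in zip(QUESTIONS, answers):
        if pattern.search(answer):
            score += 1
    return score

if __name__ == "__main__":
    # The second answer does not match the expected pattern, so it earns no mark.
    print(mark_short_answers(["A hash table (dictionary)", "Post-order traversal"]))  # -> 1
```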
It can be deduced from the survey that it is more common to combine CAA use with manual marking than to rely totally on fully electronic marking, for example by either flagging all cases that are not clearly correct for human inspection [31] or complementing the automatic assessment with human assessment [1, 46].

According to survey respondents, people use CAA more for summative assessment than formative assessment, although it is used for both. The literature suggests that CAA can be efficiently incorporated into many courses to aid formative assessment and student support throughout a course [19, 2, 71]. CAA makes it possible for students to learn from feedback and resubmit their work, which supports their learning and self-assessment [57], and it facilitates the generation of individualized questions according to students' attainment levels or previous success [35, 75]. This fits with the educational principles of constructivism: allowing students to learn from their mistakes and being contingent upon them [80].

3.3 What we can learn about CAA from other disciplines
Our survey shows that most respondents are teaching and assessing programming, so it is probably inevitable that their main uses of electronic methods of coursework submission and marking focus on practical work and written exercises. Nevertheless, the survey covers teaching of a wide range of topics such as mathematics, information systems and ethics. It is therefore appropriate to see whether we can learn anything from the use of CAA in other disciplines.

Extensive work on the automated assessment of mathematics led to the formation in 2001 of the Scottish Centre for Research into On-Line Learning and Assessment (SCROLLA). This group has been working on assessment at the school-university interface and its work is described by Beevers and Paterson [8]. They provide examples of automated assessment using simple questions, such as

Find the equation of the normal to the curve with equation f(x) = x³ − 2x at the point where x = 1

and more complex ones, such as

A function is defined by f(x) = (x² + 6x + 12)/(x + 2), x ≠ −2
(a) Express f(x) in the form ax + b + b/(x + 2), stating the values of a and b
(b) Write down an equation for each of the two asymptotes
(c) Show that f(x) has two stationary points
(d) Sketch the graph of f
(e) State the range of values of k such that the equation f(x) = k has no solution

Their approach is to use software that can process user-entered mathematical expressions and to split each problem into a series of intermediate steps. This provides support to the weaker students, who can obtain partial marks even if they cannot answer the whole question immediately. Making this optional has benefits for stronger students, in accordance with Wood's notions of contingency [80]. They also provide a range of modes of delivery, ranging from an examination mode to a help mode that is useful for self-assessment. Much of this work could be adopted directly by computer scientists who are teaching their students mathematics. The approach of using optional, intermediate steps has wider applicability for assessment of core computer science topics.
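For illustration, the simple question above could be split into intermediate steps of the kind described here, each of which could attract partial marks (this particular breakdown is our illustration, not necessarily SCROLLA's published scheme):

\begin{align*}
f(x) &= x^3 - 2x, \qquad f'(x) = 3x^2 - 2\\
\text{at } x = 1:\quad f(1) &= -1, \qquad f'(1) = 1\\
\text{gradient of the normal} &= -\tfrac{1}{f'(1)} = -1\\
\text{equation of the normal:}\quad y - (-1) &= -1\,(x - 1) \;\Longrightarrow\; y = -x
\end{align*}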
The use of automated assessment in business and management is also growing as teachers struggle to cope with rapidly growing class sizes, usually in conjunction with a Virtual Learning Environment (VLE). The area of learning tested is often quantitative but it differs from the assessment of mathematics described above in that the emphasis is on interpretation. Smailes [69] describes the benefits of using multiple choice, graphic and 'fill in the gaps' tests in Blackboard [9] for a first year Data Analysis course. Fuller [36] also discusses the use of CAA in assessing quantitative methods at both first year and postgraduate level. He presents sample questions based on every level of Bloom's taxonomy except synthesis, and argues that it is perfectly possible to assess analysis and evaluation through multiple choice questions or ones requiring a short phrase that can be automatically parsed.

Modern language teachers have developed a range of tools for supporting and assessing the acquisition of language skills, generically referred to as CALL (Computer Assisted Language Learning). Techniques used to assess learning of grammar and vocabulary include sentence reordering and gap filling; these are considered to have a high degree of validity in measuring attainment [66].

Work to apply automated assessment to extended pieces of writing is now producing results that are claimed to be as accurate as human markers for technical essays. Content and style are both measured and graded. The disadvantage of this approach for long essays is that a large dictionary of reference material is required, which makes the approach infeasible for most people. However, for short answer questions this approach can be used to mark against a single sample response supplied by the setter. A good review of this work is provided by Whittington and Hunt [79].

The work described here has obvious parallels in computer science teaching and mainly measures recall, comprehension and application to straightforward problems. There is, however, another very different group of applications of computer aided assessment. These involve its use in measuring participation in discussion and creative writing. Hatzimoysis [40] gives a very clear picture of the use of virtual seminars in teaching philosophy topics, including ethics. He points out that the ongoing, asynchronous format of the virtual seminars gives formative feedback and that electronic discussion captures the contributions of individual students to the synthesis of a group report. In a computer science context, this can be used to measure students' ability to work in a group, as described by Fuller et al [37]. Virtual seminars for a final year course in Computer Support for Cooperative Working taken by psychology and computing students are described by the ASTER project [42], where the evaluation showed that assessment of seminar participation increased students' motivation to contribute and reflect on the subject matter. Sloman [68] also reports the benefits of virtual seminars in promoting student reflection; he used them to teach a course on economic principles. He concludes that 'Virtual seminars are best suited to what I call "metaeconomics". These involve students questioning assumptions behind policy objectives, relating economic concepts to current economic issues…'

4 USING CAA
It is apparent that CAA is not a panacea for the problems of assessment. The most widely available test type in current CAA systems is the multiple-choice test, or variations on this theme (e.g. [33]). These are most suitable for assessing lower order skills, and this is perhaps reflected by the survey responses, where approximately 63% of first-year courses used some form of CAA, declining to approximately 44% for intermediate level courses and 31% for final year and postgraduate courses. This may, however, also be due to other factors such as the larger class sizes in first-year courses compared to later years, and the 'little and often' assessment philosophy used in many institutions for first-year courses.

A serious limitation is the 'black and white' marking schemes employed in CAA, which are only really suitable for assessments in courses like programming where an answer can be clearly categorized as right or wrong. Many systems for the automatic assessment of programming exercises have been developed (e.g. [3, 21, 47, 57, 65, 74]) and it is notable that our survey shows approximately 70% of first-year programming courses using CAA to some extent. Shades of gray in marking schemes have to be aggregated from a number of 'black and white' scores for different aspects of the assessment.
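A minimal sketch of such aggregation is shown below; the aspect names and weights are invented, and real systems would of course use their own checks and scales. Several pass/fail checks are combined into one graded mark, so a submission can earn partial credit.

```python
# Hypothetical per-aspect weights for one programming exercise.
WEIGHTS = {"compiles": 2, "functional_tests": 5, "style_check": 2, "efficiency_test": 1}

def aggregate_mark(results: dict[str, bool]) -> float:
    """Turn several black-and-white checks into a shaded overall mark out of 10."""
    earned = sum(weight for aspect, weight in WEIGHTS.items() if results.get(aspect, False))
    return 10 * earned / sum(WEIGHTS.values())

if __name__ == "__main__":
    # A submission that compiles and passes the functional tests but fails the
    # style and efficiency checks still earns a partial, 'gray' mark.
    print(aggregate_mark({"compiles": True, "functional_tests": True,
                          "style_check": False, "efficiency_test": False}))  # -> 7.0
```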
One of the major perceived drawbacks to the use of CAA is the cost in time and effort required to set up a CAA-based assessment. This depends partly upon the quality of the authoring framework used to create the assessments and their marking schemes, but it also depends on the time and effort spent in staff training to be able to use the authoring system and the imagination of staff in finding ways to use CAA effectively for particular courses. There is also a cost involved in administering the system, which may include making adjustments to cater for special-needs students, granting extensions to deadlines and collating results. It may be possible to reuse questions once they have been developed, but the rate at which the contents of courses can change, particularly in a fast-moving field like computing, means that a process of continuous maintenance is needed to prevent assessments from becoming stale.

Test design for automatic assessment is also more complicated than it is for manual assessment. If a marking scheme for manual assessment is badly designed, human markers can use their judgment to overcome its deficiencies. With an automatic marker, no such judgment is possible, and it therefore takes more effort not only to design questions that can be assessed successfully, but also to implement and test the corresponding marking schemes.

In order to maximize the possibility of reuse, question banks can provide a solution. A locally-developed question bank can provide the necessary variety in assessment exercises; externally developed question banks may also help, but here there is a question of quality control and of how well externally developed questions fit in with the needs of the courses at a particular institution.

The use of externally developed question banks requires adherence to standards to ensure interoperability. The overwhelming majority of survey respondents considered interoperability to be an important issue, regardless of whether they themselves use CAA at present. The IMS Question & Test Interoperability Specification [45] is designed specifically to address this particular issue, although it is an open-ended specification that provides for vendor-specific extensions. The obvious danger with an extensible standard is that, like programming languages, different vendors will provide incompatible extensions that will actually hinder interoperability.
4.1 Alleviating Disadvantage
It has been suggested that the majority of university level courses offer a similar experience to each student taking them [24]; increasing numbers of students often imply a reduced amount of time to be spent upon individuals. This is complicated by the increasingly widely differentiated past experiences that students bring to university. An assessment system should not disadvantage any identifiable portion of the cohort. For example, it is well documented that male and female students pursue electronic debates in subtly different fashions [48], so designing an assessment that rewards typical male behavior and punishes (by lack of marks) typical female behavior [18] is to be avoided or, if this is not possible, compensated for by an assessment that is biased in the other direction.

It is obviously important that assessment is impartial and effective. The role of assessment has expanded over the last 50 years to provide evaluation of programs in addition to the traditional placement, selection and certification of individual learners. We must balance the need to provide robust evaluation data with the duty to serve the individual requirements of the learner [25].

Assessment has an important pedagogical role and must match the desired learning paradigms and taxonomic levels; assessment and learning are inextricably linked, and the planned assessment style will shape the teaching and learning strategies. It therefore follows that assessment plays a pivotal role in driving the motivation of individual learners [39]. Personal motivation is driven by two sets of forces:
• Those provided externally by society, which may include peer group pressures, goals set by progression requirements and career aspirations
• Emotional and internal expectations such as confidence, prior experiences, personal esteem, etc

So a learner can perceive a task as:
• Necessary and enjoyable
• Unnecessary but enjoyable
• Necessary but not enjoyable
• Unnecessary and not enjoyable [7]

If the style of the assessment matches the learning style of the learner, the process of assessment will become more enjoyable. Cognitive style is one of the dimensions that affect learning style [29]. It has been shown to be of significance among students that fail to persist beyond the first year of university study [64], where students with a sequential style are more likely to be persistent than those with a random style. It is important to ensure the constraints of technology do not bias CAA towards a particular learning style.

As technology increases in capacity and functionality, the constraints it imposes reduce. However, there are still difficulties in adapting technology to evaluate higher level learning skills such as synthesis [36]. It is easy to automate multiple selection tests and considerably more difficult to analyze free form text answers. This generates the perception that automated testing is more appropriate for lower order learning skills. This perception is likely to change as lexical analysis improves or as questioning styles evolve. Carneson et al [13] have used multiple-choice questions to assess higher order learning skills; they demonstrate how to assess all but synthesis.

Automated assessment requires careful design and administration, and the development effort can be considerable. The development of question banks and support networks is demonstrating benefits and reducing the overheads of implementing CAA. Some interoperability problems exist when CAA products are designed to be platform specific or require particular security protocols [78], but there is widening acceptance of CAA. Interoperability standards are emerging [45] that will help to promote widespread use of question banks and the popularity of CAA.

4.2 Plagiarism and Security Issues
One important issue that needs to be considered when adopting CAA is security. Security in this discussion refers to activities by the students that could affect the validity of the assessment outcomes, for example plagiarism, impersonation of the candidate, electronic eavesdropping and system hacking. Although students may cheat in any form of assessment, cheating is generally only perceived as a problem with summative assessment tasks. In formative assessment the students are seen as only cheating themselves and this is of lesser concern.

Of particular concern is the security of assessment that is conducted online and in isolation from the educator, and the opportunities this offers for cheating. Recent research by Carpenter et al [14] found that 62% of students did not consider that working in groups on take-home examinations or Web-based quizzes was cheating. This concurs with Barnett [6], who reported that the incidence of cheating increased when the supervisor left the room during an examination. These studies would tend to suggest that students sitting an online examination would be more inclined to cheat if they were unsupervised. In an attempt to address this problem, there have been various developments that facilitate automatic invigilation of online assessment. Technologies such as fingerprint identification have been developed to verify the online user [17]. Others describe processes for the management of unsupervised examinations to increase security. Thomas et al [74] conduct synchronized online examinations that use password access and a set time limit for completion. Mulligan [62] suggests using software to synchronize a student's computer clock with the network to verify the time that a test was taken, but there appears to be no clear and reliable way to verify the identity of the person who sits an assessment task online.

CAA, conducted under supervision, can be used to reduce the opportunities for cheating. Heron [55] successfully used computer based assessment to reduce cheating on paper-based assignments. She reports that requiring students to produce assignment work under supervision has virtually eliminated cheating on these assessment tasks. However, such practices do greatly restrict the types of assignment work that can be set. When we consider that assignment work and class tests have been found to have the highest rates of cheating and that students consider these to be among the most acceptable practices, CAA becomes an important tool for the educator [14, 26]. Of further concern here is that a 10-year follow-up study by Diekhoff [27] found that these forms of cheating are on the increase.
remembering, understanding and applying knowledge, these
are obviously a good place to start. Courses on topics such
as programming, which benefit from ‘little and often’
assessment and where submissions can be clearly categorized
as right or wrong are also good prospects for CAA. The use
of individualized questions can also help to reduce
plagiarism.
Using computers for assessment has provided new ways in which
students may cheat. For example, Ala-Mutka [1] found that
students generate meaningless comments in program code to
satisfy an automatic style checker requirement for a certain
percentage of comments.
However, using computer-aided
assessment has also provided educators with mechanisms to curb
cheating. For example, providing tests on computer rather than on
paper has also facilitated the generation of instant and individual
sets of questions reducing the opportunity to cheat.
Hunt et al [44] stress the importance of providing students with
assessment tasks that are relevant and appropriate. In support of
this Ashworth [5] found that students will be more inclined to
cheat if they do not see a learning purpose to the assessment.
Furthermore, anxiety produced by an assessment task that is
unfamiliar can induce students to cheat. Providing situationally
authentic assessment tasks could therefore reduce the likelihood
of cheating. For example, online programming examinations or
open book examinations may provide a more realistic assessment
of programming skills and are closer in nature to the actual work
the students have already done and will potentially need in their
future employment. In an evaluation of online programming tests
by Mason and Doit [58], student responses indicated that the
online tests provided a fairer assessment of those who deserved to
pass or fail and that the online tests ‘motivated them not to copy
and to obtain the practical skills expected of them through course
work’ (p.144).
It seems that the issues here are complex. The use of CAA in
certain situations is seen as increasing the risk of cheating and
providing students with different opportunities to cheat.
However, assessing the learner in an electronic environment has
allowed for new ways to address the cheating problem.
4.3 Guidelines for adopting CAA
Adopting CAA can have a number of benefits for both staff and students. Staff benefit by saving time on marking, particularly when dealing with large groups, and by being able to assess ‘little and often’. Students benefit by having the opportunity to work at their own pace, and to receive immediate feedback on the work that they submit. Other benefits may include reducing plagiarism and using data-mining techniques to discover trends in student submissions. However, making a decision to adopt CAA techniques requires careful preparation. Some of the factors to be considered are:
•
What types of assessment are possible? CAA is still in its infancy, and most existing systems provide little more than multiple-choice questions, which are really only suitable for assessing the lowest-order skills. Systems for automatic assessment of programming assignments are also fairly common (a minimal sketch of such a test-based marker follows this list), although many other assessment types are possible. As CAA matures, these will undoubtedly become more widely accepted.
•
Which courses can benefit? Mass courses with large enrolments will yield the greatest savings in terms of marking time, and lower-order skills are easiest to assess. Since first-year courses generally have the highest numbers and also concentrate more on lower-level skills such as remembering, understanding and applying knowledge, these are obviously a good place to start. Courses on topics such as programming, which benefit from ‘little and often’ assessment and where submissions can be clearly categorized as right or wrong, are also good prospects for CAA. The use of individualized questions can also help to reduce plagiarism.
•
Which system to adopt? The choice is between adopting a third-party system (either a commercial system such as Blackboard [9] or WebCT [77], or a non-commercial system such as BOSS [49] or CourseMaster [41]) and developing an in-house system. There may be other factors, such as an institutional decision to adopt a particular system as a vehicle for course management, but if such a system does not provide the desired CAA facilities it will be necessary to develop additional plug-in CAA modules, if this is possible. Developing an in-house system is the most expensive option, although it can provide a solution that is more closely aligned to the institution’s requirements. Quality of support is an important issue when adopting a third-party system. Before taking a decision to buy such a system, it will be necessary to investigate what assessment facilities it provides and how easy it is to develop new plug-in modules for specialist institutional requirements.
•
How much investment will be needed? There are costs
involved in the initial deployment of a system, but there are
also costs associated with staff training, maintenance,
development of additional specialist tools, and development
and testing of questions and marking schemes. The cost of
developing questions can however be reduced by investing in
the creation of a question bank to enable reuse.
•
What effects will a system failure have? CAA requires
reliable systems, where servers are available at all times.
Disruptions due to server downtime, network outages or disk
crashes must be guarded against, and a fallback position
should be designed to mitigate the effects of disruptions
according to the perceived level of risk. Using pilot schemes
can help to discover what effects CAA systems have on
system loading before attempting wide-scale deployment.
•
Are standards important? The simple answer is ‘yes’.
Following IMS interoperability standards will make it easier
to share material with systems at other institutions, and will
also help with migration issues if upgrading to a different
system becomes necessary. However, it is necessary to be
aware of the open-ended nature of such standards and the
possible use of incompatible vendor-specific extensions.
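As a deliberately simplified illustration of the test-based programming-assignment marking mentioned in the first bullet above, the Python fragment below is a sketch under assumed conditions, not the approach of any particular system cited in this report; the file name, test data and marking scheme are hypothetical. It runs a submitted program against a set of input/output test cases and counts how many it passes; a real marker would, at minimum, add sandboxing, resource limits, partial-credit schemes and plagiarism checks.

    # Simplified black-box marker: run a student's program on each test input
    # and award a mark per matching output (illustrative sketch only).
    import subprocess

    # Hypothetical test cases: (stdin fed to the program, expected stdout).
    TEST_CASES = [
        ("2 3\n", "5\n"),
        ("10 -4\n", "6\n"),
        ("0 0\n", "0\n"),
    ]

    def mark_submission(program_path: str, timeout_s: float = 5.0) -> int:
        """Return the number of test cases the submission passes."""
        passed = 0
        for stdin_data, expected in TEST_CASES:
            try:
                result = subprocess.run(
                    ["python3", program_path],  # assumes a Python submission
                    input=stdin_data,
                    capture_output=True,
                    text=True,
                    timeout=timeout_s,          # guard against infinite loops
                )
            except subprocess.TimeoutExpired:
                continue                        # no mark for this test case
            if result.returncode == 0 and result.stdout == expected:
                passed += 1
        return passed

    if __name__ == "__main__":
        score = mark_submission("add_two_numbers.py")
        print(f"{score}/{len(TEST_CASES)} test cases passed")

Running such a marker over a whole class then provides the ‘little and often’ feedback loop described above at essentially no marking cost per student.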
We should not forget that the primary beneficiaries of CAA
systems should be the students. The system as a whole must be
based on sound pedagogical principles and must avoid discrimination or exclusion based upon gender, disability or other factors. Careful design of any CAA system is necessary to ensure that this is so.
5 ACKNOWLEDGMENTS
The members of the working group would like to thank Petco Tsvetinov, from the Queensland University of Technology, who helped in the data collection and preparation but could not attend the conference in Thessaloniki where this report was prepared.
We would also like to thank Andrew Solomon for sharing his
CAA experiences with us.
6 REFERENCES
[1] Ala-Mutka K, Computer-assisted Software Engineering Courses, Proceedings of IASTED International Conference Computers and Advanced Technology in Education, Cancun, 2002
[2] Anderson W, Krathwohl D, A Taxonomy for Learning, Teaching and Assessing, http://www.cours.polymtl.ca/plu6035/PDF/anderson.pdf, 2001
[3] Arnow D, Barshay O, Online Programming Examinations using WebToTeach, Proceedings of ITiCSE’99, Krakow, 1999
[4] Arnow D, Barshay O, WebToTeach: an interactive focused programming exercise system, 29th Annual IEEE Frontiers in Education Conference, 1999
[5] Ashworth P, Bannister P, Thorne P, Guilty in whose eyes? University students’ perceptions of cheating and plagiarism in academic work and assessment, Studies in Higher Education, 22, 1997
[6] Barnett DC, Dalton JC, Why college students cheat? Journal of College Student Personnel, 22, 1981
[7] Bedny GZ, Seglin MH, Meister D, Activity theory: history, research and application, Theoretical Issues in Ergonomics Science, 1(2), 2000
[8] Beevers E, Paterson JS, Automatic assessment of problem-solving skills in mathematics, Active Learning in Higher Education, 4(2), 2003
[9] Blackboard, http://www.blackboard.com/
[10] Bloom B, Taxonomy of Educational Objectives: The Classification of Educational Goals: Handbook I, Longman, New York, 1956
[11] Buck D, Stucki DJ, Design Early considered Harmful: Graduated Exposure to Complexity and Structure Based on Levels of Cognitive Development, SIGCSE 2000, Austin, 2000
[12] Burstein J, Leacock C, Swartz R, Automated Evaluation of Essays and Short Answers, Proceedings of 5th Annual CAA Conference, Loughborough, 2001
[13] Carneson J, Delpierre G, Masters K, Designing and Managing Multiple Choice Questions, http://www.le.ac.uk/cc/ltg/castle/resources/mcqman/mcqman01.html, 1999
[14] Carpenter DD, Harding TS, Montgomery SM, Steneck N, P.A.C.E.S – A study on academic integrity among engineering undergraduates (preliminary conclusions), ASEE Annual Conference & Exposition, Montreal, 2002
[15] Chan CC, Tsui MS, Chan MYC, Applying the Structure of the Observed Learning Outcomes (SOLO) Taxonomy on Student’s Learning Outcomes: an empirical study, Assessment & Evaluation in Higher Education, 27(6), 2002
[16] Christie JR, Automated Essay Marking for both Style and Content, Proceedings of 3rd Annual CAA Conference, Loughborough, 1999
[17] Clariana R, Wallace P, Paper-based versus computer-based assessment: key factors associated with the test mode effect, British Journal of Educational Technology, 33, 2002
[18] Cook J, Leatherwood C, Oriogun P, Online Conferencing with Multimedia Students: Monitoring Gender Participation and Promoting Critical Debate, Proceedings of 2nd Annual LTSN-ICS Conference, London, 2001
[19] Cox K, Clark D, The use of formative quizzes for deep learning, Computers and Education, 30, 1998
[20] Culwin F, MacLeod A, Lancaster T, Source Code Plagiarism in UK HE Computing Schools: Issues, Attitudes and Tools, Southbank University, London, Commissioned by JISC, 2001
[21] Daly C, RoboProf and an Introductory Programming Course, Proceedings of ITiCSE’99, Krakow, 1999
[22] Dalziel J, Gazzard S, Next generation computer assisted assessment software: the design and implementation of WebMCQ, Proceedings of 3rd Annual CAA Conference, Loughborough, 1999
[23] Davies P, Computer Aided Assessment MUST be more than multiple-choice tests for it to be academically credible? Proceedings of 5th Annual CAA Conference, Loughborough, 2001
[24] Davis H, Carr L, Cooke E, White S, Managing Diversity: Experiences Teaching Programming Principles, Proceedings of 2nd Annual LTSN-ICS Conference, London, 2001
[25] Delandshire G, Implicit theories, unexamined assumptions and the status quo of educational assessment, Assessment in Education, 8(2), 2001
[26] Dick M, Sheard J, Markham S, Is it OK to cheat? Proceedings of ITiCSE’01, Canterbury, 2001
[27] Diekhoff GM, LaBeff EE, Clark RE, Williams LE, Francis B, Haines VJ, College cheating: ten years later, Research in Higher Education, 37, 1996
[28] Duke-Williams E, King T, Using Computer-Aided Assessment to Test Higher Level Learning Outcomes, Proceedings of 5th Annual CAA Conference, Loughborough, 2001
[29] Dunn R, Beaudy J, Kalvan A, Survey of research on learning styles, Education Leadership, 46, 1989
[30] English J, Siviter P, Experience with an Automatically Assessed Course, Proceedings of ITiCSE’00, Helsinki, 2000
[31] English J, Experience with a Computer-Assisted Formal Programming Examination, Proceedings of ITiCSE’02, Aarhus, 2002
[32] Farthing DW, Jones DM, McPhee D, Permutational Multiple-Choice Questions: An Objective and Efficient Alternative to Essay-Type Examination Questions, Proceedings of ITiCSE’98, Dublin, 1998
[33] Farthing DW, McPhee D, Multiple choice for honours-level students? A statistical evaluation, Proceedings of 3rd Annual CAA Conference, Loughborough, 1999
[34] Freeman M, McKenzie, SPARK, a confidential web-based template for self and peer assessment of student teamwork: benefits of evaluating across different subjects, British Journal of Educational Technology, 33, 2002
[35] Frosini G, Lazzerini B, Marcelloni F, Performing automatic exams, Computers & Education, 31, 1998
[36] Fuller M, Assessment for real in Virtual Learning Environments – how far can we go? Proceedings of Interface – Virtual Learning and Higher Education, Mansfield College, Oxford, 2002
[37] Fuller U, Slater J, Tardivel G, Virtual Seminars, Real Networked Results? Proceedings of ITiCSE’98, Dublin, 1998
[38] Hagan D, Sheard J, Monitoring and Evaluating a Redesigned First Year Programming Course, Proceedings of ITiCSE’97, Uppsala, 1997
[39] Hargreaves DJ, Student learning and assessment are inextricably linked, European Journal of Engineering Education, 22(4), 1997
[40] Hatzimoysis A, A Philosophy Primer in Virtual Seminars, http://www.prsltsn.leeds.ac.uk/philosophy/events/hatzimoysis1.html, 2002
[41] Higgins C, Symeonidis P, Tsintifas A, The Marking System for CourseMaster, Proceedings of ITiCSE’02, Aarhus, 2002
[42] Hogarth S, Virtual Seminars in Psychology and Computing, http://ctipsy.york.ac.uk/aster/resources/case_studies/reports/yk_06/yk_06.html, 2002
[43] Hoggarth G, Lockyer M, An Automated Student Diagram Assessment System, Proceedings of ITiCSE’98, Dublin, 1998
[44] Hunt N, Hughes J, Rowe G, Formative automated computer testing (FACT), British Journal of Educational Technology, 33, 2002
[45] IMS, http://www.imsglobal.org
[46] Jackson D, A Semi-Automated Approach to Online Assessment, Proceedings of ITiCSE’00, Helsinki, 2000
[47] Jackson D, A Software System for Grading Student Computer Programs, Computers & Education, 27(3/4), 1996
[48] Janson S, Gender and the Information Society: A Socially Structured Silence, in Siefert M, Gerbner G, Fisher J (Eds.), The Information Gap: How Computers and Other Communication Technologies Affect the Social Distribution of Power, Oxford: Oxford University Press, 1989
[49] Joy M, Luck M, Effective Electronic Marking for Online Assessment, Proceedings of ITiCSE’98, Dublin, 1998
[50] JPLAG, http://www.jplag.de/
[51] Kashy E, Tsai Y, Thoenssen M, Morrissey D, CAPA, An Integrated Computer Assisted Personal Assignment System, American Journal of Physics, 61(12), 1993
[52] Korhonen A, Malmi L, Algorithm simulation with automatic assessment, Proceedings of ITiCSE’00, Helsinki, 2000
[53] Kumar A, Learning the Interaction Between Pointers and Scope in C++, Proceedings of ITiCSE’01, Canterbury, 2001
[54] Lee S, Development of instructional strategy of computer application software for group instruction, Computers and Education, 37(1), 2001
[55] Le Heron J, Plagiarism, learning dishonest or just plain cheating: The context and countermeasures in information systems teaching, Australian Journal of Educational Technology, 17, 2001
[56] Lister R, Objectives and Objective Assessment in CS1, Proceedings of SIGCSE ’01, Charlotte, 2001
[57] Malmi L, Korhonen A, Saikkonen R, Experiences in Automatic Assessment on Mass Courses, Proceedings of ITiCSE’02, Aarhus, 2002
[58] Mason DV, Woit DM, Integrating technology into computer science education, Proceedings of 29th SIGCSE Technical Symposium on Computer Science Education, Atlanta, 1998
[59] Mason O, Grove-Stephenson I, Automated free text marking with Paperless School, Proceedings of 6th Annual CAA Conference, Loughborough, 2002
[60] Medley MD, Online Finals for CS1 and CS2, Proceedings of ITiCSE’98, Dublin, 1998
[61] MOSS, http://www.cs.berkley.edu/~aiken/moss.html
[62] Mulligan B, Pilot study on the impact of frequent computerized assessment on student work rates, Proceedings of 3rd Annual CAA Conference, Loughborough, 1999
[63] Question Mark, http://www.questionmark.com/
[64] Ross JL, Drysdale MTB, Schulz RA, Cognitive learning styles and academic performance in two postsecondary computer courses, Journal of Research on Computing in Education, 33(4), 2001
[65] Saikkonen R, Malmi L, Korhonen A, Fully Automatic Assessment of Programming Exercises, Proceedings of ITiCSE’01, Canterbury, 2001
[66] Shimatani H, Kitao K, Computer Aided Instruction: A bibliography, http://ilc2.doshisha.ac.jp/users/kkitao/library/biblio/caibib.htm
[67] Simon, Summons P, Automated testing of Databases and Spreadsheets – the Long and the Short of it, Proceedings of ACE2000, Melbourne, 2000
[68] Sloman J, Use of virtual seminars in Economic Principles, http://www.economics.ltsn.ac.uk/showcase/sloman_virtual.htm, 2002
[69] Smailes J, Experiences of using Computer Aided Assessment within a Virtual Learning Environment, http://www.business.ltsn.ac.uk/events/BEST 2002/Papers/smailes.PDF, 2002
[70] Solomon A, http://www-staff.it.uts.edu.au/~andrews/, personal communication, 2003
[71] Thelwall M, Computer-based assessment: a versatile educational tool, Computers & Education, 34(1), 2000
[72] Thomas P, Price B, Petre M, Carswell L, Richards M, Experiments with Electronic Examinations over the Internet, Proceedings of 5th Annual CAA Conference, Loughborough, 2001
[73] Thomas P, Price B, Paine C, Richards M, Remote electronic examinations: student experiences, British Journal of Educational Technology, 33, 2002
[74] Thorburn G, Rowe G, PASS: An Automated System for Program Assessment, Computers & Education, 29(4), 1997
[75] Trentin G, Computerized adaptive tests and formative assessment, Journal of Educational Multimedia and Hypermedia, 6, 1997
[76] Ward A, Using peer assessment assisted by ICT for programming assignments, Interactions, 5(1), 2001
[77] WebCT, http://www.webct.com/
[78] White S, Davis H, Creating large-scale test banks: a briefing for participative discussion of issues and agendas, Proceedings of 4th Annual CAA Conference, Loughborough, 2000
[79] Whittington D, Hunt H, Approaches to the computerized assessment of free text responses, Proceedings of 3rd Annual CAA Conference, Loughborough, 1999
[80] Wood D, Aspects of Teaching and Learning, in Light P, Sheldon S and Woodhead M (eds.), Learning to Think, Routledge, London, 1991
APPENDIX – THE QUESTIONNAIRE
This survey is about the use of different assessment methods in Computing courses and attitudes towards the use of
computer-aided assessment. The results of this survey will be used to inform the results of the "How shall we assess this?"
working group on assessment techniques at ITiCSE 2003. We would appreciate it if you would take a few minutes to fill in
this questionnaire.
About you:
This survey is for statistical purposes only, and will not be analysed with respect to individuals, institutions or geographical
areas. However, we would like you to specify your gender, country and teaching experience, and you may also provide the
name of your institution and/or email address if you wish.
Your gender
Country
Teaching experience (years)
Institution (optional)
Email address (optional)
Assessment:
Questions in this section relate to a specific course that you teach or that is currently taught within your department. By "course",
we mean an assessable unit of teaching (known variously as "units", "subjects" or "modules" in some institutions). If you
wish to provide information about more than one course, please feel free to fill in multiple copies of this survey!
Course topic: (please select)
(Please pick the topic that is the closest match to the course that you teach, or choose "other" if there isn't any suitable topic)
If "other" please specify: _____________________
Course level: (please select)
The questions in this grid relate to assessments associated with the course above. Please fill in as many columns as are
relevant, whether they relate to separate assessment tasks or to different aspects of a single assessment task.
Kind of assessment task (one column per task): Essay | Other written work | Practical exercise | Closed-book exam | Open-book exam | In-class test | Presentation | Other
For each assessment task column, please indicate:
Assessment type: Formative (i.e. for feedback purposes) / Summative (i.e. for grading purposes)
Exercise type: Fixed set of questions / Choice of questions / Individually tailored questions / Group work
Aspects of learning covered: Remembering / Understanding / Application / Problem-solving (synthesis) / Evaluation
Submission mechanism: Manual / Electronic
Marking techniques used: Manual / Part manual, part electronic / Fully electronic / Peer assessed / By interview
To what extent is plagiarism a problem? Not at all / Minor problem / Moderate problem / Major problem / Don't know
What steps (if any) do you take to detect and/or prevent plagiarism?
Computer-Aided Assessment:
Do you have any experience (past or present) of using online learning environments?
• Yes / No
• If so, which?
Have you ever used computer-aided assessment?
• No / Yes, a little / Yes, a lot
Is computer-aided assessment used within your department?
• No / Yes, a little / Yes, a lot
Would you use computer-aided assessment if (please tick as many boxes as are relevant):
• You had to create the test and computerise it yourself
• You had to create the test and someone else would computerise it
• Someone else would create the test and computerise it
The grid below relates to your perceptions of computer-aided assessment (CAA). Please complete it even if you have no
experience of using CAA.
(Response options for each statement: Strongly disagree / Disagree / Neutral / Agree / Strongly agree)
CAA has fewer security risks than manual assessment
CAA is more time-consuming than manual assessment
CAA reduces marking time
It is possible to test higher-order learning using CAA
CAA offers greater objectivity in marking
CAA allows students to work at their own pace and
more flexibly
The use of CAA makes students more anxious
CAA improves the immediacy of feedback to students
CAA improves the quality of feedback to students
CAA disadvantages special-needs students
What do you perceive as the advantages of CAA?
What do you perceive as the disadvantages of CAA?
When using CAA, do you consider it important to be able to share and swap questions and exercises with others?
• No
• Yes, with other institutions within the same country
• Yes, with other institutions in other countries