Kolin Kolistelut - Koli Calling 2004
Paper R/05
Cognitive skills of experienced software developer: Delphi study
S. Surakka and L. Malmi
Helsinki University of Technology, Laboratory of Information Processing Science,
P. O. Box 5400, FIN-02105 HUT, Finland
Abstract
In this paper a qualitative study of cognitive skills of experienced software developers is
presented. The data for the study was gathered using the Delphi method. The respondents
were 11 software developers who had worked for at least five years after their graduation.
The respondents were found using recommendations since the goal was to find especially
good software developers. Thus, they are not a statistically representative sample from
all software developers but more like a focus group. Two questionnaire rounds were
conducted. In the first round, the respondents mentioned altogether 32 different skills. In
the second round, 10 of the respondents answered and evaluated the importance of these
32 skills. The results are divided into two categories: composition and comprehension.
For each skill, the evaluated degree of difficulty of the skill is presented (e.g., does the skill
efficiently differentiate experts from novices).
1 Introduction
What are cognitive skills? According to ERIC Thesaurus (2004), the term ‘thinking skills’
should be used for the term ‘cognitive skills.’ The description for the term ‘thinking skills’ is
the following:
Interrelated, generally “higher-order” cognitive skills that enable human beings
to comprehend experiences and information, apply knowledge, express complex
concepts, make decisions, criticize and revise unsuitable constructs, and solve
problems—used frequently for a cognitive approach to learning that views explicit
“thinking skills” at the teachable level.
In this study the goal has been to identify cognitive skills that are important for expert
software developers’ work. Our research originates from the need to better understand what
kinds of topics and skills should be included in the Master’s-level education of software systems
specialists at the Helsinki University of Technology. Typical sources for such curriculum development work include various model curricula such as Computing Curricula 2001 (Engel
and Roberts, 2001). However, they mostly concentrate on listing topics to be covered in the
curriculum. The skills to be achieved during the education are covered more vaguely. Since
programming is a high-level cognitive skill, we wanted to find out in more detail what
kinds of cognitive skills should be trained in the education.
We decided to search for high-level software development experts and ask them
which topics in computer science they consider important for their work. Moreover, we were
interested in identifying tacit knowledge needed in software development. Since such information is difficult to capture with simple questionnaires, we decided to apply the Delphi
method (Wilhelm, 2001), in which people in the same focus group are queried two or more
times. After each round, a summary of the results is presented to them, followed by more narrowly
defined questions on the topic of interest. Delphi is a qualitative research method in which the
quality rather than the number of respondents is the more important factor. Statistical
reliability of the results is therefore not the general goal, and thus the number of respondents
need not be very large. In this study we selected, based on some general quality criteria,
11 respondents from a group of 59 recommended experts. Two questionnaire rounds were
performed, and the second round concentrated especially on the tacit knowledge of software
development. In this paper we concentrate on the results of the second questionnaire round.
The structure of the paper is the following. First, we consider some related work in
Section 2. In Section 3 we describe the research method in some detail. The results are
presented and analyzed in Section 4. A discussion, including some implications for education
and an evaluation of this research, concludes the paper.
2 Related work
We did not find any research papers where the Delphi method has been used in the field of
psychology of programming. This is understandable because even questionnaires are rarely
used as a research instrument in this field.¹ Because of the lack of similar research,
some more general references are presented next. At the end of this section we explain
how these issues relate to our research.
Greeno and Simon (1988) wrote ‘Computer programming may be characterized “as a
whole” as a design task.’ Brooks (1983) wrote about design task domains:
. . . , two fundamental activities in design task domains are composition and comprehension. Composition is the development of a design and comprehension results in an understanding of a design. The essence of the composition task in
programming is to map a description of what the program is to accomplish, in the
language of real-world problem domains, into a detailed list of instructions to the
computer designating exactly how to accomplish those goals in the programming
language domain. Comprehension of a program may be viewed as the reverse series
of transformations from how to what.
Stanislaw et al. (1994) divided expertise in computer programming into two components:
time-based expertise and multiskilling expertise. They wrote (p. 351): ‘Time-based expertise corresponds to the conventional notion of expertise, and is a function solely
of the time spent on programming. Multiskilling expertise, by contrast, accrues through
exposure to a variety of programming languages and tasks, and is related to the cognitive
development of higher-level programming schemata.’ Detienne (2002, p. 35) wrote that one
of the characteristics that distinguishes ‘super experts’ or ‘exceptional designers’ from other
experts is: ‘a broader rather than longer experience: the number of projects in which they
have been involved, the number and variety of the programming languages they know.’
In addition, Detienne (2002, p. 35) wrote that experts carry out some aspects of the programming task completely automatically. She referred to Wiedenbeck (1985, p. 383), who found
that experts were faster and made fewer mistakes than novices when both groups had to make a
series of timed true/false decisions about short, textbook-type program segments. One might
assume that, for example, the following skills gradually become automated as programming experience increases: (a) using the basic commands of an editor (such as Emacs) and of the
frequently used programming system, and (b) knowing details of the syntax and code conventions
of a certain programming language such as C.
The previous issues relate to this study as follows: (a) We have used the two activities,
composition and comprehension, to interpret and divide our results. (b) The division between time-based expertise and multi-skilling expertise was applied by requiring that at least half
of the respondents could be characterized as multi-skilled experts. (c) The concept of skill
automation was used with the questions about cognitive skills: the first question concerned
higher-level skills and the second question concerned skills that might be partially or totally
automated.
¹ We found only seven articles in which a questionnaire has been used, for example (Capretz, 2003). However,
none of these articles is really related to our study apart from the use of questionnaires.
3 Method
An overview of the Delphi method can be found, for example, in Wilhelm (2001). The
method was originally used to forecast the future; the name originates from ‘the oracles of
Delphi’, where Delphi refers to an ancient Greek sanctuary. However, in this study, estimating
the future was only a small part.
Some basic properties of the method are the following. First, there are several questionnaire rounds. Second, the results from the previous round are used as material for the next
round. Thus the respondents may change or tune their previous answers. One of the main
reasons for using Delphi was that it allows group communication without gathering all respondents in the same place at the same time, which in this case would have been very difficult to
achieve. Moreover, in this way the respondents had more time to consider their answers and
make their views more explicit.
Originally, consensus building has been an important part of the Delphi method. In this
research, however, the second questionnaire round was not used for building consensus on
the whole issue but was aimed at refining the results of an interesting part of the first
questionnaire, that is, cognitive skills. The first questionnaire had three open questions about
cognitive skills required by a software specialist. Based on the answers, a total of 36 different
skills were identified. In the second round the respondents rated the level of these skills,
that is, how much learning and experience is needed before such a skill is mastered. The
questionnaires are presented in more detail in Section 3.2.
The decision to limit the second questionnaire to only one area of interest was based on
several reasons: (a) The results from the other areas of the first questionnaire were satisfactory
enough, so the need to conduct a second questionnaire round for the sake of the other areas
was low. (b) The respondents thought that the questions about cognitive skills were the most
difficult to answer. We interpreted this as a hint to explore this area further. (c) Regardless of
the answering difficulties, some respondents considered cognitive skills an interesting or promising
area for this kind of study. This was our own opinion as well. (d) At the beginning
of the study we promised the respondents that participating would take 1-3 hours, and we
did not wish to break this promise.
After the cognitive skills were chosen as the topic for the second questionnaire round, the
goal was set to evaluate how demanding or difficult the different cognitive skills that were
mentioned during the first round are.
3.1 Finding respondents
The goal was to find 10-20 especially good software developers. The respondents were found
using recommendations. Thus, they are not a statistically representative sample from all
software developers but more like a focus group. Probabilistic sampling was not used because
it was difficult to identify the target group using properties such as age, education, and title.
For example, the title and working years are not enough to separate especially good software
developers from poor or intermediate developers. Our decision thus fits well with guidelines
presented by Kitchenham and Pfleeger (2002, p. 19): ‘Nevertheless, there are three reasons
for using non-probability samples: 1. The target population is hard to identify. For example,
if we want to survey software hackers, they may be difficult to find. . . ’
The minimum criteria were a degree, five years of working experience after graduation, at
least half of the working time spent on programming during these five years, and at least 100,000 lines
of self-implemented code. In addition, at least half of the respondents should have versatile
software development experience. Here, versatile means different kinds of projects, for example
various programming languages and application domains. Two extra criteria were that (a)
a maximum of three respondents could be included from the same organization and (b) only one
respondent could work full-time at the Helsinki University of Technology, where the authors
themselves work. The degree could be from programs other than computer science and engineering.
For example, some older respondents had a degree in electrical engineering. The title of
the respondent need not be programmer, software developer or software engineer, since the
only important issue was that their work included enough programming.
Altogether, 59 persons were recommended. Forty of them were not approached for various
reasons (e.g., the person had graduated less than five years ago). Thus, 19 persons
were asked to participate, starting from those who had the most recommendations. Of these
19 persons, 11 promised to participate.
The criteria of at least 100,000 lines of self-implemented code and enough programming
experience during the last five years were checked when the person was asked to take part.
Some candidates declined because of these two conditions. The criterion that at least half of
the respondents should have versatile software development experience was checked with
the first questionnaire. No respondents were excluded because of this criterion.
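The screening steps described in this section can be illustrated with a short sketch. The following Python fragment is not the procedure actually used in the study: the Candidate record, its field names and the sample pool are hypothetical, and the limits per organization are left out for brevity. It only shows the four minimum criteria as a filter and the ordering of eligible candidates by the number of recommendations.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one recommended developer (not from the study)."""
    name: str
    has_degree: bool
    years_since_graduation: float
    programming_share: float  # fraction of working time spent on programming
    lines_of_code: int        # self-implemented lines of code
    recommendations: int      # number of people who recommended the candidate


def meets_minimum_criteria(c: Candidate) -> bool:
    """Check the four minimum criteria listed in Section 3.1."""
    return (
        c.has_degree
        and c.years_since_graduation >= 5
        and c.programming_share >= 0.5
        and c.lines_of_code >= 100_000
    )


def invitation_order(candidates: list[Candidate]) -> list[Candidate]:
    """Keep eligible candidates and put the most recommended ones first."""
    eligible = [c for c in candidates if meets_minimum_criteria(c)]
    return sorted(eligible, key=lambda c: c.recommendations, reverse=True)


if __name__ == "__main__":
    # Invented example pool, for illustration only.
    pool = [
        Candidate("A", True, 6, 0.6, 150_000, 2),
        Candidate("B", True, 3, 0.9, 200_000, 5),   # graduated too recently
        Candidate("C", True, 10, 0.5, 100_000, 4),
    ]
    print([c.name for c in invitation_order(pool)])
```

The requirement that at least half of the respondents have versatile experience concerns the group as a whole, so it would have to be checked after the individual filter, as was done here with the first questionnaire.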
3.2 Questionnaire rounds
Two questionnaire rounds were conducted. The first questionnaire was answered between
November 2003 and January 2004, the second questionnaire between January and February
2004. During the first round, most respondents answered in the presence of a researcher
(one of us), so that they were able to ask questions while answering. The
researcher was not present during the second round. The mean answering time
was one hour and six minutes for the first round and 54 minutes for the second round. The
original questionnaires are available in Finnish only (Surakka, 2004). However, their main
properties are presented in the following two subsections.
3.2.1 First questionnaire
The first questionnaire had 14 open questions and 14 multiple-choice questions. The topics
were (a) background information about the respondent, (b) the importance of various subjects
and skills for software development, such as discrete mathematics and concurrent programming, (c) cognitive skills, (d) problem solving techniques, and (e) software quality. For brevity,
only results about the background information and cognitive skills are presented in this article.
The questions about background information covered title, proportion of time spent on programming, number of employees under the respondent, lines of code implemented by the
respondent, number of different groups involved, number of different projects, personal skills
in various subjects (42 subitems such as discrete mathematics and object-oriented programming), skills in various programming languages and knowledge of various operating systems.
Instead of cognitive skills, the term ‘tacit knowledge’ was used because we assumed that it
would be easier for the respondents to understand. An explanation of the concept, including
an initial division into cognitive skills and technical skills, was given before the questions. The three
questions were:
• For a top-level software developer, what are important mental models, beliefs and understanding that belong to the cognitive element of tacit knowledge?
• For a top-level software developer, what topics or skills belong to the technical element of
tacit knowledge? These can also be called skills that are located in the fingertips.
• Do you believe that some area of tacit knowledge will be more important in the future?
3.2.2 Second questionnaire
The second questionnaire was based on the respondents’ answers and comments to the first
questionnaire. These were analyzed to identify and separate the different skills mentioned in the
comments. Comments clearly denoting the same skill were merged. Typing skill was included
in the list, based on the researcher’s observations, even though the respondents did not mention
it. Finally, we had a list of 36 comments, each identifying at least one skill, for the next round.
In the second questionnaire, the respondents had to evaluate the level of these comments
according to the following categories:
1. Very low-level skill that even novices can learn quickly (during a 1-4 credits basic course)
2. Somewhat low-level skill that requires, for example, working experience of 3-6 months to
be learned
3. Somewhat high-level skill that starts to differentiate good programmers from less good
programmers
4. Very high-level skill that takes usually several years to learn and typically only top-level
programmers have this skill.
The second questionnaire also had questions about problem solving techniques, use of
editor, and typing skills. For brevity, these results are not reported in this article.
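How ratings on this four-level scale turn into the means reported in the result tables can be shown with a minimal sketch. The function name, the data layout and the example ratings below are assumptions made for illustration; they are not the study's data or tooling.

```python
from statistics import mean

# The four answer categories of the second questionnaire.
LEVELS = {
    1: "very low-level skill",
    2: "somewhat low-level skill",
    3: "somewhat high-level skill",
    4: "very high-level skill",
}


def summarize(ratings_by_comment):
    """Return (comment, mean rating) pairs ordered from highest to lowest mean.

    ratings_by_comment maps each comment to the list of levels (1-4)
    chosen by the respondents.
    """
    rows = []
    for comment, ratings in ratings_by_comment.items():
        if any(r not in LEVELS for r in ratings):
            raise ValueError(f"rating outside the 1-4 scale for {comment!r}")
        rows.append((comment, round(mean(ratings), 1)))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    # Invented ratings from five hypothetical respondents.
    example = {
        "Documenting code": [2, 2, 1, 2, 2],
        "Design of interfaces": [4, 4, 4, 3, 3],
    }
    for comment, avg in summarize(example):
        print(f"{avg:.1f}  {comment}")
```

Treating the ordinal levels as numbers in order to average them is a simplification; this is one reason why a rank-based test is a natural companion to the means.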
4 Results
First, some background information about the respondents is presented. Second, the
respondents’ opinions about cognitive skills are presented.
4.1 Background information of respondents
All respondents were male, and the mean age of the respondents was 37.1 years. Their degrees
were as follows: one college degree in computer science and engineering (9%), five master’s degrees
in computer science and engineering (45%), three master’s degrees in other engineering disciplines
(27%), one doctorate in applied mathematics (9%) and one doctorate in computer science and
engineering (9%). The respondents’ positions were distributed into the following groups: senior
software engineers and developers 45%, researchers 27%, and managers or directors 27%.
Each respondent was asked to give himself a grade in 42 subjects or skills related to various
fields of computer science, other sciences (mathematics, physics), and the software development
phases of the waterfall model. Table 1 shows the ten subjects or skills in which the respondents
rated themselves highest on average. Two issues are worth noticing. First,
script programming skills are ranked very high. This obviously correlates with the heavy use
of the Unix/Linux environment in their work. We did not ask more questions on scripting in the
second round. However, our interpretation of this phenomenon is that for this target group
scripting is a regular method for solving simple computational problems, for example, filtering
and manipulating data files, or building auxiliary tools for them. This is strongly related to
the important cognitive skills of recognizing the need for building new tools and choosing a
suitable tool for each purpose.
The second observation is that functional programming is ranked much higher than the
general use of functional programming languages in software production would indicate. We
believe that this is related to multi-skilling. A plausible explanation is that many of the
respondents have used functional programming during their career and/or in hobby programming. Based on answers to the open question about working experience, at least four (36%)
respondents had actually used Lisp in some work project.²
² Nine (82%) respondents graduated from the Helsinki University of Technology, where Scheme was the
language of the first compulsory programming course in the degree program of computer science and engineering
(CSE) during 1989-2003. However, this is not a suitable explanation because all nine of these respondents were
admitted before 1989 or were from degree programs other than CSE. That is, the course in question was not
compulsory for them.
Table 1: Respondents’ top strengths according to question ‘Give yourself a grade in the
following subjects or skills’ (scale: 1 poor . . . 4 excellent).
Rank  Subject or skill                      Mean
1     Implementation                        3.8
1     Procedural programming                3.8
3     Data structures and algorithms        3.5
3     Script programming                    3.5
5     Design                                3.4
5     Object-oriented programming           3.4
7     Operating systems                     3.1
7     Testing                               3.1
7     Version and configuration management  3.1
10    Functional programming                3.0
4.2 Respondents’ opinions about cognitive skills
In the second questionnaire, the statements of skills were divided according to the division
used in the first questionnaire. However, for this article we reclassified the results into two categories: composition and comprehension. We also combined some comments. Two comments
are not presented in the tables because they are not related only to software development.
These two comments and their means were ‘Being systematic’ (2.1) and ‘Ability to type using
ten fingers’ (2.1). Thus, the tables contain fewer comments than the second questionnaire did.
First, the results related to composition are presented in Table 2. The comments are
ordered according to the means. The numbers in the leftmost column are used when referring to
the items.
Even though statistical analysis was not our main purpose, we were curious to see whether
the observed differences are significant or not. We used the Mann-Whitney test (Conover,
1999, pp. 271-275) for the analysis because this nonparametric test is suitable for small samples. Note that the test compares the ranks, not the means. However, for brevity we present
the test results in the same column as the means. The ranks of single items were compared
to the ranks of all items. A star (*) indicates that the difference is statistically significant
(p<0.01). If the star is missing, the difference is not statistically significant.
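To make the rank-based comparison concrete, the following Python sketch computes the Mann-Whitney U statistic with midranks for ties and a two-sided p-value from the normal approximation with tie correction (no continuity correction). It is an illustration only: the function names and the ratings are invented, the exact grouping used in the study is not specified in detail, and here one item's ratings are compared with the pooled ratings of other items as an assumption. For very small samples an exact test may be preferable to the normal approximation.

```python
import math
from collections import Counter


def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Return (U statistic of x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: each group of t equal values contributes t^3 - t.
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # all values identical, no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Invented ratings on the 1-4 scale, for illustration only.
    item = [4, 4, 4, 3, 4, 4, 3, 4, 4, 4]
    others = [3, 2, 3, 4, 2, 3, 3, 1, 2, 3, 4, 3, 2, 3, 3, 2, 3, 4, 3, 2]
    u, p = mann_whitney(item, others)
    print(f"U = {u}, p = {p:.4f}, significant at 0.01: {p < 0.01}")
```

Because the scale has only four levels, ties are frequent, which is why the sketch uses midranks and the tie-corrected variance.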
In Table 2, there are a few observations which need commenting. First, the high mean
of item “2a Automating one’s own work using scripts, keyboard macros etc.” obviously does
not indicate the time needed to learn such skills. Instead, it indicates the time needed to
use them efficiently as one’s personal tools, when necessary. Our assumption is that this is a
skill which is analogous to bottom-up software design, where the programmer recognizes the
need for general-purpose procedures and data structures. Thus, it has a role in differentiating
excellent developers from others. Second, the items ‘Design of interfaces’ and ‘Isolating the
implementation behind well defined (and documented) interfaces’ are kept separate. The first
one is more associated with designing and the latter one with using interfaces. It is obviously
easier to learn to use ready-made interfaces properly than actually designing interfaces that
support good software architecture. Third, comments 2b and 7b are similar but we think that
2b is broader than 7b. Comment 2b also includes low-level knowledge, for example knowing
a language’s keywords by heart. Fourth, we think that the low-ranked items 15a and 17 are not
really cognitive skills, but other kinds of skills or knowledge. However, we have not omitted these
items from the table because they are related to composition.
Table 2: Comments classified into category ‘Composition’: Means to question ‘What do you
think is the level of this skill?’ Scale was: 1 very low-level skill . . . 4 very high-level skill.
A star (*) indicates that the difference is statistically significant (p<0.01).
Number  Mean  Comment
1       3.6*  A good programmer has always a model. The code itself comes from spine and brains operate only the model.
2a      3.5*  Automating one’s own work using scripts, keyboard macros etc.
2b      3.5*  Mastery of a certain programming language or a certain environment
4       3.4   Writing code so well that it is not even necessary to comment
5       3.3   Design of interfaces
6       3.1   Choosing as optimal data structures and algorithms as possible
7a      3.0   Ability to find right abstractions
7b      3.0   Mastery of the structures and idioms that are characteristic for each language or environment
9       2.9   Ability to write code clearly and shortly
10a     2.8   Choice of the programming language
10b     2.8   Implementing programs as independent from the operating environment as possible
12      2.7   Isolating the implementation behind well-defined (and documented) interfaces
13      2.6   Changing lower level cognitive models/design patterns to code. For example, table field in C/C++ object and its memory management get/set/constr/destr.
14      2.4   Identifying concepts
15a     2.3   Ability to find existing Open Source solutions from Net and being familiar with libraries
15b     2.3   Procedural or object-oriented way of thinking about programming
17      1.9*  Documenting code
In Table 3 we present the results related to the category ‘comprehension’. As a general note,
it is interesting that the respondents have often used words like ‘see’ and ‘notice’ to describe
these skills. We think that item ‘13 Understanding the functioning of programming languages
and computer (e.g., parameter passing, order of execution, and concurrency)’ is explicit rather
than tacit knowledge.
5 Discussion
In this section conclusions are drawn, implications to education are presented, and the research
is evaluated.
5.1 Conclusions and implications to education
The skills listed can be divided into two main categories: skills associated with composition
and skills associated with comprehension. The composition category obviously includes skills
that are related to the mastery of the programming languages and environments used. Other
important skills are associated with, for example, having an inherent model of the goal in one’s mind, designing
interfaces and abstractions, and mastering and developing one’s own working process.
The comprehension category includes skills such as understanding the program as a whole,
the ability to notice isomorphisms with other known problems, and the ability to fluently change one’s view of
the code across various aspects.
Table 3: Comments classified into category ‘Comprehension’: Means to question ‘What do
you think is the level of this skill?’ Scale was: 1 very low-level skill . . . 4 very high-level skill.
A star (*) indicates that the difference is statistically significant (p<0.01).
Number  Mean  Comment
1       3.9*  Ability to see all possible alternatives from the source code (this comment was related to debugging)
2       3.6   Ability to notice isomorphisms with some known problem
3       3.5   Ability to evaluate how the system will operate even before its implementation has been started
4a      3.4   Ability to see esthetic values in solutions
4c      3.4   Ability to see the big picture. What is the core of the problem and how it is connected to the environment around it?
6a      3.2   Ability to distinguish essential matters
6b      3.2   Interpreting the program as whole
8a      3.1   Ability to change fluently abstraction level (e.g., single line of code vs. procedure or big picture vs. details), perspective (e.g., is the control flow or the data flow of the program examined), concepts (e.g., are the concepts of program or the concepts of application domain considered) and view (e.g., users’ needs vs. maintenance vs. development speed).
8b      3.1   Ability to debug
10      3.0   Ability to see symmetries
11      2.9   Exploring the architecture of the existing systems
12      2.7   Ability to see a big problem as several partial problems
13      1.8*  Understanding the functioning of programming languages and computer (e.g., parameter passing, order of execution, and concurrency)
On a general level, the results confirm that different comprehension-related tasks are an
important part of a software developer’s cognitive skills. Approximately 40% of the items
mentioned by the respondents can be classified as comprehension-related tasks. Obviously,
this is not at all a surprising result because, according to the definition presented at the very
beginning of this article, cognitive skills enable human beings to comprehend information.
It is obvious that many of the skills listed above cannot be taught directly in courses.
They are closely tied to long experience gained from programming solutions to different problems. The challenge for education is to design project assignments in which students
face problems where the mentioned skills are useful, and to present guidelines for
adopting such skills.
On a more general level, we assume that the deployment of the results of this research might
increase the proportion of time spent on the concept exploration, requirements analysis, and
design phases but decrease the proportion of time spent on the implementation phase. For brevity,
we mention only two course examples of such development. The first example would be an
advanced course that emphasizes comprehension. A possible course title could be ‘Refactoring.’
During a refactoring course, a student should repair and/or partly rewrite a program (maybe
2000-3000 lines) that contains different kinds of mistakes and bad planning choices. During
the task, a student has to read and thus comprehend a program written by others. Moreover,
he/she should justify the findings made and explain how the code should be improved.
Second, from the composition viewpoint a possible course title could be ‘Software design
workshop.’ This course would emphasize analyzing and decision-making skills related to
design. The course would contain an open or semi-open design problem that can be solved
using several different strategies and tools. The student group should compare various options,
argue their pros and cons, and finally evaluate the result.
5.2 Evaluation of the research
This study would have been very different if the original main goal had been to gather information
about the cognitive skills of software developers. Questionnaires are seldom used in the psychology
of programming, where the experimental research setting is dominant. One source of criticism is
that questionnaires measure opinions, not observable behavior. However, in this research the
purpose was specifically to measure the opinions of experts.
During the first questionnaire round, most respondents commented that the questions
about tacit knowledge were the most difficult to answer. A possible interpretation could
be that the research method used was not suitable or that the questions were poorly designed.
However, our interpretation was that the answering difficulties were mainly due to the topic itself;
that is, the topic is genuinely difficult.
It is possible that the respondents do not remember or cannot describe skills that were
automated several years ago. For example, adults often have difficulty
describing how a bicycle is ridden or a car is driven. We tried to minimize this problem by dividing
the questions into two parts and adding an explanatory text before the questions.
6 Acknowledgements
We thank emeritus professor Veijo Meisalo from the University of Helsinki for suggesting the use
of the Delphi method and PhD Sari Kujala from the Helsinki University of Technology for
commenting on the manuscript of this article.
References
Brooks, R., 1983. Towards a theory of the comprehension of computer programs. International Journal of
Man-Machine Studies 18, 543–554.
Capretz, L., 2003. Personality types in software engineering. International Journal of Human-Computer Studies
58 (2), 207–214.
Conover, W., 1999. Practical nonparametric statistics. 3rd ed. John Wiley and Sons, New York.
Detienne, F., 2002. Software design—Cognitive aspects. Springer, London.
Engel, G., Roberts, E., 2001. Computing Curricula 2001. Computer Science. Final report, December 15, 2001.
Association for Computing Machinery and IEEE Computer Society.
ERIC Thesaurus, 2004. ERIC Thesaurus. Retrieved on April 27, 2004, from the Educator’s Reference Desk
web site: http://www.ericfacility.net/extra/pub/thessearch.cfm .
Greeno, J., Simon, H., 1988. Problem solving and reasoning. In R. C. Atkinson, R. J. Herrnstein, G. Lindzey
and R. D. Luce (Eds.): Stevens’ Handbook of Experimental Psychology, vol. 2, 589–672.
Kitchenham, B., Pfleeger, S., 2002. Principles of survey research. Part 5: Population and samples. Software
Engineering Notes 27 (5), 17–20.
Stanislaw, H., et al., 1994. A note on the quantification of computer programming skill. International Journal
of Human-Computer Studies 41 (3), 351–362.
Surakka, S., 2004. Supplementary material for article ‘Cognitive skills of experienced software developer: Delphi
study’. http://www.cs.hut.fi/u/ssurakka/papers/Delphi2/index.html .
Wiedenbeck, S., 1985. Novice/expert differences in programming skills. International Journal of Man-Machine
Studies 23 (4), 383–390.
Wilhelm, W., 2001. Alchemy of the Oracle: The Delphi technique. The Delta Pi Epsilon Journal 43 (1), 6–26.