Assessing Students’ Metacognitive Skills
Judy Garrett, PhD,(a) Martha Alman, MS,(a,b) Stephanie Gardner, PharmD, EdD,(a) and Charles Born, PhD(a)
(a) University of Arkansas for Medical Sciences
(b) University of Arkansas, Little Rock
Submitted March 7, 2006; accepted June 23, 2006; published February 15, 2007.
Objective. To develop a diagnostic test for assessing cognitive skills related to metacognition in
a physiology course.
Methods. Cognitive skills believed to be related to metacognition (visualizing lecture information and
interpreting diagrams) were identified in a first-professional year (P1) physiology course and test items
were constructed for each. Analyses included overall reliability, item discrimination, and variance
comparisons of 4 groups to assess the effect of prior physiology coursework and diagnostic test score
level on the first examination in physiology.
Results. Overall reliability was 0.83 (N = 78). Eighty percent of the test items discriminated positively.
The average diagnostic test scores of students with or without a prior physiology course did not
differ significantly. Students who scored above the class mean on the diagnostic test and who had taken
a prior physiology course also had the highest average scores on the physiology examination.
Conclusion. The diagnostic test provided a measure of a limited number of skills related to
metacognition, and preliminary data suggest that such skills are especially important in retaining
Keywords: metacognitive skills, assessment, learning, physiology
was the first step in developing a valid and reliable diagnostic tool or test, called a Learning Skills Checkup, which would serve as an early warning system for identifying students with poor metacognitive skills. Students who were identified as not having relevant metacognitive skills could then be counseled to attend workshops or individual sessions to help them develop these skills before the first round of tests. Although opportunities for developing these skills have been available to students at this institution for many years, generally such services are more widely used by students with good metacognitive skills than by students who are struggling.
The pilot study was designed to address 4 questions: (1) Since all items on each part of the test were intended to assess only one skill, did all items on each part of the test actually assess the same metacognitive or information processing skill? (2) Was the test reliable and did items discriminate between the students who did and did not have each skill? (3) Since the metacognitive skills test was based on physiology content and only some of the students had taken a physiology course prior to entering pharmacy school, were test results influenced by prior knowledge? (4) Did the diagnostic test do what it was designed to do: provide a measure of the impact of a variable called metacognitive skills on achievement? If so, higher scores on each part of the test should be reflected by higher scores on the criterion measure: the first examination in physiology.
Several terms have been used to describe the activities involved in checking understanding and making changes based on the results of self-checks. Although psychologists and educators have been aware of the importance of these types of activities for almost a century, according to Brown,3 use of the term metacognition to describe this activity is generally attributed to the pioneering work of Flavell,4 who issued a call for studying the impact of metacognition on learning. Readers who are interested in more information about metacognition should consult either Weinert and Kluwe5 or Hacker.6 Terms which have emerged from the field of cognitive psychology are metamemory, metacomprehension, and calibration of comprehension. Maki uses the term metacomprehension to describe the process of monitoring learning from text.7 According to Maki, much of the metacomprehension research has used an error-detection paradigm in which errors are detected in text with missing or incorrect information. Calibration of comprehension is described by Otero as a measure of the relationship between how well readers think they understand text vs. how well they actually can answer questions about it.8 The term metacognition is used by educational psychologists to address the complexity of this type of activity.
Attempts have been made to identify variables that influence metacognition, beginning with research by Flavell,4 who believed that metacognition was influenced by 3 variables: the learner (self), the task, and the strategy. Among these variables are learner characteristics such as self-perception, verbal skill and ability, motivation, learning task variables, and context or situational variables, with task difficulty being especially important.9 Davidson and Sternberg discuss the importance of identifying relevant information and of forming mental maps or representations.10 The relationship between motivation and metacognition has been the subject of numerous studies since students are more likely to monitor their understanding of information that interests them. One study that provides evidence of the relationship between motivation and comprehension monitoring is a confirmatory factor analytic study of the metacognitive and motivational components of self-regulation.11 Among the efforts to identify components of metacognition are those of Weinert, who identifies 2 variables: evaluation, in which a problem is identified, eg, a student realizes that she does not understand something; and regulation, in which the student takes measures to increase understanding, such as studying more or using different study strategies.12
Metacognitive activities involve mental activities, which by their nature cannot be observed directly. Therefore, 2 methods of inquiry are often used in metacognitive studies. In the first, students evaluate their understanding in terms of feeling-of-knowing (FOK), judgments-of-learning (JOL), or ease-of-learning (EOL) judgments.13 A representative study of this type is that of Tobias and Everson, in which students’ abilities to correctly estimate what they know and do not know were compared to measures of academic achievement.14 In the second, responses to self-report instruments, such as the Metacognitive Awareness Inventory, are used to assess comprehension monitoring skills.15
Although the impact of comprehension monitoring or metacognition has been part of the learning research literature for almost a century and has been the focus of systematic study since the mid-1970s, a missing element in the research literature has been the identification of skills that are actually needed to monitor comprehension. This study addresses a call by Flavell to “try to discover the early competencies that serve as building blocks for subsequent acquisitions rather than merely cataloging . . . lacks and inadequacies.”4
METHODS
Due to its information processing demands, one of the most difficult courses for P1 students at this institution is physiology, especially for those who have been
accustomed to rote memory learning. A specialist in communicative disorders assisted in identifying skills thought to be related to metacognition in the course (Table 1). Although metacognition has both motivational and cognitive components, the focus of this study was on assessing cognitive skills involved in monitoring understanding.
Selections from the textbook for the course were used in constructing test items.1 Information about the skills included on each part of the original test, and the number of items and possible points for each, is summarized and prioritized in Table 1 based on their relevance in the course, where the 2 primary methods of delivering information were lectures and diagrams. The number of possible points exceeds the number of items since, with the exception of part 5 on vocabulary, items were scored according to the number of required elements in a correct answer, with 1 point given for each correct element included. For example, the item requiring students to visualize a sarcomere was scored on 10 elements for a total of 10 points (Appendix 1).
To introduce students to the types of items on the test, less complex structures that were easier to visualize and that were likely to be more familiar to students (eg, a neuron) were presented first, with subsequent items increasing in difficulty. Included in Appendix 1 are instructions for part 1 of the test, “Visualizing Spoken Information,” an actual item from this part of the test, and the elements expected in a correct answer.
The study was reviewed and approved by the institutional review board (IRB). The test was administered during the first week of fall semester. Although the results of prototype testing with individuals suggested that the test could be completed in a 50-minute class period, preliminary test administration activities, such as explaining the purpose of the exercise, giving test instructions, and responding to student questions, significantly reduced the actual time remaining for taking the test. After the first part of the test was completed, it became obvious that most students would only have time to complete the first 2 parts of the test: visualizing lecture material and interpreting diagrams. Therefore, only these 2 skills were included in the pilot administration.
Due to the limited availability of raters, all items were scored by the principal investigator. For the pilot study, an arbitrary criterion of 70% correct was used, with more formal standard-setting procedures to be established in future phases of the study. Each student was provided with an individualized report detailing strengths and weaknesses. Students who scored lower than 70% on either part of the test were advised that although many things could influence scores on this exercise, they should try to improve their study skills in these areas.
An answer was scored as “1” in a spreadsheet if it included expected information and “0” if information was omitted or incorrect. A similar scheme was used to score other responses and an overall score (total possible = 37) was computed for each student. Scores on the first physiology examination were also recorded and the data were uploaded to SPSS, version 12.
Since items on each part of the test were designed to assess only 1 type of skill, eg, visualizing or interpreting diagrams, each reading selection or diagram used in
constructing test questions had to meet 2 criteria: (1) present a unidimensional test of the skill being tested, ie, answering a question based on a diagram or text should involve the use of only 1 type of metacognitive skill; and (2) be similar in complexity to other items on that part of the test. Although the number of participants was small (N = 78), exploratory factor analysis (SPSS) was used to examine the dimensionality of items on each part of the test.
SPSS was also used to evaluate test reliability (Cronbach’s alpha) and provide item-total correlations. SPSS was also used to identify cut points for the top and bottom 25% of scores on the test (<22.75 and >30.24). Crosstabs was used to compare the number of desirable response patterns (answered correctly by students in the top 25% of the class but missed by those in the bottom 25%) to the number of undesirable response patterns (missed by students in the top 25% and answered correctly by those in the bottom 25%). A discrimination index was then computed for each item by subtracting the number of undesirable from the number of desirable response patterns. Comparison groups used in computing discrimination indices are shown in Figure 1, with desirable and undesirable response patterns illustrated by lines A and B, respectively. For the purpose of evaluating item discrimination indices relative to total test scores, an item was considered to be discriminating positively if the number of desirable response patterns was at least twice as large as the number of undesirable response patterns.
SPSS was also used to identify cut points for the top and bottom 25% of scores on the first physiology examination (<65.63 and >81.99), and crosstabs analyses were also used to compare the number of desirable/undesirable responses to each item by students who scored in the top 25% on this examination.
Since test items were based on information which should have already been familiar to students, another concern was the extent to which test scores might be confounded by prior knowledge. The Kolmogorov-Smirnov test for normality was used to evaluate the score distribution of each group. In order to further address the question of whether the test did what it was designed to do, ie, assess the effect of a variable called metacognitive skills on achievement, ANOVA was used to compare the average physiology examination 1 scores of students with and without prior physiology coursework and who scored above and below the class mean on parts 1 and 2 of the test.
RESULTS
Due to the small number of participants, the stability of factor analysis results is questionable (Question 1). However, preliminary results suggest that the information processing demands of some reading selections were indeed multidimensional, ie, answering items based on these selections involved more than one skill. For example, the 10 required elements in the reading selection pertaining to the sarcomere (Appendix 1) were all assumed to be related to the same skill: visualizing spoken information. However, the clustering of factor loadings for elements 1-6 and elements 7-10 suggested 2 different skills. A close inspection of this item confirmed the results of factor analyses: elements 1-6 pertain to visualizing static structures while elements 7-10 pertain to visualizing a process: that of muscle relaxation and contraction.
The overall reliability (Cronbach’s alpha) of the Learning Skills test was 0.83 (Question 2). Since each element required in a correct answer was scored as either correct (included) or incorrect (not included), the SPSS reliability program produced item-total correlations for each of these elements. The item with the lowest item-total correlation required students to visualize the parts of a neuron. This item did not discriminate among students, since most answered the question correctly, thereby almost eliminating variability. Items with the 2 highest average item-total correlations were those pertaining to visualizing a sarcomere (r = 0.42) and ion channels (r = 0.39), both of which involved visualizing processes.
Of the 37 scored elements on the test, 29 met the criterion for discriminating positively between students in the top and bottom 25%, while 8 did not. Four of the 8 elements with low discrimination indices pertained to the neuron, which most students in all score groups answered correctly. The other 4 elements that did not discriminate positively pertained to interpreting diagrams.
When achievement on Examination 1 was used as the criterion measure for determining whether test items discriminated appropriately, items fared less well. When the criterion for an appropriately discriminating item was that the number of desired responses should be 1.5 times as large as the number of undesired responses, only 50% of the scored elements met that criterion.
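The reliability and item discrimination computations described in the Methods were carried out in SPSS; the following Python sketch is only an illustration of the same ideas, not the authors' procedure. It assumes that each student's answers are stored as a row of 0/1 element scores, and it assumes one particular reading of the crosstabs comparison: desirable responses are counted as correct answers in the top 25% plus missed answers in the bottom 25%, and undesirable responses as the reverse. The function names, the `ratio` parameter, and the quartile method (Python's default, which may differ slightly from the SPSS cut points) are all hypothetical choices made for this sketch.

```python
from statistics import pvariance, quantiles


def cronbach_alpha(scores):
    """Cronbach's alpha for a students x elements matrix of 0/1 scores."""
    k = len(scores[0])  # number of scored elements
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    # Undefined (division by zero) if every student has the same total.
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def item_discrimination(scores, criterion, ratio=2.0):
    """Per element: (desirable, undesirable, index, discriminates_positively).

    criterion holds one score per student and is used to form the top and
    bottom 25% groups (total test score or examination score).
    """
    q1, _, q3 = quantiles(criterion, n=4)  # cut points; SPSS may differ slightly
    top = [row for row, c in zip(scores, criterion) if c > q3]
    bottom = [row for row, c in zip(scores, criterion) if c < q1]
    results = []
    for j in range(len(scores[0])):
        top_correct = sum(row[j] for row in top)
        bottom_correct = sum(row[j] for row in bottom)
        # Assumed reading: desirable = top correct + bottom missed.
        desirable = top_correct + (len(bottom) - bottom_correct)
        # Assumed reading: undesirable = top missed + bottom correct.
        undesirable = (len(top) - top_correct) + bottom_correct
        index = desirable - undesirable
        results.append((desirable, undesirable, index,
                        desirable >= ratio * undesirable))
    return results
```

Under these assumptions, calling `item_discrimination(scores, totals)` with the default `ratio=2.0` corresponds to the "at least twice as large" criterion used with total test scores, and `ratio=1.5` to the criterion used when Examination 1 scores define the top and bottom groups.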
Figure 1. Computation of item discrimination index.
When the score distributions of two groups of students, those with and without prior physiology coursework, were compared, the Kolmogorov-Smirnov statistic for the group without prior coursework was 0.102 (p = 0.20), while this statistic for the “prior coursework” group was 0.123 (p = 0.07). These results suggest that neither distribution differed significantly from normal. The average diagnostic test score of students with prior physiology coursework (N = 47) was 26.2 ± 6.2, while the average test score of students without a prior physiology course (N = 31) was 26.8 ± 5.8 (Question 3).
The average physiology examination 1 scores of students with and without a prior physiology course and who scored above and below the class mean on each part of the test are presented in Tables 2 and 3 (Question 4). As shown in Table 2, when criterion scores were analyzed relative to low/high scores on part 1 of the test, the average scores of 3 of the groups were similar (groups 1 and 2 = 70.8, group 3 = 72.2), while the average score of group 4 was 76.6.
As shown in Table 3, the average physiology examination 1 scores of groups 1-3 were also very similar: 71.3, 70.5, and 71.4, respectively. However, as in part 1, only the average score (76.3) of students in group 4 (those who had taken a prior physiology course and scored above the mean on this part of the test) was considerably higher than those of students in the other 3 groups.
Table 2. Comparison of Examination Scores in a P1 Physiology Course According to Completion of a Physiology Course Prior to Entering Pharmacy School and High and Low Scores on the Visualizing Portion of a Cognitive Skills Test
Prior Course   Visualizing Score        Average (SD)   N
No             Group 1: <19             70.8 (11.8)    12
               Group 2: 20 and above    70.8 (16.5)    19
Yes            Group 3: <19             72.2 (11.1)    25
               Group 4: 20 and above    76.6 (10.2)    22
Table 3. Comparison of Examination Scores in a P1 Physiology Course According to Completion of a Physiology Course Prior to Entering Pharmacy School and High and Low Scores on the Diagram Interpretation Portion of a Cognitive Skills Test
Prior Course   Diagram Interpretation Score   Average (SD)   N
No             Group 1: <7                    71.3 (12.4)    14
               Group 2: 7 and above           70.5 (16.6)    17
Yes            Group 3: <7                    71.4 (10.6)    20
               Group 4: 7 and above           76.3 (10.6)    27
DISCUSSION
Metacognition has 3 components: skills used in monitoring, actual monitoring activities, and making changes based on the results of monitoring. The focus of most studies pertaining to metacognition has been on methods of assessing the impact of monitoring activities and not on the actual cognitive skills involved in monitoring. One contribution of this study is its focus on this much-neglected area. Although limited in scope due to test administration time, preliminary data from this pilot study suggest that the test did assess skills other than prior knowledge. Data resulting from this study also provided some evidence of the impact of 2 types of metacognitive skills on achievement: visualizing spoken information and interpreting diagrams.
Although data suggested that test scores were not unduly influenced by prior knowledge, one unexpected finding of the study was the seemingly differential effect of metacognitive skills on students who had and had not had a prior physiology course. Data in Tables 2 and 3 suggest that the impact of metacognitive skills on examination 1 scores was greatest for students who had taken a preprofessional physiology course. One interpretation of these results is that metacognitive skills may have more of an impact on retention than on initial learning.
A major outcome of this pilot study was insight gained into developing and administering this type of diagnostic tool. One of the most important considerations in instrument construction is the information processing requirements of selections used in constructing test questions: questions pertaining to processes will result in the most useful information. Both from the standpoint of ease of scoring and test reliability, when developing questions that test students’ ability to interpret diagrams, objectively scored questions are preferable to open-ended items. Instead, objective questions should be constructed over parts of a process. For example, when revising the item pertaining to protein synthesis, the open-ended statement/item “Identify the main points of the diagram” will be replaced by questions such as “Where does the process start?” and “How many paths can a protein take before it leaves a cell?” By numbering or lettering each “protein synthesis” path, questions can be constructed about differences in the paths, eg, 2 pertain to proteins that leave the cell, while 1 does not. Other considerations are beta testing and test-administration time. Although beta testing with 2-3 individual students, as was done in this study, provides useful information about revision needs, beta testing with as large a group as possible provides even more useful information. Ideally, diagnostic testing should be done before classes begin and time should not be limited to one class period. In subsequent administrations of similar diagnostic tests in other professional programs at this institution (nursing and medicine),
about 90 minutes has been required for most students to complete a test similar to the one outlined in the test plan in Table 1.
A practical limitation of this type of assessment is the time required for scoring each test. In order for the assessment to be useful, students must receive feedback very quickly. In this pilot study, results were available 2 days after the test was administered. Reliably scoring 78 tests within these time constraints was a Herculean task and may have introduced some scorer bias. Although measures were used to reduce this type of bias, such as all tests being scored by 1 person using stringent criteria for judging answers, the scoring of constructed-response items remains a potential source of error variance.
Our plans are to revise the diagnostic test. Questions with item-total correlation coefficients below 0.35 were targeted for revision and/or replacement in future versions. Items that appear to be influenced by prior knowledge will be replaced in future revisions. Although pilot testing of this diagnostic instrument did not suggest any confounding by prior knowledge, using content that should already be somewhat familiar to students does have the potential for this type of confounding, so additional study of possible confounding effects of prior knowledge is needed. In this pilot study, the decision was made to err on the side of background information enhancing learning skills instead of obscuring them.
Due to the limited number of skills included on the diagnostic test in this pilot study, another focus of future studies should be the identification of other cognitive skills related to metacognition, eg, skills such as distinguishing between relevant and irrelevant information or knowledge of learning tasks, ie, what one must know or be able to do to demonstrate that he or she knows something. Another needed focus is whether such skills are course specific. Preliminary findings of other studies being conducted at this institution suggest that skills used in metacognition vary considerably from course to course. Although visualizing structures or processes may be very important in a physiology course, it may be less important in therapeutics, where more relevant metacognitive skills are condensing and organizing to identify similarities and differences in treatment regimens for closely related disorders.
An interesting finding resulting from this pilot study was a seemingly differential effect of both metacognitive-related skills on initial learning and retention. Although these results may have been influenced somewhat by the large standard deviation of examination scores, we will continue to investigate this possible differential effect.
Although a follow-up session also provided students with suggestions for improving skills in each area, it did not provide opportunities to practice these skills. Moreover, the extent to which some students took advantage of this feedback to improve their skills or the extent to which any subsequent improvement may have influenced scores in the criterion measure is not known.
Finally, some would argue that this type of assessment is not needed. They assume that students who do not have appropriate metacognitive skills when they enter pharmacy school will somehow eventually develop them. Analysis of end-of-course achievement data of this group strongly suggests otherwise. Of those students who scored in the bottom 25% on the first physiology examination, 50% were still in the bottom 25% at the end of the course. These data suggest that unless metacognitive skills deficits are identified and remedied, students who do not already have these skills will not develop them on their own.
Assistance in developing learning skills has been available for the students at this institution for several years. However, until this study, there was little information about either the specific types of skills needed in this P1 course or the effect of not having these skills. These data will permit us to be more proactive in identifying students who do not have these skills and in providing structured programs, thereby remedying learning skills deficits before they negatively influence achievement throughout pharmacy school.
CONCLUSION
This study provided a shift in the focus of metacognitive research from methods of assessing the impact of monitoring to one of identifying skills actually used in monitoring. Preliminary data from this pilot study suggest that 2 cognitive skills related to metacognition, visualizing lecture information and interpreting diagrams, are independent of content, and that these skills may have more impact on retention than on initial learning. More importantly, it resulted in some guidelines for developing more valid measures of the skills related to metacognition. If administered during the first few days of the semester, measures such as these will be useful in identifying students who may benefit from structured interventions to improve their study skills. The value of this type of assessment is expected to increase through continued revisions of the test. As revisions are made in test items and more time is available for testing, we will be able to better assess the relative contributions of different types of skills and their impact on academic achievement.
REFERENCES
1. Ganong WF. Review of Medical Physiology. New York: McGraw Hill, 2001.
2. Houglum JE, Aparasu RR, Delfinis TM. Predictors of academic success and failure in a pharmacy professional program. Am J Pharm Educ. 2005;69:283-9.
3. Brown A. Metacognition, executive control, self regulation, and other more mysterious mechanisms. In: Weinert FE, Kluwe RH, eds. Metacognition, Motivation, and Understanding. Hillsdale, New Jersey: Lawrence Erlbaum Associates, 1987:65-116.
4. Flavell JH. Metacognition and cognitive monitoring: a new area of cognitive-developmental inquiry. Am Psychol. 1979;34:906-11.
5. Weinert FE, Kluwe RH, eds. Metacognition, Motivation, and Understanding. Hillsdale, New Jersey: Lawrence Erlbaum Associates, 1987.
6. Hacker DJ. Self-regulated comprehension during normal reading. In: Hacker DJ, Dunloskey J, Graesser AC, eds. Metacognition in Educational Theory and Practice. Mahwah, New Jersey: Lawrence Erlbaum Associates, 1998:165-91.
7. Maki RH. Test predictions over text material. In: Hacker DJ, Dunloskey J, Graesser AC, eds. Metacognition in Educational Theory and Practice. Mahwah, New Jersey: Lawrence Erlbaum Associates, 1998:117-44.
8. Otero J. Influence of knowledge activation and context on comprehension monitoring of science texts. In: Hacker DJ, Dunloskey J, Graesser AC, eds. Metacognition in Educational Theory and Practice. Mahwah, New Jersey: Lawrence Erlbaum Associates, 1998:145-64.
9. Flavell JH. Speculations about the nature and development of metacognition. In: Weinert FE, Kluwe RH, eds. Metacognition, Motivation, and Understanding. Hillsdale, New Jersey: Lawrence Erlbaum Associates, 1987:21-9.
10. Davidson JE, Sternberg RJ. Smart problem solving: how metacognition helps. In: Hacker DJ, Dunloskey J, Graesser AC, eds. Metacognition in Educational Theory and Practice. Mahwah, New Jersey: Lawrence Erlbaum Associates, 1998:47-68.
11. Hong E, O’Neil Jr HF. Construct validation of a trait self-regulation model. Int J Psych. 2001;36:186-94.
12. Weinert FE. Cognitive knowledge and executive control: metacognition. In: Griffin DR, ed. Animal Mind-Human Mind. New York: Springer-Verlag, 1983:201-24.
13. Hacker DJ. Definitions and empirical foundations. In: Hacker DJ, Dunloskey J, Graesser AC, eds. Metacognition in Educational Theory and Practice. Mahwah, New Jersey: Lawrence Erlbaum Associates, 1998:1-23.
14. Tobias S, Everson HT. Studying the relationship between affective and metacognitive variables. Anxiety Stress Coping. 1997;10:59.
15. Schraw G, Dennison RS. Assessing metacognitive awareness. Contemp Educ Psychol. 1994;19:460-75.
Appendix 1. Instructions for part 1 of a diagnostic test for assessing cognitive skills, scoring key, and anticipated
answer. Numbers in selection/scoring key denote required elements.
Instructions: In this exercise, you will listen to descriptions of biologic structures. Each selection will be read twice, with a minute
between each reading. The first time you hear the selection, try to picture the structure being described. After the selection is read the
second time, in the box after each item number, draw a picture of the structure you heard described. If you know the name of the
structure, write it in the blank below the box.
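The element-based scoring described in the Methods (1 point for each required element included in an answer, and a 70% criterion on each part of the test) can be illustrated with a short Python sketch. This is an illustration under stated assumptions rather than the scoring procedure actually used: the function names are hypothetical, the element labels are placeholders rather than the real scoring key, and judging whether a drawing includes an element is still left to a human rater.

```python
def score_item(required, included):
    """Return one 0/1 score per required element (1 = included in the answer)."""
    return [1 if element in included else 0 for element in required]


def part_report(item_scores, criterion=0.70):
    """Summarize one part of the test from its per-item element scores.

    Returns (points earned, points possible, proportion correct,
    True if the proportion falls below the criterion).
    """
    earned = sum(sum(scores) for scores in item_scores)
    possible = sum(len(scores) for scores in item_scores)
    proportion = earned / possible
    return earned, possible, proportion, proportion < criterion


# Placeholder element labels; the rater records which ones the drawing shows.
required = ["element_1", "element_2", "element_3", "element_4"]
rater_found = {"element_1", "element_3"}
item_a = score_item(required, rater_found)
item_b = [1, 1]  # a second, shorter item scored the same way
print(part_report([item_a, item_b]))
```

A student flagged by `part_report` would correspond to one who scored lower than 70% on that part of the test and was advised to work on the related study skills.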