A Physics Diagnostic Test
Leo Sutrisno
Dept. Math and Science Education
Faculty of Education
Tanjungpura University
Pontianak, Indonesia
List of contents
List of Figures
Figure 1: A 2x2 table of response patterns of item A and item B on dichotomous events
List of Tables
Table 1: Characteristics of items of the physics diagnostic test
The test is designed primarily to detect common student errors in learning about sound; in other words, it is used as a diagnostic test (Gronlund, 1981). The results of the test will then be used to design learning experiences to remedy these errors. The test is used again to detect whether or not these errors have been overcome through remedial activities. In this study, the expected ("correct") model of performance is scientists' conceptions and the items of the test are based on the students' pre-conceptions.
domain of this Taxonomy will be adopted as the basis for a list of behavioural objectives.
The course content outline is based on the Indonesian
Curriculum-1984 for the General Senior High Schools (SMA). The
unit of instruction about sound consists of several sub units such
as the sources of sound, the transmission of sound, the medium
of transmission, velocity of sound in several media, musical
instruments, human ears and the Doppler Effect.
Gronlund (1981, p.126) suggests that a diagnostic test
should have a relatively low level of difficulty and that most
items should be focused on the knowledge level (51%) with few
items at the level of synthesis or evaluation. At the knowledge
level, items are grouped into knowledge of specific facts,
knowledge of conventions and knowledge of principles.
Knowledge of specific facts refers to "those facts which can only be known in a larger context" (Bloom, 1956, p.65), while knowledge of principles refers to "particular abstractions which summarize observations of phenomena" (p.75). The items at this level are intended to require recall and recognition of facts, conventions or principles in learning about sound.
At the comprehension level, items deal with the interpretation of a given illustration, while at the application level, items deal with selecting an appropriate principle or principles to solve a given problem. Items at the analysis level focus on analysing the elements of a given problem.
As a broad classification, tests can be grouped as either
essay or objective types. Karmel (1970) makes a comparison
between essay and objective tests on the abilities measured,
scope, incentive to pupils, preparation and method of scoring.
The essay test uses few questions, requires the students to use
their own words to express their knowledge, generally covers
only a limited field of knowledge, encourages pupils to learn how
to express and organize their knowledge, and is very time
consuming to score.
On the other hand, the objective test requires the students
to answer, at most, in a few words, can cover a broad range of
student preparation, is time consuming to construct but can be
scored quickly. "Objective" tests are objective in the sense that
once an intended correct response has been decided on, the test
can be scored objectively, by a clerk or mechanically. On the other
hand, "essay" tests require an "expert" to decide on the worth of
the response and so involve an element of subjectivity in scoring (see also Gronlund, 1981).
Hopkins and Stanley (1981) present a list of limitations of
essay tests based on research findings: reader unreliability, halo
effects, item-to-item carryover effects, test-to-test carryover
effects, order effects, and language mechanics effects. Theobald (1974) presents similar criticisms: evaluation is difficult, scoring is generally unreliable and costly in time and effort, and the sampling of student behaviours is usually inadequate. However, Blum and Azencot (1986), who compared the results of students on multiple-choice test items and equivalent essay questions in an Israeli examination, concluded that there were no significant differences between the mean scores.
Because of the need to comprehensively sample the field of
learning in a diagnostic test (Hopkins and Stanley, 1981),
objective tests are to be preferred over essay tests for this
purpose.
patterns on N items with A options in each item. The function A^N is maximized by A = e when the product N·A is held fixed. Since e ≈ 2.718, the optimum is approximated by A = 3. Costin (1970) compared the discrimination
indices, the difficulty indices and the reliabilities (KR-20) of test
results which were based on 3-option and 4-option multiple-choice
tests on perception (N = 25), motivation (N = 30), learning (N =
30) and intelligence (N = 25). He found that the 3-option forms
produced higher values than the 4-option forms.
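The maximization argument can be sketched briefly: writing the fixed product as C = N·A, the number of distinguishable response patterns becomes a function of A alone, and setting its derivative to zero gives the stationary point at A = e:

\[
A^{N} = A^{C/A} = \exp\!\left(\frac{C}{A}\ln A\right),
\qquad
\frac{d}{dA}\!\left(\frac{C}{A}\ln A\right)
= \frac{C\,(1-\ln A)}{A^{2}} = 0
\;\Longrightarrow\; A = e \approx 2.718,
\]

so the nearest whole number of options is three.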
The second approach was proposed by Grier (1975) and is based on test reliability. The expected reliability of a multiple-choice test is a function of the number of options per item. Grier concluded that for C > 54, three options per item maximize the expected reliability coefficient (for large C the optimal number of options approaches 2.50).
The third approach examines the knowledge-or-random-guessing assumption. Lord (1977) used this approach and found that it supported Grier's result of three options. The fourth approach is based on the use of the item characteristic curve (Lord, 1977). The item characteristic curve of an item gives the probability of a correct answer to the item as a function of examinee ability. Lord found that a test in which the pseudo chance-score level of items is 0.33 is superior to the others.
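For reference, one common form of item characteristic curve that includes a pseudo chance-score level is the three-parameter logistic model; the exact functional form used by Lord is not quoted in the text, so this is illustrative only:

\[
P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-a_i(\theta - b_i)}},
\]

where \(\theta\) is the examinee's ability, \(a_i\) the item discrimination, \(b_i\) the item difficulty, and \(c_i\) the pseudo chance-score level (0.33 for a three-option item answered by blind guessing).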
Green, Sax and Michael (1982) analysed the reliability and the validity of tests by the number of options per item. They found that four options per item produced the highest reliability when compared with three and five options, but that three options had the highest predictive validity (correlation with the course grade).
The experimental evidence quoted above argues in favour of using three options per item and this form will be adopted in this study. A multiple-choice item has two parts: an introductory statement that poses the problem and a series of possible responses. The first part is called the "stem" and the second part the "options", "alternatives", or "choices". In this study the term "options" is used. The options comprise the correct response and the distracters, also called foils or decoys.
Using the table of specifications, items were constructed for each sub unit of instruction.
Several studies have been conducted on the effects of violating item construction principles. McMorris, Brown, Snyder and Pruzek (1972) studied the relationship between providing cues to the correct response and the test results. Three types of cues were used in their study: words or phrases in the stem which point to the correct answer; grammar, where the correct answer was the only option grammatically consistent with the stem; and length, where the correct answer was longer than the distracters. They found that these factors were positively correlated with the test results (.48, .39, and .47 for words or phrases, grammar, and length respectively).
Austin and Lee (1982) studied the relationship between the readability of test items and item difficulty. The aspects of readability considered were the number of sentences in each item, the number of words in each item, and the number of "tokens" (words, digits or mathematical symbols) in each item. They found that these aspects of readability were negatively correlated with item difficulty (-.22, -.24, and -.15 for the number of sentences, words and tokens respectively): increasing the number of sentences, words or tokens in an item would decrease the number of correct answers.
Schrock and Mueller (1982) studied the effect of the stem
form (incomplete and complete statement), and the presence or
absence of extraneous material as cues to attract students to the
correct answers. They suggest that "a complete sentence stem is
to be used rather than an incomplete sentence stem" and that
"extraneous material should not be present in any type of stem"
(p.317).
Green (1984) studied the effects of the difficulty of language and of the similarity among the options on item difficulty. The difficulty of language was varied through the length of the stems, the syntactic complexity, and the substitution of an uncommon term in the stem with a familiar term. She found that the similarity among the options significantly affected item difficulty (F = 72.21, p < .01) but that the difficulty of language did not.
There are standard guidelines for constructing multiple
choice items (Hopkins and Stanley, 1981; Noll et al., 1979;
Gronlund, 1981; Theobald, 1974).
In relation to the item, there are many suggestions:
1. Items should cover an important achievement.
2. Each item should be as short as possible.
3. The reading and linguistic difficulty of items should be
low.
4. Items that reveal the answer to another should be
avoided.
5. Items which use the textbook wording style should be
avoided.
6. Items which use specific determiners (e.g., always,
never) should be avoided.
7. Each item should have only one correct answer.
8. In a power test, there should not be too many items, otherwise the test would become a speed test.
Summing up, items should be as clear and short as possible
to pose the problem and they should not contain cues which
attract students' attention to the correct response.
Test items used in this study were written to follow the
recommendations quoted above as closely as possible. Ninety-two
items were constructed and divided into two tests for ease of
administration: form-A (45 items) and form-B (47 items). The first
drafts were written in English and the final drafts for trying out
were in Bahasa Indonesia.
2.1 Item analysis
There are many standard item analysis techniques
(Hopkins and Stanley, 1981; Anastasi, 1976; Hills, 1981;
Gronlund, 1981; and Theobald, 1974). The most common indices
used are the facility value and the discriminating power of the
item. The means and standard deviations of the total correct
answers for each form of the tests are 54.91 and 22.92 (form A);
53.26 and 26.04 (form B).
Facility values
Facility value is defined as "the percentage of students
completing an item correctly" (Theobald, 1974, p.33). Some
authors use the reverse concept, that of "item difficulty" (Noll et
al., 1979, p.83; Anastasi, 1976, p.199; Gronlund, 1981, p.258). In
this study the term "facility value" is adopted as its use will lead
to less confusion. The greater the facility value of the item the
more students are able to answer correctly.
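As a concrete illustration, a minimal sketch of how facility values might be computed from a students-by-items matrix of scored responses; the matrix layout and the function name are assumptions for the example, not a description of the procedure actually used in the study:

import numpy as np

def facility_values(responses):
    """Facility value of each item: the percentage of students who answered
    it correctly. `responses` is a students x items matrix of 0/1 scores."""
    scores = np.asarray(responses, dtype=float)
    return 100.0 * scores.mean(axis=0)

# Example: 4 students, 3 items -> facility values of 75%, 50% and 25%.
demo = [[1, 1, 0],
        [1, 0, 0],
        [0, 1, 0],
        [1, 0, 1]]
print(facility_values(demo))   # [75. 50. 25.]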
The means and the standard deviations of the facility values
of items of the test are 55 percent (SD = 10) for the form-A and
48.6 percent (SD = 21.5) for the form-B. The minimum values are
24 percent (item no 24 of form-A) and 19 percent (item no 25 of
the form-B) while the maximum values are 90 percent (item no 7
of the form-A) and 100 percent (item no 5 of the form-B).
The facility values of items will be considered as an aspect
which should be taken into account when selecting items to produce
the final test. Theobald (1974) suggests that "items should lie
within the range of 20 per cent - 80 per cent difficulty" (p.34).
This suggestion will be implemented but with consideration given
to Anastasi's warning that "the decisions about item difficulty
cannot be made routinely, without knowing how the test scores
will be used" (Anastasi, p.201).
As mentioned earlier the test is designed to be used as a
diagnostic test in an attempt to measure in detail academic
strength and weaknesses in a specific area, in contrast to the
survey test which is an attempt to measure overall progress
(Karmel, 1970, p.283). If an item does not appear to suffer from
technical faults in construction, a low facility value could indicate that, in most students, pre-conceptions have not been replaced by scientists' conceptions. Such an item points to a common student weakness and would tend to be retained.
item score and the criterion score. Usually the total score on the
test itself is used as the criterion score.
The biserial correlation method is based on the assumption
that "the knowledge of the test item and the knowledge of the
entire test are both distributed normally" (Mosier and McQuitty,
1940, p.57).
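A minimal sketch of how such an item-criterion coefficient might be computed, assuming a 0/1 item vector and the total test score as the criterion: the point-biserial form is computed first and then converted to the biserial with the usual normal-ordinate correction. The function names are illustrative, and this is not the computational procedure quoted from the study:

import numpy as np
from scipy.stats import norm

def point_biserial(item, total):
    """Point-biserial correlation between 0/1 item scores and criterion scores."""
    item = np.asarray(item, dtype=float)
    total = np.asarray(total, dtype=float)
    p = item.mean()                      # facility (proportion correct)
    m1 = total[item == 1].mean()         # mean criterion score of the correct group
    m0 = total[item == 0].mean()         # mean criterion score of the incorrect group
    return (m1 - m0) / total.std() * np.sqrt(p * (1.0 - p))

def biserial(item, total):
    """Biserial correlation, assuming a normally distributed latent trait."""
    p = np.asarray(item, dtype=float).mean()
    y = norm.pdf(norm.ppf(p))            # ordinate of N(0,1) at the p/q split point
    return point_biserial(item, total) * np.sqrt(p * (1.0 - p)) / y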
The means and standard deviations of the biserial correlation coefficients were .20 (SD = 0.12) for the test form-A and .15 (SD = 0.13) for the test form-B. Item no 1 of form-A had the lowest correlation. This item dealt with the generation of sound. Item no 6 of form-B, which attempted to investigate students' understanding of waves by using a diagram of a wave, had the lowest correlation among items of form-B. Item number 43 of form-A and number 17 of form-B had the highest correlation coefficients (.46 and .38 respectively). Item 43 dealt with the Doppler Effect, while item 17 dealt with the transmission of sound in a metal bar.
The use of the phi coefficient was developed by Guilford
(1941) based upon "the principle of the correlation between an
item and some criterion variable" (p.11). This method is applicable if the two groups being compared are equal in number.
Several computational aids have been developed for
arriving at phi coefficient values: tables (Jurgensen, 1947, for
equal groups; Edgerton, 1960, for unequal groups), nomograph
(Lord, 1944), and abacs (Mosier and McQuitty, 1940; Guilford,
1941).
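The abac is a graphical aid; computed directly from the 2 x 2 table of item success against membership of the upper or lower criterion group, the same coefficient can be sketched as follows (an illustration, not Theobald's abac procedure itself):

import math

def phi_coefficient(n11, n10, n01, n00):
    """Phi coefficient from a 2 x 2 table of counts.

    n11: item correct & upper group      n10: item correct & lower group
    n01: item incorrect & upper group    n00: item incorrect & lower group
    """
    numerator = n11 * n00 - n10 * n01
    denominator = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return numerator / denominator if denominator else 0.0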
In this study the phi coefficients of items are calculated
following Theobald's procedures for using the abac method.
Table 4.4.7 presents the means and standard deviations of the phi coefficients of the tests form-A and form-B. There is no significant difference between the phi coefficients of the two forms (t = 0.02; not significant at the .01 level).
Findley (1956, p.177) also proposed two formulas to
measure the discrimination power of items. One of the potential
virtues of this method is that it can be used to provide "a precise
measure of the distractive power of each option" (Findley, p. 179).
In this regard, Findley's method can provide more information than either the biserial correlation coefficient or the phi coefficient measured by the abac method, so the index of discrimination power of items measured using Findley's method will be used to select items.
The means and standard deviations of these indices are .22 and 0.17 for form-A, and .21 and 0.18 for form-B. There is no significant difference between the indices of the two forms (t = 0.02; not significant at the .01 level). The Findley index and the phi coefficient are highly correlated (.96 for form-A and .97 for form-B).
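The computation behind such an index can be sketched as an upper-lower comparison, which is the usual reading of Findley's approach; Findley's exact formulas are not reproduced in the text, so the details below, including the per-option sign convention, are assumptions for illustration:

import numpy as np

def option_discrimination(options_chosen, key, total_scores):
    """Upper-lower discrimination values for every option of one item.

    options_chosen: the option label each student chose for the item
    key:            the label of the correct option
    total_scores:   criterion (total test) scores used to form equal-sized
                    upper and lower groups
    """
    options_chosen = np.asarray(options_chosen)
    order = np.argsort(total_scores)
    half = len(order) // 2
    lower, upper = order[:half], order[-half:]      # equal-sized halves
    values = {}
    for opt in np.unique(options_chosen):
        p_upper = np.mean(options_chosen[upper] == opt)
        p_lower = np.mean(options_chosen[lower] == opt)
        # For the keyed option this difference is the discrimination index D;
        # for a distracter the sign is reversed, so that a distracter chosen
        # mainly by weaker students receives a positive value.
        values[opt] = (p_upper - p_lower) if opt == key else (p_lower - p_upper)
    return values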
The official class-period time in Indonesian secondary schools is 45 minutes but the effective period would be about 40 minutes. Students have wide experience of multiple-choice testing in Indonesian secondary schools, and experience suggests that for a test to be completed by nearly all students the appropriate number of items would be about 30. A final form of the physics diagnostic test was constructed from the 92 items which were trialled. The 30 items were selected mainly from the test form-A; where needed, items from form-B were included.
In selecting items for inclusion in a test, Hopkins and
Stanley (1981, p.284) believe that an item which has high
difficulty and low discrimination power may, on occasions, be
acceptable. On the other hand, Gronlund (1981) says that "a low
D should alert us to the possible presence of technical defect in a
test item" (p.262). Noll et al. (1979) stated that there is little
reason to retain items which have negative discrimination indices
unless the other important values can be shown. Theobald (1974)
suggested that "items should lie within the range of 20 per cent -
80 per cent difficulty" (p.34) and "items should be carefully
scrutinized whenever D < +.2" (p.32). Anastasi claims that items with difficulty and discrimination values of around 50 per cent are preferable.
Although "item analysis is no substitute for meticulous
care in
planning, constructing, criticizing, and editing items" (Hopkins and
Stanley 1981, p.270) Theobald's quantitative guidelines will be
adopted. He states, however, that even when test items are
chosen to reflect precisely stated behavioural objectives, and the
test as a whole is an adequate and representative sample of the
46
criterion behaviours for the course of study, additional
considerations also apply (P32).
There are three additional considerations: the students'
and the teachers' comments about the items, prerequisite
relationships among sub units of study and caution indices.
Students' and teachers' comments on the items are available.
These were also considered carefully when selecting or rejecting
items.
There are many methods used to test models of
prerequisite relationships among sub units. The first method is the
Proportion Positive Transfer method which was pioneered by
Gagne and Paradise (Barton, 1979), and its refinement suggested
by Walbesser and Eisenberg (in White, 1974b). White observed
that these methods do not take into account errors of
measurement, so he and Clark proposed another method (White, 1974a, 1974b). The scalogram method has also been widely used, applying Guttman's coefficient (Yeany, Kuch and Padilla, 1986) and the phi coefficient (Barton, 1979; White, 1974a). Barton also proposed a method called the Maximum Likelihood method. Proctor (1970) suggested the use of chi-square procedures (Bart and Krus, 1973, p.293).
Dayton and Macready (1976) used a probabilistic method
(Yeany et al., 1986). An Ordering Theory method has been used
by Bart and Krus (1973); Airasian and Bart (1973); Krus and Bart
(1974). Bart and Read (1984) tried to adopt Fisher's exact
probability method.
Although all these methods have their own particular
advantages and limitations most of them share a similar problem
of determining whether a certain model of prerequisite
relationship occurred by chance or not. Bart and Read's method
suggests procedures to solve this problem, so their method has
been adopted in this study. This method is based on the axiom:
For dichotomous items, with a correct response scored "1" and an incorrect response scored "0", success on item i is considered a prerequisite to success on item j if and only if the response pattern (0,1) for items i and j respectively does not occur. (Bart and Read, 1984, p.223)
Given items A and B which have been administered to N students, we produce a 2 x 2 table of response patterns as follows:

Figure 1: A 2x2 table of response patterns of item A and item B on dichotomous events

                              Item B
                     Success (1)   Fail (0)    Total
Item A  Success (1)     N11           N12        N1.
        Fail (0)        N21           N22        N2.
        Total           N.1           N.2        N
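A minimal sketch of the axiom applied to two 0/1 response vectors; Bart and Read's full method adds a statistical test of whether a non-zero count in the (0,1) cell could have arisen by chance, which is not reproduced here:

import numpy as np

def response_pattern_table(item_i, item_j):
    """2 x 2 table of joint response patterns for items i and j (1 = success)."""
    i = np.asarray(item_i)
    j = np.asarray(item_j)
    return {
        (1, 1): int(np.sum((i == 1) & (j == 1))),
        (1, 0): int(np.sum((i == 1) & (j == 0))),
        (0, 1): int(np.sum((i == 0) & (j == 1))),   # fail i, succeed j
        (0, 0): int(np.sum((i == 0) & (j == 0))),
    }

def is_prerequisite(item_i, item_j):
    """Item i is treated as prerequisite to item j iff the (0,1) pattern
    (fail on i, succeed on j) does not occur, as in the axiom quoted above."""
    return response_pattern_table(item_i, item_j)[(0, 1)] == 0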
need to be given remedial activities. The method adopted in this
study is based on Student-Problem (S-P) curve theory. Harnisch
(1983) states that this method also provides information about
each respective item by observing the distribution of distracters
above the P-curve.
An unusual item is one that has a large number of better than average students answering it incorrectly while an equal number of less than average students answer it correctly. (Harnisch, 1983, p.199)
Harnisch and Linn (1981), and Harnisch (1983) proposed the
Modified Caution Index (MCI) Formula. This formula was originally used to detect the characteristics of students based on their
responses. Harnisch (1983) stated that the MCI can be used to
detect the characteristics of items as well by reversing the roles
of students and items. "High MCI's for items indicate an unusual
set of responses by students of varying ability, and thus these
items should be examined closely by the test constructors and
the classroom teacher" (p.199). This criterion will be adopted as
one of the additional considerations in selecting items.
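For reference, a sketch of one published formulation of the modified caution index, assuming the Harnisch and Linn (1981) form; the exact formula is not quoted in the text. For a student MCI the row holds that student's item scores weighted by item facilities; for an item MCI the roles are reversed, as described above:

import numpy as np

def modified_caution_index(row, weights):
    """One common formulation of the Modified Caution Index for a 0/1 vector.

    For a student, `row` holds the student's item scores and `weights` the item
    facility values; for an item, the roles are reversed (the item's scores over
    students, weighted by student ability proportions).
    """
    row = np.asarray(row, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(-w)                 # easiest (highest weight) first
    k = int(row.sum())                     # number-right score
    if k == 0 or k == len(row):
        return 0.0                         # all-wrong/all-right patterns show no misfit
    best = w[order[:k]].sum()              # Guttman-perfect pattern: k easiest correct
    worst = w[order[-k:]].sum()            # most aberrant pattern: k hardest correct
    actual = (row * w).sum()               # weights of the items actually correct
    return (best - actual) / (best - worst)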
Thus the criteria used for the selection of appropriate test
items to be used in this study can be restated as:
1. The facility value of the item lies between 20%-80%.
2. The discrimination index (D) is equal to or more
than .20.
If these requirements are not met, additional information
about the items is needed such as:
3. The students' and teachers' comments about the item.
4. The prerequisite relationships of the item for other
items.
5. The Modified Caution Index (MCI) of the item.
The final form of the test
There were 22 items of form-A which met the first two criteria. Several items which did not meet these criteria were considered for inclusion in the final form of the test after additional consideration. For example, item no 8 (facility value = 64, D = .15) received several comments that indicate confusion between a vacuum pump (pompa hampa udara) and a vacuum (ruang hampa udara). It is suggested that the description of a vacuum pump within the stem be rephrased. In addition, the MCI of this item is low (.05). Item no.15 was also included in the final test because the MCI of the item is low (.21).
Other items were not acceptable for inclusion, although some of them could be acceptable after slight revision. For example, item no.21, which has a 50 percent facility value and a .18 index of discrimination, received some comments revealing that many students had not heard the term bulk modulus. Providing an explanation of the meaning of the bulk modulus might be expected to increase its facility and might alter the index of discrimination. However, this item is excluded because the MCI of this item is high (.77).
Six items from form-B were chosen to replace items of form-A on the basis of prerequisite relationships among sub units. Item no.1 (form-B) replaces item no.1 (form-A), item no.12 (B) replaces item no.19 (A), items no.13 (B) and 14 (B) replace items no.13 (A) and 14 (A) respectively, item no.17 (B) replaces item no.16 (A), item no.21 (B) replaces item no.21 (A), and items no.42 (B) and 44 (B) replace items no.40 (A) and 45 (A) respectively.
Table 4.1 presents the number of items, their numbering in
the original forms and facility values, discrimination indices, and
the MCIs of each item calculated from the initial trialling, and from
the second investigation. The facility values of the 32 items in the
final version of the test fell between 20 and 80 percent: the
increase in facility values was to be expected over those in form-A
and form-B because these students had received instruction in
physics of sound. Similarly all but 2 items had discrimination
indices above .20.
There was strong pressure from physics teachers to include,
in the final form of the test, items which would test students'
conceptions of the transmission of sound at night and the
influence of the force of gravity on sound. There were no such
items in form-A or form-B and two new items were constructed
with great care and included in the final form. If these items
performed poorly in the second investigation (where the test was administered to 596 students from 19 schools) they could have
been dropped for the experimental stage of the study. As it was
they performed as well as many other items and contributed to a
total test reliability of .85 (Spearman-Brown; standard error of measurement = 1.77).
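A minimal sketch of a split-half reliability estimate with the Spearman-Brown correction; the study does not state which split was used, so the odd/even item split below is only an assumption:

import numpy as np

def split_half_reliability(responses):
    """Split-half reliability with the Spearman-Brown correction.

    `responses` is a students x items matrix of 0/1 scores; an odd/even item
    split is used here purely for illustration.
    """
    scores = np.asarray(responses, dtype=float)
    odd = scores[:, 0::2].sum(axis=1)            # half-test scores
    even = scores[:, 1::2].sum(axis=1)
    r_half = np.corrcoef(odd, even)[0, 1]        # correlation of the two halves
    return 2.0 * r_half / (1.0 + r_half)         # Spearman-Brown prophecy formula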
Table 4.1: Characteristics of the items of the physics diagnostic test. Columns: item no.; original item no.; facility value; D; MCI (a = initial trialling, b = second investigation); sub unit.
Important terms
3-option
4-option
5-option
abacs
analysis level
application level
behavioural objectives
biserial correlation
Bloom's Taxonomy
common student errors
comprehension level
content validity
diagnostic test
difficulty
difficulty indices
difficulty of language
discrimination function
discrimination indices
discrimination of the items
Educational Objectives
effects of violating item construction principles
essay and objective tests
Facility values
framework for educational measurement
Guttman's coefficient
halo effects
index of discrimination
instructional objectives
item-to-item carryover effects
knowledge level
knowledge of conventions
knowledge of principles.
knowledge of specific facts
knowledge-or-random-guessing assumption
language mechanics effects
length of stems
level of difficulty
level of synthesis
Modified Caution Index (MCI) Formula
optimal number of options per item
order effects
Ordering Theory
own conceptions
phi coefficient
prerequisite relationships
probabilistic method
readability of the test items and item difficulty
reader unreliability
recall and recognize facts, conventions or principles
reliability
scientists' conceptions
standard guidelines for constructing multiple choice items
stem form
student errors
Student-Problem (S-P) curve theory
students' observable outcomes
students' pre-conceptions
syntactic complexity
table of specifications
test-to-test carryover effects
The discrimination power of the test
the reliabilities (KR-20)
the substitution of an uncommon term
true-false items
validity of tests
List of references
Anastasi, A., (1976). Psychological testing (4th ed.). New York: Macmillan.
Bart, W.M., & Krus, D.J., (1973). An ordering-theoretic method to
determine hierarchies among items. Educa t i ona l and
Psycho log i ca l Measu rement, 33, 291-300.
Bart, W.M. & Read, S.A., (1984). A statistical test for prerequisite
relations. Educational and Psychological Measurement, 44,
223227.
Barton, A.R., (1979). A new statistical procedure for the analysis
of hierarchy validation data. Research in Science Education,
9, 23-31
Bloom, B.S., Engelhart, M. D., Furst, E. J., Hill, W. H., & Krathwohl,
D. R., (1956). Taxonomy of educational objectives:
handbook 1: cognitive domain. London: Longman.
Blum, A., (1979). The remedial effect of a biological game. Journal
of Research in Science Teaching, 16(4), 333-338.
Blum, A., & Azencot, M., (1986). Multiple choice versus equivalent
essay questions in a national examination. European Journal
of Science Education, 8(2), 225-228.
Costin, F., (1970). The optimal number of alternatives in multiple-
choice achievement tests: Some empirical evidence for a
mathematical proof. Educational and Psychological
Measurement, 30, 353-358.
Fisher, K.M., & Lipson, J.I., (1986). Twenty questions about student errors. Journal of Research in Science Teaching, 23(9), 783-803.
Gagne, R.M., & Paradise, N.E., (1961). Abilities and learning sets in knowledge acquisition. Psychological Monographs, 75(14, Whole No. 518).
Green, K., (1984). Effects of item characteristics on multiple choice item difficulty. Educational and Psychological Measurement, 44, 551-561.
Green, K., Sax, G., & Michael, W.B., (1982). Validity and reliability of tests having differing numbers of options for students of differing levels of ability. Educational and Psychological Measurement, 42, 239-245.
Grier, J.B., (1975). The number of alternatives for optimum test reliability. Journal of Educational Measurement, 12, 109-113.
McMorris, R.F., Brown, J.A., Snyder, G.W., & Pruzek, R.M., (1972). Effects of violating item construction principles. Journal of Educational Measurement, 9(4), 287-295.
Mosier, C.I., & McQuitty, J.V., (1940). Methods of item validation and abacs for item-test correlation and critical ratio of upper-lower difference. Psychometrika, 5(1), 57-85.
Noll, V.H., Scannell, D.P., & Craig, R.C., (1979). Introduction to
educational measurement (4th ed.). Boston: Houghton
Mifflin.
Proctor, C.H., (1970). A probabilistic formulation and statistical analysis of Guttman scaling. Psychometrika, 35, 73-78.
Theobald, J.H., (1974). Classroom testing. Principles and practice
(2nd ed.). Melbourne: Longman Cheshire.
Theobald, J.H., (1977). Attitudes and achievement in biology.
Unpublished Ph.D. Thesis, Monash University.
Wade, R.K., (1984/85). What makes a difference in inservice
teacher education? A meta-analysis of research. Educational Leadership, 42(4), 48-54.
Walbesser, N.H., & Eisenberg, T.A., (1972). A review of research
on behavioural objectives and learning hierarchies.
Mathematics Education Reports. Columbus, Ohio: ERIC
information analysis centre for science, mathematics and
environmental education. ERIC no.ED059900.
White, F.A., (1975). Our acoustic environment. New York: Wiley.
White, H.E., (1968). Introduction to college physics. New York: Van
Nostrand.
White, M.W., Manning, K.V., & Weber, R.L., (1968). Basic physics.
New York: McGraw-Hill.
White, R.T., (1974a). A model for validation of learning hierarchies. Journal of Research in Science Teaching, 11(1), 1-3.
White, R.T., (1974b). Indexes used in testing the validity of
learning hierarchies. Journal of Research in Science Teaching,
11(1), 61-66.
Yeany, R.H., Dsst, R.J., & Mathews, R.W., (1980). The effects of
diagnostic-prescriptive instruction and locus of control on
the achievement and attitudes of university students.
Journal of Research in Science Teaching, 17(6), 537-543.
Yeany, R.H., Kuch, Chin Yap, & Padilla, M.J., (1986). Analysing
hierarchical relationships among modes of cognitive
reasoning and integrated science process skills. Journal of
Research in Science Teaching, 23(4), 277-291.
Yeany, R.H., & Miller, P.A., (1980). The effect of
diaqnostic/remediation: instruction on science learning: A
58
meta - ana lys i.sPaper presented at the annual meeting of the
National Association for Research in Science Teaching.
Boston, MA, April 11-13. ERIC no.ED187533.
Yeany, R.H., Waugh, M.L., & Blalock, A.L., (1979). The effects of achievement diagnosis with feedback on the science achievement and attitude of university students. Journal of Research in Science Teaching.