ATP 2009 Secure Testing AW


Creating More Secure Exams

through Performance Based Testing

Andrew Wiley
The College Board
Research and Development

February 25, 2009

1
Background
• Choosing students: Higher education admissions tools
for the 21st century (Camara & Kimmel, 2005)

• Purpose:
– Identify additional predictors of college success
– Expand the definition of what constitutes successful
performance in college beyond freshman GPA

• College Board has initiated several projects to address
this research area

2
Background
• Most of these projects involve the development of
measures that are closer to performance based
assessments than are traditional exams like the SAT.
• The challenge that The College Board must face is
whether these new assessments can be delivered in a
manner that is secure and not easily coached.

3
Research collaboration with
Michigan State University
• Identify a broader domain of college student
performance:
– Review university mission statements and
department objectives
– Interview with university staff responsible
for student life at Michigan State University
– Review of the education literature on student
outcomes
• Our systematic search resulted in 12 dimensions
of student performance…

4
12 Dimensions of Student Performance
Broadening the Performance Domain in the Prediction of Academic
Success (Schmitt, Oswald, & Gillespie, 2004)
1. Knowledge, learning, mastery of general principles
2. Continuous learning, intellectual interest and curiosity
3. Artistic and cultural appreciation
4. Multicultural appreciation
5. Leadership
6. Interpersonal skills
7. Social responsibility, citizenship and involvement
8. Physical and psychological health
9. Career orientation
10. Adaptability and life skills
11. Perseverance
12. Ethics and integrity

5
Two “Noncognitive” Measures
• Situational judgment inventory
– A situation is presented along with several alternative
courses of action.
– The respondent is asked to indicate what she/he
would be most likely and least likely to do.

• Biodata
– Short, multiple choice reports of past
experience/background and interests/preferences.

6
Study 1: Psychometric adequacy
& scale refinement
• 644 MSU freshmen completed one of the two parallel forms of the
biodata and SJI instruments at the beginning of the academic year.
• Identical empirical-keying procedures were conducted on both
instruments at the item level (double-cross validated using randomly
split samples).
• Results indicated significant incremental validity for some of the scales
above and beyond the validity of SAT/ACT scores and existing
measures of personality in predicting college GPA.
• The biodata and SJI demonstrated the greatest incremental validity
when absenteeism, students’ self-ratings, and peer ratings of
performance were examined (.19, .22, and .14, respectively).
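
The double cross-validation design mentioned above can be sketched as follows. This is a minimal illustration, not the study's procedure: the keying rule (weight an item +1 or -1 depending on whether endorsers have a higher mean criterion than non-endorsers), the binary item format, and the synthetic data are all assumptions made for the example.

```python
import random
from statistics import mean

def split_half(n, seed=0):
    """Randomly split indices 0..n-1 into two halves."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[: n // 2], idx[n // 2:]

def derive_key(responses, criterion, rows):
    """Assumed keying rule: +1 if endorsers in `rows` have a higher mean
    criterion than non-endorsers, -1 otherwise, 0 if a group is empty."""
    key = []
    for j in range(len(responses[0])):
        yes = [criterion[i] for i in rows if responses[i][j] == 1]
        no = [criterion[i] for i in rows if responses[i][j] == 0]
        if not yes or not no:
            key.append(0)
        else:
            key.append(1 if mean(yes) > mean(no) else -1)
    return key

def score(responses, key, rows):
    """Apply an item key to the respondents in `rows`."""
    return [sum(w * r for w, r in zip(key, responses[i])) for i in rows]

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Synthetic data (hypothetical): 200 respondents, 5 yes/no items.
# Items 0 and 1 are built to relate to the criterion; items 2-4 are noise.
rng = random.Random(42)
responses = [[rng.randint(0, 1) for _ in range(5)] for _ in range(200)]
criterion = [2 * r[0] - 2 * r[1] + rng.gauss(0, 1) for r in responses]

half_a, half_b = split_half(len(responses), seed=1)

# Derive a key in each half, then validate it in the other half.
key_a = derive_key(responses, criterion, half_a)
key_b = derive_key(responses, criterion, half_b)
r_a_to_b = pearson(score(responses, key_a, half_b), [criterion[i] for i in half_b])
r_b_to_a = pearson(score(responses, key_b, half_a), [criterion[i] for i in half_a])

print("key from half A:", key_a, "cross-validated r in half B:", round(r_a_to_b, 2))
print("key from half B:", key_b, "cross-validated r in half A:", round(r_b_to_a, 2))
```

Because each key is evaluated only on respondents who did not contribute to it, the two cross-validated correlations are not inflated by capitalizing on chance in the keying sample.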

7
Study 1: Standardized Differences
Compared with White group…

Non-cognitive Dimension   Black    Hispanic   Asian
Knowledge                 -0.08    -0.20      -0.25
Learning                   0.01     0.63*     -0.19
Artistic                  -0.19     0.73*      0.15
Multicultural             -0.11     0.63*      0.02
Leadership                -0.18     0.08      -0.30
Interpersonal             -0.18     0.33      -0.38*
SJI composite             -0.05    -0.14      -0.21
Citizenship                0.05     0.23      -0.14
Health                    -0.31*    0.06      -0.67*
Career                     0.34*    0.56*      0.14
Adaptability               0.03     0.09      -0.41*
Perseverance               0.13     0.55*     -0.18
Ethics                     0.17    -0.06      -0.13

• Positive values indicate that minorities perform better than
White students.
• The d values for biodata and SJI measures across ethnic and
gender subgroups were consistently smaller than those
found on cognitive predictors.
• * p < .05

8
Study 2: Predicting FYGPA: Total Sample across
10 Institutions (N = 2443)

9
Predicting Self-Rated Performance:
Total Sample across 10 Institutions (N = 900)

10
Predicting Class Absenteeism: Total Sample across 10
Institutions (N = 899)

11
Representative Subgroup Differences in
Standardized Units

12
Percent of Students Selected:
Two Composites and Three Selection Strategies

                    Top 85%               Top 50%               Top 15%
Group               AB     AB+   Change   AB     AB+   Change   AB     AB+   Change
Hispanic             4.4    4.6  (+.2)     4.1    4.9  (+.8)     3.9    5.5  (+1.6)
Asian                7.6    7.7  (+.1)     9.9    9.5  (-.4)    17.5   12.9  (-4.6)
African-American    17.9   19.8  (+1.9)    9.6   13.6  (+4.0)    1.3    7.2  (+5.9)
White               70.2   67.9  (-2.3)   76.4   71.9  (-4.5)   77.2   74.4  (-2.8)

AB = equally weighted composite of HSGPA and SAT/ACT.
AB+ = equally weighted composite of HSGPA, SAT/ACT, Biodata, and SJI.
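
One way to read "equally weighted composite" is to standardize each predictor and average the z-scores, then select the top fraction of applicants on that composite. The sketch below illustrates that reading only; the slides do not state how the composites were computed, and the six applicants and their scores are hypothetical.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize raw scores to mean 0 and SD 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def equal_weight_composite(*predictors):
    """Average the z-scores of each predictor, applicant by applicant."""
    standardized = [zscores(p) for p in predictors]
    return [mean(row) for row in zip(*standardized)]

def select_top(scores, fraction):
    """Return the indices of the top `fraction` of applicants by score."""
    n_selected = round(len(scores) * fraction)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_selected])

# Hypothetical data for six applicants (not from the study).
hsgpa = [3.9, 3.2, 3.6, 2.8, 3.4, 3.7]
sat = [1400, 1100, 1250, 1000, 1300, 1150]
biodata = [3.1, 3.8, 3.3, 3.9, 3.0, 3.6]
sji = [0.30, 0.55, 0.40, 0.50, 0.25, 0.45]

ab = equal_weight_composite(hsgpa, sat)                      # AB composite
ab_plus = equal_weight_composite(hsgpa, sat, biodata, sji)   # AB+ composite

selected_ab = select_top(ab, 0.50)
selected_ab_plus = select_top(ab_plus, 0.50)
print("Selected with AB: ", selected_ab)
print("Selected with AB+:", selected_ab_plus)
```

Comparing the group makeup of the two selected sets at each cutoff is what produces a table like the one above.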

13
Limitations & Future Research
• Public relations and acceptance of these measures by consumers
(i.e., admissions officers, parents, students). Need to collect
reactions to new admissions measures along a variety of dimensions
(e.g., fairness, face validity).

• Fakability in high-stakes situation especially relevant for biodata, less
so for SJI. However, note that essays can be coached and edited,
and self-reported activities can also be inflated.

• More research and evaluation efforts need to be conducted when
these measures are used operationally in college settings.

14
Study 3: Purpose & Research Questions
• Purpose: evaluating the utility of the biodata and situational judgment measures in as
close to a real admissions situation as is possible
– Administer new measures to college applicants rather than college freshmen.
– On an annual basis, collect class absenteeism, self-rated performance on the
noncognitive dimensions, and commitment to the university from enrolled students;
institutions will provide course grades and retention information.
• Research Questions:
– The incremental validity of the biodata and the situational judgment measures will
be assessed after controlling for high school GPA and SAT/ACT scores.
– Differential prediction will also be assessed to see whether each measure-outcome
relationship differs across various subgroups (e.g., gender and race).
– The relationship between scores on these noncognitive measures and holistic file
review will be examined to test whether these measures could be substituted for
the more subjective file review.

15
Preliminary Validity Results…
• A year prior to Study 3 data collection, a
similar pilot study was conducted with only
Michigan State University applicants.
• Comparisons between this sample and our
past studies should reveal the degree to
which the application process itself affects
mean scores, variability, reliability, and
validity of these scales.
MSU Pilot: Demographic Statistics
                     Predictor         Outcome
Variable             N       %         N       %
Ethnic Status
  Hispanic            25     4.5         5     4.0
  Asian               25     4.5         3     2.4
  African American    19     3.4         0     0.0
  Caucasian          463    83.1       107    84.9
  Other               25     4.5        11     8.8

Gender
  Male               215    37.6        41    32.5
  Female             357    62.4        83    65.9
Note. For Ethnic Status, the Hispanic group includes respondents of Mexican, Puerto Rican, and
Hispanic origin. Total sample size varies across the demographic categories due to missing data.
Response categories for major varied across the two data collections.
MSU Pilot: Results – Mean Differences
                             Average score at    Average score, all 10
Dimensions                   MSU 2006-2007       universities 2004       d-value
Knowledge                    3.41 (.46)          3.15 (.47)               .54
Continuous Learning          3.40 (.62)          3.09 (.61)               .50
Artistic Appreciation        3.15 (.78)          2.91 (.82)               .29
Multicultural Appreciation   3.25 (.66)          2.98 (.66)               .41
Leadership                   3.35 (.77)          3.07 (.81)               .35
Social Responsibility        3.67 (.70)          3.32 (.76)               .46
Health                       3.40 (.51)          3.25 (.51)               .30
Career Orientation           3.45 (.61)          3.32 (.65)               .20
Adaptability                 3.49 (.46)          3.38 (.45)               .24
Perseverance                 3.88 (.47)          3.73 (.49)               .31
Ethics                       4.13 (.46)          3.86 (.54)               .52
Jobs Scale                   2.51 (.86)          2.80 (.58)              -.26
Awards Scale                 2.24 (.69)          2.42 (.70)              -.29
SJI                           .42 (.14)           .33 (.17)               .56
Note. Standard deviations are in parentheses next to the means. Positive d values indicate that the 2007 applicant sample
had scores higher than the 2004 student sample.
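
As a rough check on how a d-value relates to the means and standard deviations in the table, the sketch below divides the mean difference by a pooled standard deviation. The slides do not say which pooling formula was used, and group sizes per dimension are not given, so the simple average of the two variances here is an assumption; computed values will be close to, but not always identical to, the reported ones.

```python
from math import sqrt

def cohens_d(mean_1, sd_1, mean_2, sd_2):
    """Standardized mean difference, pooling by the simple average
    of the two variances (an assumed, unweighted pooling)."""
    pooled_sd = sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean_1 - mean_2) / pooled_sd

# (applicant mean, applicant SD, student mean, student SD, reported d),
# copied from three rows of the table above.
rows = {
    "Knowledge": (3.41, 0.46, 3.15, 0.47, 0.54),
    "Ethics": (4.13, 0.46, 3.86, 0.54, 0.52),
    "Adaptability": (3.49, 0.46, 3.38, 0.45, 0.24),
}

for name, (m1, s1, m2, s2, reported) in rows.items():
    computed = cohens_d(m1, s1, m2, s2)
    print(f"{name}: computed d = {computed:.2f}, reported d = {reported:.2f}")
```

Small gaps between computed and reported values are expected, since the table's means and standard deviations are rounded and the original analysis may have weighted the pooled variance by sample size.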
Incremental Validity of Biodata Measures
Outcomes                N     R² (HSGPA, SAT)   Overall R²   ΔR²
BARS                    57    0.023             0.443*       0.420*
OCB                     57    0.017             0.392        0.374*
Deviance                57    0.025             0.373        0.348
Turnover Intent         58    0.077             0.248        0.172
Academic Satisfaction   58    0.008             0.353        0.345
Social Satisfaction     58    0.077             0.294        0.218
FYGPA                   84    0.201*            0.335*       0.134
Absenteeism             58    0.061             0.234        0.173
• To preserve N in these regressions, the SJI was not included because of a relatively low
response rate to this measure.

• It is worth noting that small sample sizes, such as those observed in these analyses, can
seriously limit the ability to detect significant relationships due to decreased statistical power.
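
The incremental validity column is the overall R² minus the R² for HSGPA and SAT alone, and the usual F test for that increment shows why small samples limit power. The sketch below is illustrative only: the number of biodata predictors entered is not stated in the slides, so `K_ADDED = 11` is a hypothetical value, and the slides do not say which test was actually used.

```python
def delta_r2(r2_base, r2_full):
    """Incremental validity: variance explained beyond the base model."""
    return r2_full - r2_base

def f_change(r2_base, r2_full, n, k_base, k_added):
    """F statistic for the R-squared change in a hierarchical regression.

    n       : sample size
    k_base  : number of predictors in the base model
    k_added : number of predictors added in the second step
    """
    df_num = k_added
    df_den = n - k_base - k_added - 1
    f_value = ((r2_full - r2_base) / df_num) / ((1 - r2_full) / df_den)
    return f_value, df_num, df_den

# Reported values for two outcomes in the table above.
print("BARS  delta R2:", round(delta_r2(0.023, 0.443), 3))
print("FYGPA delta R2:", round(delta_r2(0.201, 0.335), 3))

K_BASE = 2    # HSGPA and SAT/ACT
K_ADDED = 11  # hypothetical count of biodata scales, not stated in the slides

f_value, df_num, df_den = f_change(0.023, 0.443, n=57, k_base=K_BASE, k_added=K_ADDED)
print(f"BARS: F({df_num}, {df_den}) = {f_value:.2f} under the assumed predictor count")
```

With N near 57, every added predictor removes a denominator degree of freedom, so even a large ΔR² can fail to reach significance.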
Thank You
Thanks to ATP and thanks to you

20
Questions, Comments, Suggestions
• Researchers are encouraged to freely express their
professional judgment. Therefore, points of view or
opinions stated in College Board presentations do not
necessarily represent official College Board position or
policy.

• Please forward any questions, comments, and
suggestions to:
Andrew Wiley at: [email protected]

21
