
Speed and Accuracy in Occupational Testing

2002, BPS DOP Conference

Occupational ability tests are widely used for selection, development and guidance applications in occupational psychology. Their developers and users tend to consider only the 'Raw' or 'Right' score, i.e. the number of questions answered correctly, and disregard information about speed and accuracy. This paper proposes that validity increments can be gained from alternative approaches to ability test scoring, and outlines a theoretical framework model. An empirical study of n=208 subjects across six ability tests shows the distribution of the theory-based test scores, and demonstrates construct as well as criterion-related validity. Finally, a practical example illustrates the issues at hand and their significance for assessment practice.

Dr. Rainer H. Kurz, SHL Group plc., 1 Atwell Place, Thames Ditton, KT7 0NE

Development of the SPACES Model

Carroll (1993) covered 100 years of research into speed-accuracy trade-off issues in ability testing but failed to provide conclusive answers; no satisfactory, coherent 'textbook' model has emerged so far. Kurz (1990) investigated item response latency and accuracy and found them to be independent aspects of performance, and demonstrated (1995) that Speed and Accuracy can be measured with a sufficient level of reliability. Kurz (2000) identified a tangible solution in the work of Salkind & Wright (1977) on measuring 'Impulsivity vs. Reflectivity' orientation through the 'Matching Familiar Figures' (MFF) test. They suggested improved methods for scoring the MFF that entail the calculation of speed and accuracy z-scores: the difference of these z-scores represents the candidate's standing on the 'Impulsivity vs. Reflectivity' dimension, their sum the standing on the 'Efficient vs. Inefficient' dimension. Phillips & Rabbitt (1995) applied this approach to ability testing.
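As a minimal sketch of this z-score composite approach (in Python; the function names and the three-candidate sample values are illustrative, not taken from the cited papers), the difference and sum of standardised speed and accuracy scores could be computed as:

```python
import statistics


def zscores(values):
    """Standardise a list of values to mean 0, SD 1 (sample SD)."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def style_scores(speed, accuracy):
    """Salkind & Wright-style composites from per-candidate speed and
    accuracy measures: the z-score difference indexes the 'Impulsivity
    vs. Reflectivity' dimension, the z-score sum the 'Efficient vs.
    Inefficient' dimension."""
    zs, za = zscores(speed), zscores(accuracy)
    impulsivity = [s - a for s, a in zip(zs, za)]
    efficiency = [s + a for s, a in zip(zs, za)]
    return impulsivity, efficiency
```

The same z-score difference is what the SPACES model later operationalises as Risky Balance (Accuracy subtracted from Speed after standardisation).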
Phillips & Rabbitt (1995) investigated Impulsivity and speed-accuracy strategies by administering four ability tests and three information-processing tasks to 83 subjects aged 50-79. The Impulsivity scores showed correlations between .31 and .63 (i.e. a positive manifold) across the four ability tests. The 'Impulsivity' and 'Efficiency' dimensions were found to be independent, with correlations close to zero. Correlations between the Impulsivity-Reflectivity dimension and questionnaire measures of Extraversion and Impulsivity were all slightly positive (around .15), but not significant.

From this line of research Kurz (2000) developed the SPACES model, where Speed, operationalised as the 'Proportion of test questions attempted' (PAT), and Accuracy, operationalised as the 'Proportion of questions answered correctly' (ACC), are viewed as the 'Input Components'. These are hypothesised to fully determine the 'Output Composites': Efficiency, the traditional 'Raw Score', and Speed-Accuracy Balance. The latter is measured through the Risky Balance (RB) variable by subtracting z-standardised values of Accuracy from Speed. The Speed and Accuracy Input Components are hypothesised to be independent, as are the Output Composites Efficiency and Risky Balance.

Method

The sample for the study reported here consisted of n=208 A-level students and adults on 'Access' courses (see Kurz & Morris, 1997; Kurz, 2000). Six occupational tests from the SHL range measuring Verbal (VMG1), Numerical (NMG1), Diagrammatic (DC3.1), Clerical (CP7.1C), Spatial (SIT7) and Mechanical (MT4.1C) abilities were administered on computers, together with the OPQ CM5.2 personality questionnaire.

Results

Figure 1 shows the distribution of the proportional 'Speed' and 'Accuracy' scores. They use a common scale from 0 to 1 that allows easy visual and numerical comparison across tests. A striking result is that all tests show clear ceiling effects on 'Speed'; Checking also has a ceiling effect on 'Accuracy'.
A second key feature is that the shape of the scatter plots tends to be quite round. These 'clouds' indicate great variation in the speed-accuracy trade-offs adopted by the candidates. The third point to note is that the centre of the clouds tends to lie at Speed (PAT) and Accuracy (ACC) values around .7, indicating that candidates attempt about three-quarters of the test questions and get around two-thirds of their answers right. Finally, attention is drawn to those candidates that occupy the extremes of the speed-accuracy trade-off surface. Data points in the top left-hand corner of a cloud indicate extremely slow but accurate workers; data points in the bottom right-hand corner indicate extremely quick but inaccurate workers, often performing at chance level. It is rather likely that 'Raw' scores underestimate the potential of the former, and overestimate that of the latter.

Table 1 shows the inter-correlations of the SPACES score variables. Speed and Accuracy are slightly negatively correlated. The variance in Efficiency is primarily accounted for by Accuracy, although Speed contributes substantially in Diagrammatic and Clerical. Efficiency correlations with Risky Balance vary widely and suggest independence as expected.

Table 1: Inter-correlations of Speed, Accuracy, Efficiency and Risky Balance Scores

               Speed vs.   Speed vs.    Accuracy vs.   Efficiency vs.
               Accuracy    Efficiency   Efficiency     Risky Balance
Verbal            .01        .43**        .90**           -.33**
Numerical        -.43**      .15*         .80**           -.39**
Diagrammatic     -.27**      .55**        .64**           -.06
Clerical         -.07        .90**        .38**            .35**
Spatial          -.38**      .16**        .83**           -.40**
Mechanical       -.09        .24**        .94**           -.47**

Figure 1: Speed (PAT)-Accuracy (ACC) scatter plots for the six tests (Verbal Critical Reasoning, Numerical Critical Reasoning, Diagrammatic Series, Basic Checking, Spatial Reasoning, Mechanical Comprehension; both axes scaled 0.0 to 1.0). [Scatter plots not reproduced.]

A 'positive manifold' was found for each of the four score variables across the six tests. The medians of the inter-correlations for Speed, Accuracy, Efficiency and Risky Balance were .49, .44, .46 and .48 respectively; the first component extracted by factor analysis accounted for 52%, 53%, 55% and 53% of the variance respectively. Test-retest reliability data were obtained for a sub-sample (n=21) that completed both the computer and the paper-and-pencil versions of the tests as part of an equivalence study. The average correlation coefficients were .49 for Speed, .54 for Accuracy, .75 for Efficiency and .39 for Risky Balance. Correlation values for 'General' scores aggregated across all six tests were .67, .78, .91 and .57 respectively.
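The first-component variance shares quoted above come from factor analysis of the test inter-correlation matrices. As a rough sketch of that computation (pure-Python power iteration on a hypothetical uniform 6x6 correlation matrix with r = .49 off-diagonal, not the study's actual data):

```python
def first_component_share(corr):
    """Share of variance captured by the first principal component of a
    correlation matrix, estimated via simple power iteration."""
    n = len(corr)
    v = [1.0] * n
    for _ in range(200):  # iterate towards the dominant eigenvector
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient with the unit eigenvector gives the eigenvalue
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(n))
                     for i in range(n))
    return eigenvalue / n  # trace of a correlation matrix equals n
```

For a uniform matrix the share works out to (1 + 5 x .49) / 6, roughly 57%, which is in the same region as the 52-55% figures reported for the actual data.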
The values should be viewed as conservative lower-bound estimates due to the small sample size and the 'noise' introduced by calculating reliability across modes of presentation with variable time intervals between sessions.

To assess the construct validity of the SPACES model variables, 12 expert practitioners were asked to predict the likely relationship of the four model variables to the 'Big Five' personality factors. Their predictions were as follows:

• 'Speed': Conscientiousness (--), Extroversion (++), Emotional Stability (++)
• 'Accuracy': Conscientiousness (++), Extroversion (--), Emotional Stability (--)
• 'Efficiency': Openness to Experience (+++)
• 'Risky Balance': Extroversion (++), Emotional Stability (+++), Conscientiousness (--)

Contrary to expectations, only Openness to Experience correlated consistently (around .20) with Risky Balance across the six tests. The 'General' Risky Balance score variable correlated significantly (p<.05, two-tailed) with Behavioural (-.22), Data Rational (-.20), Conceptual (-.20), Decisive (.15), Conscientious (.14), Artistic (.14) and Optimistic (-.14).

To assess the criterion-related validity of the SPACES variables, validity coefficients against the 'Number of GCSE passes (Grades A-C)' criterion were calculated (see Table 2). Accuracy showed the highest validity, ahead of Risky Balance (negative) and Efficiency scores.

Table 2: Validity of SPACES Scores against GCSE Results (n=91)

                Speed    Accuracy   Efficiency   Risky Balance
Verbal           .01      .49**       .47**         -.39**
Numerical       -.35**    .41**       .30**         -.44**
Diagrammatic    -.14      .40**       .28**         -.35**
Checking         .16      .07         .18            .09
Spatial         -.17      .25*        .20           -.26**
Mechanical       .04      .16         .14           -.12

Discussion

The study demonstrates that the SPACES model suitably describes the interplay of Speed, Accuracy, Efficiency and Speed-Accuracy Balance in occupational tests. 'Number Right' Efficiency scores have been shown to depend on the independent contributions of Accuracy and Speed components.
Risky Balance provides additional information about the test taker that is virtually independent of the traditional 'Raw' score. Test-level scores of Speed, Accuracy and Risky Balance suffer from ceiling effects and may not be sufficiently reliable to be interpreted on their own. However, scores aggregated across a number of tests achieve interpretable levels of reliability. Computer-based administration and due care in the test development process may increase the reliability of measurement. Contrary to expectations, there is no link between Risky Balance and 'Impulsivity'-'Reflectivity' related personality variables. However, a viable concept of a 'Risky' vs. 'Cautious' test response style emerged, characterised by a convergent, pragmatic thinking style. The validity evidence suggests that traditional 'Raw' or 'Right' scores may underestimate true ability, as Accuracy often showed higher validity. Interpreting Accuracy scores in addition to the common 'Right' score would improve the predictive power of occupational tests.

Application

Let us consider the implications with reference to the Diagrammatic Series (DC3.1) test, which contains 40 questions with 5 answer options each and has a time limit of 20 minutes. The results of three candidates - Cathy Cautious, Annie Average and Ricky Risky - are listed in Table 3 below. The Speed-Accuracy Balance variable shown here can be calculated by simply subtracting Accuracy (ACC) from Speed (PAT) without the need for z-standardisation; it correlates around .99 (Kurz, 2000) with the Risky Balance variable referred to above. All three candidates have the same 'Right' score and would be judged, according to test publishers' manuals, to be 'equally able'. However, they differ considerably in their approach to the test. Cathy worked cautiously with perfect accuracy, but answered only half the questions in the test. Ricky preferred to take risks and managed to answer all 40 questions, but got only half of them right.
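A minimal Python sketch of this worked example (the attempted/correct counts are those implied by the three candidates' proportions; the standard textbook guessing correction deducts wrong/(options - 1) from the raw score):

```python
# DC3.1 as described: 40 questions, 5 answer options each.
N_QUESTIONS, N_OPTIONS = 40, 5


def score(attempted, correct):
    """Simple Speed-Accuracy Balance (PAT - ACC) and guessing-corrected
    raw score for one candidate."""
    speed = attempted / N_QUESTIONS                  # PAT
    accuracy = correct / attempted                   # ACC
    balance = speed - accuracy                       # simple Speed-Accuracy Balance
    wrong = attempted - correct
    corrected = correct - wrong / (N_OPTIONS - 1)    # deduct 1/4 per wrong answer
    return balance, corrected


for name, attempted, correct in [("Cathy Cautious", 20, 20),
                                 ("Annie Average", 30, 20),
                                 ("Ricky Risky", 40, 20)]:
    balance, corrected = score(attempted, correct)
    print(f"{name}: balance = {balance:+.2f}, corrected score = {corrected:.1f}")
```

All three candidates share the raw score of 20, yet the balance and guessing-corrected values separate them, mirroring the argument made in the worked example.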
Annie was balanced in her approach, answering three-quarters of the questions and getting two-thirds of her answers right.

Table 3: Results on 'Diagrammatic Series' Test (DC3.1)

Candidate        Efficiency   Speed (PAT)   Accuracy (ACC)   Speed-Accuracy Balance
Cathy Cautious       20           .50            1.00                -.50
Annie Average        20           .75             .66                 .09
Ricky Risky          20          1.00             .50                 .50

Note: Efficiency = number of correct answers; Speed = proportion of questions attempted; Accuracy = proportion of answers correct.

These three candidates clearly differ in their response style on the Speed-Accuracy Balance dimension in spite of sharing the same Efficiency level. But did they really perform equally well? Many standard textbooks on psychometrics contain formulas that correct scores for the effects of guessing. The 'guessing-corrected' score is usually calculated by dividing the 'Number of Mistakes' by the 'Number of Answer Options - 1' and deducting this correction from the 'Raw' or 'Right' score (RS). Thus for the DC3.1 test with 5 answer options, 1/4 of a point needs to be deducted for each 'Wrong' answer to neutralise the effects of 'wild guessing'. Applying this correction formula would reduce the score of Annie Average to 17.5 and that of Ricky Risky to 15. Cathy Cautious would remain at her score of 20 and clearly emerge as the 'most able' candidate.

Conclusion

This paper has explored how speed, accuracy, efficiency and test-taking style variables are related, and elicited support for the SPACES model. Further validation research is required to compare the relative merits of the SPACES and guessing-corrected scores. In the future, test developers should pay more attention to speed-accuracy trade-off issues and provide information on speed and accuracy score distributions. Applying the SPACES approach to test orientation practice, administration, scoring, interpretation and feedback is likely to enhance the fairness, validity and utility of occupational tests.

References

Carroll, J. B. (1993). Human Cognitive Abilities: A Survey of Factor-Analytic Studies. Cambridge: Cambridge University Press.
Kurz, R. (1990). Test Item Theory, Facet Form Concept and the Construction of Parallel Items. Unpublished MSc dissertation. Hull: University of Hull.
Kurz, R. (1995). Speed-accuracy trade-off: Some reliability studies. Proceedings of the Warwick BPS Occupational Psychology Conference. Leicester: BPS.
Kurz, R. (2000). The Facets of Occupational Testing: General Reasoning Ability, Residual Aptitudes & Speed-Accuracy Balance. Unpublished PhD dissertation. Manchester: UMIST.
Kurz, R. & Morris, G. (1997). Career guidance via computer? Piloting the AIMS package. Proceedings of the Blackpool BPS Occupational Psychology Conference. Leicester: BPS.
Phillips, L. H. & Rabbitt, P. M. A. (1995). Impulsivity and speed-accuracy strategies in intelligence test performance. Intelligence, 21, 13-29.
Salkind, N. J. & Wright, J. C. (1977). The development of Reflection-Impulsivity and Cognitive Efficiency. Human Development, 20, 377-387.