Research on task-based language assessment shows that mastering the formal elements of a second language is not sufficient to achieve the intended goals in real-world contexts (Mislevy et al. 2002). Although several authors have explored the measurement of linguistic aspects of performance in terms of complexity, accuracy and fluency (CAF) (Housen, Kuiken 2009; Pallotti 2009), few studies actually report on how to rate the pragmatic dimension of L2 task fulfillment.
Kuiken and Vedder recently defined the construct of Functional Adequacy (FA) and developed two rating scales to assess written and spoken argumentative tasks. Their experimental studies indicate that the scales are reliable for L2 assessment and easy to handle for both expert and non-expert raters (Kuiken, Vedder 2011; 2014; forthcoming; Vedder 2016).
In our contribution, we replicate Kuiken and Vedder’s studies, asking whether and how the FA rating scales are suitable for other types of tasks. The rationale is to test the flexibility and accuracy of the FA descriptors within a different educational context and, where possible, to propose improvements.
We therefore asked 15 Marco Polo learners of Italian as an L2 attending the Language Center of Roma Tre University to perform five information-gap tasks, whose outcomes were descriptive, narrative and regulative texts. Following Kuiken and Vedder’s methodology, three expert and three non-expert raters assessed the learners’ written and oral performances using the FA rating scales. All scores were analyzed using SPSS and then discussed with the raters. The collected data provide insights into both theoretical and practical issues regarding the definition of the FA descriptors and the applicability of the rating scales. We observe that all raters generally agreed in their holistic judgments of each performance, although some differences arose when they applied the analytic FA descriptors. Moreover, a high correlation between scores leads us to assume that in such cases some descriptors overlap. Other results will be discussed further.