
Module six Assessment Task

Choose a passage which has not previously been used in a reading comprehension test. Design a range of questions to test comprehension of it by a group of learners of your choice. Justify your choice of passage and the questions you have set in both theoretical and practical terms.

TS/05/04

1. INTRODUCTION

Teacher-designed tests are a common feature of most EFL classrooms, and are one way of assessing learners’ language abilities. Designing a test for a specific group of learners may be ‘a matter of problem solving, with every teaching situation setting a different testing problem’ (Hughes 1989: ix), but it is nevertheless important to be clear about the purpose for which a specific test is to be used and to make sure the test designed is appropriate for that purpose. A test of reading comprehension can be seen as testing a particular skill (reading, as opposed to writing, listening or speaking), but such a test could also be designed as a collection of tasks reflecting activities normally performed outside the testing situation, thus in theory enabling the tester to ‘demonstrate how test performance corresponds to non-test language use’ (Bachman and Palmer 1996: 78). From this perspective, consideration of the authenticity of text and task becomes crucial.

This essay explores the development of a test of reading comprehension for use with pre-intermediate learners. The first section looks at general factors to be taken into account when designing language tests. The next section describes the stages I went through in designing a reading comprehension test for use with a particular group of learners. The essay then describes the process of question writing and discusses modifications made as a result of piloting the test, and the last section evaluates how effective the final version of the test was seen to be.
Although the final questions were in Finnish, for reasons of clarity all test questions referred to in this essay are given in English.

2. DESIGNING TESTS

This section will briefly consider some of the factors which a test designer should take into account when designing a language test. These considerations apply to classroom tests as well as to tests taken by large numbers of students. Bachman and Palmer (1996: 164) point out that ‘design, operationalization and administration need to be carried out for every test…develop[ed]’; the difference lies in the amount of detail and resources involved.

2.1 Testing for a Purpose

‘[T]he primary purpose of tests is to measure’ (Bachman and Palmer 1996: 19), but it is the purpose of the measurement that should determine the focus of any specific test. When using tests to obtain information about learners, four main purposes can be identified (Hughes 1989: 9-14, Bachman and Palmer 1996: 97-98), as shown in table 1. It is important to ensure that the test being designed is suited to its purpose, as inferences about language ability, and possibly far-reaching decisions about a candidate’s future, may be based on the results.

Type of Test            Purpose
1. Proficiency Test     To assess general ability in a second language.
2. Achievement Test     To evaluate how much a learner knows from a defined amount of course or class work.
3. Diagnostic Test      To identify a student’s strengths or weaknesses in specific areas of language.
4. Placement Test       To determine which would be the most appropriate class, stream or level in which to place a student so that subsequent language teaching is appropriate to their needs.

Table 1: Types of Language Test and their Purpose

2.2 Specifications

In order to assess whether a test measures what it intends to measure (that is, whether it has construct validity), a set of specifications should be written as part of the overall test design.
These include information about content, format, timing, criterial levels of performance (the required level of performance for success) and scoring procedures (Hughes 1989: 49-51). Lynch and Davidson (1994: 732), in their ‘criterion-referenced test development’ approach, emphasize the importance of the specification format as a ‘flexible tool that test developers can reshape to respond to specific testing requirements.’ They highlight the importance of what they call the mandate (the reason for the test) for writing test specifications, and assert that writing the test specifications together with the actual tasks is an effective way of ‘clarifying the criterion being tested’ (ibid: 730). Brown (1994: 387) observes that for classroom tests a specification can be ‘a simple and practical outline of [the] test’ (original emphasis), derived from the test objectives.

2.3 Authenticity

The term authenticity, as used in the context of testing, can be understood to mean the degree to which a given task and set of materials correspond to ‘real life’ tasks and interactions. It is one of the test qualities used by Bachman and Palmer (1996: 23-5) in their model of usefulness, and they suggest (ibid: 19) that test tasks which ‘provide higher degrees of authenticity…’ may be of particular interest to teachers. Authenticity of text and task can at the very least be seen to add face validity to a test. Spence-Brown (2001: 465), among others (for example Bachman and Palmer 1996, Kirschner et al 1996), sees authenticity in testing as having a much wider application, and includes within the paradigm of authenticity the interaction of the test takers with the task as well as the assessment criteria and procedures. Hughes (1989: 15) adds a reminder that however authentic a given test is designed to be, for the candidate it will first and foremost be a test. Nevertheless, authenticity remains an important parameter in test design.
2.4 Pre-testing

Pre-testing is a recognised part of test development and facilitates the gathering of information for any necessary revisions, both to the test and its administration. Even with the rigorous development procedures for a global test such as the TOEFL (Test of English as a Foreign Language) reading test described by Peirce (1992: 669-673), pre-testing is included as an integral part of the process (ibid: 677-680). Brown (1994: 389) observes that in the classroom situation trialling a test is seldom a realistic possibility, but recommends a careful final edit as a substitute procedure.

Having briefly examined some of the parameters affecting language test design in general, the next section looks at the process of developing a test of reading comprehension for use in the classroom.

3. DESIGNING A READING COMPREHENSION TEST

The reading test to be discussed was developed as part of the assessment of a course module concerned with the topic of travelling. The test can be considered an achievement test in so far as it is an evaluation for a specific module, but it could also be considered a proficiency test, as it measures a learner’s ability to gain information from an authentic text (that is, a text originally written for a purpose other than language teaching) through reading.

3.1 The Context: the Learners

The learners that I work with attend a special educational institute in Finland which provides vocational upper secondary and adult education. Most of the students are in the age range 16-22, and have special needs, for example, difficulties with reading and writing. As a consequence many students seem to be outwardly unmotivated for further English studies, having already had six years of compulsory English in primary and secondary school. Nevertheless, they are required to take 3 credit units of English during their three-year vocational course.
The classes are small and heterogeneous, but a sizable majority of students could be considered as falling within the pre-intermediate range.

3.2 Considerations in Text Selection

The first thing to do when developing a test of reading comprehension is to decide what the test is going to measure. Spolsky (1985: 181) observes that ‘how we go about measuring something is dependent on what we think we are measuring.’ Hughes (1989: 116-117) discusses both macro-skills (for example scanning a text to locate specific information) and micro-skills (for example using context to guess the meaning of unfamiliar words) as having relevance for the assessment of reading ability. He concludes that ability to demonstrate mastery of the macro-skills also implies mastery of the micro-skills. I was interested in developing a test to measure how well my students would be able to access information from an authentic text using the scaffolding of the test questions, and therefore I felt the integrative approach to testing reading comprehension used in this assessment was appropriate.

The text chosen for the test was from the tourist information magazine Time Out 2005/6: London for Visitors. The parts of the magazine used were three pages from the essential information section at the end of the magazine: Getting Around, Resources and Emergencies (the full text can be seen in Appendix 1). The text and this particular part of it were chosen as I thought it representative of the type of information my students may come across when on holiday outside Finland, and as containing the type of information they might need. The main purpose of Time Out is to give readers information, and it is written so that information on a particular topic is located under headings and subheadings.
For example, under the heading public transport, information can be found about the following: travelcards, the London underground, buses, rail services and water transport, all of which are clearly indicated through the use of a bold typeface. Hughes’ advice on text selection (1989: 119) includes using passages which contain ‘plenty of discrete information’ if scanning is to be tested. He also suggests considering giving candidates ‘a good number of fresh starts’ by using a number of passages. I considered the passages selected from Time Out to give plenty of scope for scanning questions, detailed reading and fresh starts, while maintaining the central theme of travelling. This kind of text also seems to give the opportunity for testing reading by what Jafarpur (1985: 197) describes as the ‘short context technique’, which he claims measures reading skill rather than anything else, and taps ‘relevant real-world reading behaviours’ (ibid: 205). The latter claim seems to equate with task authenticity, which was one of the reasons for choosing this text.

3.3 Aspects of Text and Task Authenticity

The degree of resemblance between a passage used in a test and the original text from which it was taken has implications for claims of text authenticity within the context of a test. Hill and Parry (1994: 257) highlight the fact that many authentic texts used in reading comprehension tests are not used in facsimile form, denying readers cues such as typeface and format. Although the text used in the reading comprehension test in its final form amounted to three pages, I felt this was justified as the form of the text is the same as the original, and this is what the students would have to tackle in the real world.
To make the reading more manageable the test was divided into sections, so the students only had one page at a time from which to find the answers, and, as discussed more fully in section four, the questions provided the direction from which to scan for the information required. The students therefore only needed to read a small amount of text to find the answers.

Task authenticity in a reading comprehension test also relates to the language in which the test questions are posed. Hughes (1989: 129) makes the point that the questions should always be less demanding than the text itself. Kirschner et al (1996: 88) and Hill and Parry (1994: 253) advocate writing the questions in the test takers’ own language where possible, when this mirrors what they would be doing in real life. Initially I had decided to set the questions for the reading comprehension test in English because this was more practical for me, but during the development process I realised Finnish was more appropriate: the test would then only be assessing students’ understanding of the text, and it would also reflect how the students might approach such a task in ‘the real world’ if required to do so. An additional consideration was the practical one of students accessing and staying with the test and not giving up.

3.4 Specification

Writing a specification for a reading comprehension test can focus the task of choosing an appropriate text or texts, and according to Lynch and Davidson (1994: 732) is critical in task development, which in this case meant writing the questions to test reading comprehension. The specification for the reading comprehension test under discussion was used for developing the test questions, and the final version can be seen in table 2.
As mentioned by Lynch and Davidson (1994: 730), writing the questions also ‘[feeds] back to the elaboration of the test specification’, and they suggest that the test specification ‘also provides a detailed record of evidence for judging how well the test items…match what the test claims to be measuring.’

Specification for a reading comprehension test

The purpose of this test is to assess the ability of a pre-intermediate learner to obtain accurate information from an authentic written text. It can be important for students to be able to access relevant information from an informational text written for native speakers, using the techniques of skimming and scanning.

The text and the questions will relate in some way to travelling. Both the text and the tasks will be authentic, that is, replicating as closely as possible what a learner may be expected to do in the ‘real world’. In Bachman and Palmer’s terms (1996: 18), this is the ‘target language use’ (TLU) domain or tasks.

There will be instructions at the beginning of the test in Finnish to explain the purpose of the test, and how the student should go about doing the test. The reading comprehension questions will be in Finnish to reflect TLU.

The questions will either be multiple-choice or require a word or words for the answer which can be found within the text. The learner is required to select the correct response or provide the appropriate word or words from the text as an answer.

The test will last no longer than 45 minutes and allow time for slower candidates to complete within this time.

The scoring will be one or two points for each correct answer, depending on the amount of information required and the question format.

Table 2: Final Specification for a Test of Reading Comprehension

4. THE PROCESS OF QUESTION WRITING

Once the text was selected, the next stage was to write questions which reflected the test design.
As mentioned above, it became an iterative process whereby the test specification was also modified as a result of piloting the draft items. This section begins by presenting the original version of the test questions, and then looks at how the questions and the format were modified in response to observations and comments received when piloting the test.

4.1 The First Draft

Initially the whole test comprised four sections (one to a page) with the first section conceived as an orienting, but also authentic task. The contents page of the essential information of an earlier issue of Time Out was chosen (the contents page in the 2005/6 issue was significantly smaller and more general) and the instructions directed the test takers to choose the heading they would look under to find certain information, with the first item presented as an example. The other three sections in the original draft related to three separate pages in Time Out 2005/6. The contents page can be seen in appendix 2 and the original questions from all four sections in table 3 below.

CONTENTS
Which section would you look under if you wanted to find:

What?                                   Where?
Example: a map of the underground       Books and maps
1. somewhere to stay
2. an internet café
3. the Finnish embassy
4. a dentist
5. a place to leave your luggage while you spend the day in London
6. the place to contact to try and find the parcel you left on a train in London
7. a church to go to on Sunday
8. a holiday job in London

GETTING AROUND
1. Which is the nearest airport to central London: Gatwick or Heathrow
2. Where can you find information on public transport in London?
3. Where can you buy a travel card in London?
4. If you were going to London for the day with an adult friend and two children under the age of 10, which travel card would you buy?
5. Why?
6. Which section would you look under to find out how much it costs to travel on the London Underground?
7. How much would it cost you to hire a bike for a day?
8. If you hired a bike you would need to pay £100 deposit. Explain what you think a deposit is.

RESOURCES
1. Where could you go to send an e-mail home if you were visiting London for the day?
2. What are the normal opening times for post offices in Britain?
3. Where would you be expected to tip in Britain?
4. How much should you tip?

EMERGENCIES
1. If you call 999 (or 112 from a mobile phone) in an emergency, what will the operator ask you?
2. Where could you go to get emergency dental treatment?
3. When is the emergency dental care service open?
4. What time should you arrive to make sure you get treatment the same day?
5. If you needed dental treatment and were going by public transport, where should you go to?

Table 3: Original Questions for the Reading Comprehension Test

When designing questions for the students to answer I focused on the text pages and tried to consider what kind of information they would realistically need to look up if visiting London, and therefore what sort of questions they would be asking themselves. The first question was deliberately written as an ‘easy question’, which I expected almost all the test takers to get right, so that they would have the confidence to realise they could extract information even from a text which at first sight could appear rather daunting. To be able to answer this question test takers would have to find a particular heading and sub-heading, and compare two short sections to extract the appropriate information.

Which is the nearest airport to central London: Gatwick or Heathrow?

Most of the other questions in the first draft were short-answer questions, with no choice of possible answers provided. The only guidance the test taker had was in the question itself. The questions were almost all asking the test takers to find specific information, and there were between one and four questions per item of information.
This was felt to be realistic, in that often one particular item of information is required, such as a phone number, but on other occasions several pieces of linked information could be needed, for example a place together with opening times and information as to how to get there.

Where could you go to get emergency dental treatment?
When is the emergency dental care service open?
What time should you arrive to make sure you get treatment the same day?
If you needed dental treatment and were going by public transport, where should you go to?

4.2 Pre-testing

The first draft went through two stages of revision, based on pre-testing. The initial version as described above, with questions in English, was given to three family members and a colleague, all of whom are fluent in English. From the first pre-test several things emerged that needed changing. These are outlined below:

The length of the test: in its original version there were too many questions. My colleague reported difficulty keeping concentration, and this would probably be even more of a problem for students, whatever their motivation.
Unbalanced sections: the sections contained different numbers of questions; it would be a more balanced test if there were the same number of questions for each section.
Unexpected answers: some of the questions were interpreted differently than was expected by the test writer, as shown by the answers given, and this indicated revision might be in order.
The scoring key: this needed revising because of the variation in the answers.

As a result of the above a thorough look at the questions was undertaken, and all the questions in which the wording was ambiguous or less than clear were changed or withdrawn. Examples of the modifications and changes that were made are presented in the next sub-section.

A second version of the test, still comprising four sections, was then trialled with three students, all at roughly pre-intermediate level, male and aged 17.
In addition to the changes made in the questions themselves, the questions were now given in both English and Finnish. From this pre-test further changes seemed to be indicated in the following areas:

Language of the questions: using both languages was too cumbersome, and the students found it irritating to search through the ‘amount of question material’ to find the question in the language they felt comfortable with.
The contents section: this appeared to cause confusion as the students found it difficult and did not really understand the point of it. Apparent problems with this section seemed to affect their attitude to the rest of the test.

As a result of the second pre-test, the contents section was dropped completely, as it was apparent that it did not perform the orienting function anticipated, and possibly reduced performance on the rest of the test, as well as adding to the length of the test. The language for the instructions and questions was also changed to Finnish for reasons of practicality and authenticity. Some of the other modifications made are discussed in the next sub-section.

4.3 Developing the Questions

Kirschner et al (1996: 89) express clearly the obligations a test writer has when developing a test: ‘It is the test writer’s task to define, identify and subsequently remove any potential difficulties inherent in the test questions.’ This implies the importance of pre-testing in the process of test development. It may also mean checking the test specifications and considering how well the tasks (questions) reflect the specifications. The specifications themselves may also need to be reconsidered.

As pointed out above it became apparent the test was too long, and consequently the number of questions needed to be reduced. As the minimum number of questions for a section in the original test was four, I decided to reduce the number of questions to four for all three sections.
This meant in practical terms that each test taker would have three pages of authentic text clipped together, and three pages with questions separately clipped together, with individual pages of text and questions corresponding. I thought this would give the students enough opportunity to show their ability in scanning and reading for detail, and also give, in Hughes’ terminology, several ‘fresh starts’ (1989: 119). In the final version (see table 6) the first section, Getting Around, gave three fresh starts, the second section, Resources, gave three fresh starts, and the final section, Emergencies, gave two fresh starts.

It also seemed that the format of some of the questions was posing difficulties for the test takers. Kirschner et al (1996: 89) express it this way: ‘test questions constitute a communicative interchange between the test writer and the test taker.’ In this case miscommunication was occurring, which indicated that a change in the style or format of the questions was necessary. Looking more closely at the questions and the answers from the pre-test, the question form did not seem direct enough for the test takers to find the information the test writer was seeking. This was also reflected in the way the questions were presented, and in addition implied that changes in the marking might be necessary, as a simple and clear marking scheme is easier for a busy classroom teacher to operate. A comparison of the way the questions were changed for the first section can be seen in table 4.

VERSION 1
GETTING AROUND
1. Which is the nearest airport to central London: Gatwick or Heathrow
2. Where can you find information on public transport in London?
3. Where can you buy a travel card in London?
4. If you were going to London for the day with an adult friend and two children under the age of 10, which travel card would you buy?
5. Why?
6. Which section would you look under to find out how much it costs to travel on the London Underground?
7. How much would it cost you to hire a bike for a day?
8. If you hired a bike you would need to pay £100 deposit. Explain what you think a deposit is.

FINAL VERSION
GETTING AROUND
1. Which airport is nearer to central London?
□ Gatwick
□ Heathrow
2. Where can you buy a travel card in London?
3. If you were going to London for the day with an adult friend and two children under the age of 10, which travel card would you buy?
□ Day Travelcard
□ One-day Family Travelcard
□ Three-day Travelcard
□ Oystercard
4. Which section would you look under to find out how much it costs to travel on the London Underground?
□ Using the system
□ Underground timetable
□ Fares

Table 4: Comparison of Questions from the Section ‘Getting Around’, in the First and Final Version of the Reading Comprehension Test.

Questions 2, 5, 7 and 8 were omitted in the second version as questions requiring similar information were already in the test and the scope for answers was too diverse. In the case of question 8, the item in question was on reflection thought to be too difficult. The format of questions 4 and 6 in the original was changed to multiple choice. In this way a focus for looking for the answers was provided, but evidence of reading with understanding would still be necessary in order to arrive at the correct answer. Question 3 in the original was still thought to be valid, as to get the correct answer the student would simply have to write (copy) one or more options from the text, once the correct part of the text was identified.

One question in the Emergencies section was completely changed as I was unable to formulate it with clarity in relation to the answer I was seeking. I also realised the information in the text was not easy to locate on the page. The change can be seen in table 5.

FIRST VERSION
If you call 999 (or 112 from a mobile phone) in an emergency, what will the operator ask you?

FINAL VERSION
According to the text, if you lose your credit card, which of the following should you do?
□ Report the loss to the police
□ Phone the 24 hour services
□ Inform your bank
□ All of these

Table 5: Example of a Question Substitution

From a TLU stance it is probably more likely that a tourist would want help with a lost credit card than need to make an emergency telephone call. The test taker in the final version is directed to the options, but has to make a decision between them. This would seem to be realistic. The final version of the entire test can be seen in table 6 below, and the marking key in appendix 3.

NAME_______________ DATE________________

Reading Comprehension Test

This test is assessing your ability to find information from ‘Time Out’ 2005/6. Time Out is a magazine written for visitors to London. There are three sections in the test. The questions are in Finnish. Look for the answers to the questions from the page which has the same heading as the questions’ heading.

GETTING AROUND
1. Which airport is nearer to central London?
□ Gatwick
□ Heathrow
2. Where can you buy a travel card in London?
3. If you were going to London for the day with an adult friend and two children under the age of 10, which travel card would you buy?
□ Day Travelcard
□ One-day Family Travelcard
□ Three-day Travelcard
□ Oystercard
4. Which section would you look under to find out how much it costs to travel on the London Underground?
□ Using the system
□ Underground timetable
□ Fares

RESOURCES
5. Where could you go to send an e-mail home if you were visiting London for the day?
6. Do Post Offices in Britain open on Saturdays?
□ Yes
□ No
7. Where would you be expected to tip in Britain?
8. How much is a normal tip in Britain?
□ 10 per cent
□ 15 per cent
□ 20 per cent

EMERGENCIES
9. According to the text, if you lose your credit card, which of the following should you do?
□ Report the loss to the police
□ Phone the 24 hour services
□ Inform your bank
□ All of these
10. Where can you go to get emergency dental treatment in London?
□ Charing Cross Hospital
□ Guy’s Hospital
□ Royal London Hospital
□ St Thomas’ Hospital
11. What time should you arrive to make sure you get seen the same day?
□ Before 11 am
□ At 11 am
□ After 11 am
12. If you need emergency dental treatment and you have to go by public transport, at which station should you get off?
□ Guy’s Hospital
□ London Bridge

Table 6: Final Version of the Reading Comprehension Test

Having looked at the question writing and re-writing in some detail, the last section approaches the questions of reliability and validity within the context of this classroom test, before looking again at authenticity in the light of piloting and using the test.

5. DISCUSSION

The test of reading comprehension was finally used under test conditions with nine students from three different vocational classes. This section discusses the results of these tests and considers the implications for the future development of reading comprehension tests for use in the classroom with similar students.

5.1 Reliability and Validity

Reliability and validity may not be at the top of a classroom teacher’s agenda when planning a test, but it is still important to take them into consideration. Reliability is concerned with consistency of measurement and is ‘an essential quality of test scores’ (Bachman and Palmer 1996: 20). While no test can be considered completely reliable it may be possible to ‘…minimize the effects of those potential sources of inconsistency that are under our control through test design’ (ibid). Hughes (1989: 38-41) offers some practical guidelines for increasing reliability which include: making sure there are no ambiguous items, providing clear and explicit instructions and writing a detailed scoring key. Even though my test had been through two pre-test versions, while observing the students taking the test and when marking the tests I was still concerned that some of the test items showed evidence of lack of clarity, if not ambiguity.
For example, in the Emergencies section, question 10 asks:

Where can you go to get emergency dental treatment in London?

In the English version, emergency dental treatment is written in italics, thus highlighting which heading the test taker is looking for. The words in the question correspond exactly to the subtitle in the text under which the answer can be found. In translation the test taker does not get the benefit of these exact words, and consequently some of the test takers became frustrated and confused as to whether the relevant information was there at all. This could be a case where ‘authenticity’ (in this case giving the questions in the test takers’ language) militates against test performance. The test writer would have to consider if there is a right solution in this case.

The revised versions of the test had the effect of simplifying the marking key (see appendix 3). Most items are unequivocal and score 1 point. For questions to which several answers were possible, the test taker is credited with one point if only one item is given, but could get two points if two or more items were offered. All acceptable answers are detailed in the key. Some questions are designed to make test takers look at quite small differences, and not necessarily go for the obvious answer. For example, in the Resources section, question eight asks:

How much is a normal tip in Britain?

In this case, although 15 per cent is mentioned in the text, and is perhaps more easily noticed as it appears as a figure, ten per cent is the required answer. This is specified in the key and the test taker would have to show evidence of accurate reading to get the correct answer.

If a test has construct validity it should measure the ability it is said to measure. Hughes (1989: 26) states that construct validity is generally unproblematical in a direct test of reading ability. This may well be a reasonable assumption for a teacher-written test for assessing reading comprehension in the classroom.
I am, however, left with one concern in this regard, which relates to the particular students I work with. This is the tension between using an authentic text in facsimile form and the fact that some of my students have reading difficulties relating to the physical parameters of reading in any language. The format of the text itself could therefore be responsible for apparent reading comprehension problems in English as measured by this test, problems which might not arise if the text size and density were different.

5.2 Authenticity Revisited

One of the main cornerstones of this test, from my point of view as a test writer/teacher, was the authenticity of the material and of the tasks developed from it. However, mention has to be made of how the test situation itself affects the concept of authenticity from the perspective of the test takers. Spence-Brown (2001: 475) notes that ‘…it has often been observed that in a test the implicit rules of the testing game will over-ride those of the explicit task in determining behaviour and evaluation.’ I noticed that when the students were taking the test, it was perceived first and foremost as a test. The very word ‘test’ on the first page served to orientate them. They were aware that test behaviour means, for example, quiet individual work and no conferring. The same material used as a classroom activity may be perceived as more authentic because of the absence of the pressure that inevitably surrounds a test, and because a student could choose whether to work alone or with someone else. Peirce (1992: 682) makes the point that ‘[the] meaning [of the text] derives from the interaction between the text, the test taker, and the testing situation in which the text is read.’ She argues that a test is of itself an authentic social situation which is recognised as such by test takers (ibid: 685).
Authenticity cannot be considered an absolute term, but I am still persuaded that in a test of reading comprehension an authentic, unmodified text is justified. It is also worth taking the time to make the questions as authentic as possible in relation to both the text and the test takers, even if the test takers themselves do not give this as much weight as they might in a situation in which assessment was not involved.

5.3 Test Results

Of the nine students who took the test, only one was unable to do anything with it. With help, in the form of a classroom activity, this would undoubtedly have been possible for her, but not in the form of a test. All the other students were able to work with the test as given, and it was seen as being within their capabilities. The instructions seemed to relate reasonably well to the format, and despite there being seven pages in two ‘booklets’ there wasn’t any major confusion, and eight students completed the test satisfactorily. I felt that this kind of result confirmed the validity of the test for use in my classroom, and I would use it or a similar test to assess students’ reading comprehension in the future.

6. CONCLUSION

I found that the task of developing a test of reading comprehension for use in the classroom forced me to differentiate between a test and an activity. In the classroom the two could be seen as fairly interchangeable.
‘…I think any test could just be an activity and any activity could be made into a test…you just evaluate one in a certain way (the test) and the other is used for the learning process (personal communication from a colleague 2006).’

The process of developing this test made me think carefully about the different stages involved and realise the crucial importance of planning to try and ‘assure that the test will be useful for its intended purpose (Bachman and Palmer 1996: 86).’ The model of mandate leading to the interchange between writing the specification and the task or item provided a useful framework on a practical level, but also increased my understanding of the process of test development as a whole. I now have a clearer understanding of how the process of test writing for the classroom compares with developing tests to assess large numbers of students worldwide: the two are essentially the same, but classroom test writing is on a smaller scale and uses the resources available to suit the particular circumstances.

REFERENCES

Bachman, L.F. and Palmer, A.S. (1996) Language Testing in Practice. Oxford: Oxford University Press.

Brown, H.D. (1994) Teaching by Principles. New Jersey: Prentice-Hall.

Hill, C. and Parry, K. (1994) ‘Assessing English language and literacy around the world’, in Hill, C. and Parry, K. (eds) From Testing to Assessment. London: Longman.

Hughes, A. (1989) Testing for Language Teachers. Cambridge: Cambridge University Press.

Jafapur, A. (1987) ‘The short-context technique: an alternative for testing reading comprehension’. Language Testing 4: 195-220.

Kirschner, M., Spector-Cohen, E. and Wexler, C. (1996) ‘A Teacher Education Workshop on the Construction of EFL Tests and Materials’. TESOL Quarterly 30: 85-107.

Lynch, B.K. and Davidson, F. (1994) ‘Criterion-Referenced Language Test Development: Linking Curricula, Teachers and Tests’. TESOL Quarterly 28: 727-743.

Peirce, B.N. (1992) ‘Demystifying the TOEFL Reading Test’. TESOL Quarterly 26: 665-689.

Spence-Brown, R. (2001) ‘The Eye of the Beholder: authenticity in an embedded assessment task’. Language Testing 18: 463-81.

Spolsky, B. (1985) ‘What does it mean to know how to use a language? An essay on the theoretical basis of language testing’. Language Testing 2: 180-91.

The Time Out Guide: London for Visitors (1991/2) 111. Produced in co-operation with the London Tourist Board and Convention Bureau.

Time Out: London for Visitors (2005/6) 118-124. London: Time Out Guides Limited.

Appendix 1

Text for the Reading Comprehension Test: taken from Time Out 2005/6

Appendix 2

Contents Page from Survival, Time Out 1991/2

This was not used in facsimile form in the earlier versions of the test as it comprised only a small part of the page from which it was taken.

CONTENTS
Emergencies                 111
Accommodation               111
Books and Maps              112
Communications              112
Disabled                    112
Embassies                   112
Gay and Lesbian             113
Health                      113
Left Luggage                114
Locksmiths                  115
Lost Property               115
Newspapers and magazines    115
Public toilets              115
Reference libraries         116
Religion                    116
Security                    117
Travel                      117
Visas                       118
Women                       118
Work and Study              118

Appendix 3

Marking Key (final version)

MARKING KEY

Getting Around:
1. Heathrow Airport (1 point)
2. Tube and rail stations, London Travel Information Centres, shops that display the sign (1 point for any or all)
3. One-day Family Travelcard (1 point)
4. Fares (1 point)
4 points maximum

Resources:
1. A cybercafe, big stores, public library, Cybergate, easyInternet café (1 point for any of these; 2 points for more than one)
2. Yes (1 point)
3. Taxis, minicabs, restaurants, hotels, hairdressers, some bars (2 points for the whole list; 1 point for an incomplete list)
4. 10 per cent (1 point)
6 points maximum

Emergencies:
1. All of these (1 point)
2. Guy’s Hospital (1 point)
3. Before 11 am (1 point)
4. London Bridge (1 point)
4 points maximum
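
As an illustration only, the marking rules for the Resources section of the key in Appendix 3 could be sketched in code along the following lines. This is not part of the test as administered. The helper names (`normalise`, `any_of`, `whole_list`, `score_section`) are hypothetical, and the sketch assumes that answers are matched by exact wording after lower-casing and that wrong answers are simply ignored rather than penalised.

```python
from typing import Callable, FrozenSet, List

Rule = Callable[[List[str]], int]


def normalise(text: str) -> str:
    """Lower-case an answer and collapse whitespace (assumed matching rule)."""
    return " ".join(text.lower().split())


def any_of(accepted: FrozenSet[str], max_points: int) -> Rule:
    """One point per accepted item offered, capped at max_points."""
    def score(answers: List[str]) -> int:
        hits = {normalise(a) for a in answers} & accepted
        return min(len(hits), max_points)
    return score


def whole_list(accepted: FrozenSet[str]) -> Rule:
    """Two points for the complete list, one for an incomplete list."""
    def score(answers: List[str]) -> int:
        hits = {normalise(a) for a in answers} & accepted
        if hits == accepted:
            return 2
        return 1 if hits else 0
    return score


# Resources section of the key: four questions, six points maximum.
RESOURCES_KEY: List[Rule] = [
    any_of(frozenset({"cybercafe", "big stores", "public library",
                      "cybergate", "easyinternet café"}), 2),
    any_of(frozenset({"yes"}), 1),
    whole_list(frozenset({"taxis", "minicabs", "restaurants",
                          "hotels", "hairdressers", "some bars"})),
    any_of(frozenset({"10 per cent"}), 1),
]


def score_section(key: List[Rule], responses: List[List[str]]) -> int:
    """Sum the points for one section, one list of answers per question."""
    return sum(rule(answers) for rule, answers in zip(key, responses))


# A test taker who picks the more noticeable '15 per cent' loses that point.
example = [["Cybergate", "public library"], ["Yes"],
           ["taxis", "restaurants"], ["15 per cent"]]
print(score_section(RESOURCES_KEY, example))
```

In practice a marker would accept paraphrases and spelling variants, so exact matching of this kind understates what the written key allows.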