
Turning the tables on the Turing test: the Spivey test

Connection Science, Vol. 12, No. 1, 2000, 91-94 (Humour Article)
doi:10.1080/095400900116212

MICHAEL J. SPIVEY
Department of Psychology, Cornell University, Ithaca, NY 14853, USA
email: spivey@cornell.edu; tel: (607) 255-9365; fax: (607) 255-8433

After several decades of research in artificial intelligence (AI) (e.g. Turing 1950, Rosenblatt 1961, Winograd 1972, Rumelhart and McClelland 1986), and even in comparative cognition (e.g. Schusterman et al. 1986, Zentall 1993, Hauser 1996), the cognitive, neural and computational sciences are still loath to let go of their markedly anthropocentric criteria for 'intelligence'. Indeed, the only non-subjective evidence that humans are thinking reasoners at all is the mere fact that most of them vehemently claim to be thinking reasoners. Of course, it is trivially easy to program a computer to insist that it is an intelligent, thinking reasoner as well. Rather than allow a one-line BASIC program to be accepted as 'intelligent', most researchers would prefer to set the bar a little higher. Therefore, a more stringent test is necessary. Alan Turing (1950) provided that test.

In the Turing test, a human judge communicates, via a computer terminal, with an AI conversation program and with a human. If the human judge cannot tell which one is the human, then the AI has passed the Turing test, and may as well be considered as intelligent, and as capable of thought, as the human is. In a recent Turing test tournament (Loebner 1999), the best AI was rated by the judges as 11% Turing, or humanly intelligent. This may not seem a very impressive success rate, until one considers the success rate of the best human. The best human was rated only 61% humanly intelligent! [1]
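(As an aside, the one-line BASIC program dismissed earlier need be no more sophisticated than the following sketch, given here in Python purely for illustration; the article names no actual program.)

    # A purely illustrative stand-in for the one-line program dismissed
    # above: its only evidence of intelligence is that it vehemently
    # claims, over and over, to be a thinking reasoner.
    while True:
        print("I am an intelligent, thinking reasoner.")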
The obvious problem with all of this is the glaring prejudice toward human-like reasoning as the benchmark of intelligence. Why is computer-like reasoning not also put on such a pedestal? If the results of the recent Turing tournament are any indication, computer-like intelligence is certainly a prominent format (among AI programs as well as humans)!

In the work presented here, this prejudice has been remedied. A test inspired by the Turing test was designed in which a human judge communicates, via computer terminal, with a computer program and with a human. The important difference is that instead of the computer program contestant struggling to appear human-like in its communications, the human contestant struggles to appear computer-like in his or her communications. If the human judge is unable to determine whether the human contestant is an AI or a human, then the human can be considered as 'intelligent' as the computer is.

The human judges were 12 graduate students in the cognitive studies programme at Cornell University. The human contestants were 120 Cornell undergraduates from a variety of majors. The AI contestants were a collection of computer programs with varying levels of AI:

(1) the MATLAB Command Window;
(2) the Unix program 'Zippy';
(3) Weizenbaum's (1974) Rogerian therapist program, ELIZA;
(4) an interactive version of Chamberlain and Etter's (1984) 'free verse' poetry program, Racter;
(5) Winograd's (1972) SHRDLU; and
(6) Elman's (1990) simple recurrent network.

Importantly, none of the human judges or contestants were told which computer programs were participating. Each judge participated in 10 sessions, in each of which he/she communicated with one (randomly selected) computer program and one human; order was counterbalanced across the 10 sessions. At the end of each session, in a two-alternative forced-choice paradigm, the judge picked which one of that session's conversants appeared to display computer-like intelligence. Each conversation lasted exactly 15 minutes.

The complete transcripts comprise a considerable amount of data. [2] Some illustrative excerpts are given below. Examples (1) and (2) show the same judge (a philosophy graduate student) interacting first with a computer program (the MATLAB Command Window) and then with a human (a neurobiology undergraduate). Clearly, certain responses to certain questions are dead giveaways of human-like intelligence. Example (3) shows an interaction between a psychology graduate student judge and Elman's (1990) simple recurrent network.

(1) Judge: Why
    MATLAB: R. T. F. M.
    Judge: What does that mean?

(2) Judge: Why
    Human: Why what?
    Judge: You're definitely a human.

(3) Judge: Would
    SRN: you
    Judge: please
    SRN: stop
    Judge: anticipating
    SRN: my
    Judge: every
    SRN: word!

Another illustrative example from the transcripts is a conversation between a linguistics undergraduate contestant and a computer science graduate student judge. The human contestant attempted to fool the judge by responding to every statement with 'SYNTAX ERROR!' The judge, remembering her Apple IIe from childhood, then gave the command 'RUN ZORK', and the contestant immediately conceded.
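For readers unfamiliar with the last of the AI contestants, a minimal sketch of the simple recurrent network architecture Elman (1990) described, of the sort on display in example (3), is given below. This is a hypothetical Python illustration with untrained, randomly initialised weights, not the network that actually competed; Elman trained his networks with backpropagation to predict the next word in a sequence.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy vocabulary; each word is presented as a one-hot vector.
    vocab = ["would", "you", "please", "stop",
             "anticipating", "my", "every", "word"]
    V, H = len(vocab), 16          # vocabulary size and hidden-layer size

    # Randomly initialised weights (untrained, so predictions are arbitrary).
    W_xh = rng.normal(0.0, 0.5, (H, V))   # input   -> hidden
    W_hh = rng.normal(0.0, 0.5, (H, H))   # context -> hidden (the recurrent loop)
    W_hy = rng.normal(0.0, 0.5, (V, H))   # hidden  -> output (next-word prediction)

    def one_hot(i):
        v = np.zeros(V)
        v[i] = 1.0
        return v

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    # The defining feature of an SRN: the context layer is simply a copy
    # of the hidden layer from the previous time step.
    context = np.zeros(H)
    for word in ["would", "please", "anticipating", "every"]:  # the judge's turns
        x = one_hot(vocab.index(word))
        hidden = np.tanh(W_xh @ x + W_hh @ context)
        prediction = softmax(W_hy @ hidden)   # distribution over the next word
        context = hidden                      # carry state forward
        print(word, "->", vocab[int(prediction.argmax())])

A network of this kind, trained on enough dialogue, could in principle complete the judge's sentence one word at a time, as in example (3).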
Finally, one judge, after conversing with SHRDLU, was convinced that he was communicating neither with a human nor with a computer, but with the spirit of Gautama Buddha himself.

As expected, when the total results were tallied, the undergraduates with the best performance at exhibiting computer-like intelligence (25% success) were those in the computer science major. (These are perhaps the same people who, as mentioned before, fail miserably at the Turing test.) [3] The undergraduate major with the overall poorest performance (0%) on the Spivey test was business administration. For many of them, it was their first time touching a computer that was not also a cash register.

Of the computer program contestants, the MATLAB Command Window was the most frequently identified as having computer-like intelligence (90%). The Unix program 'Zippy', which prints out random quotes from Zippy the Pinhead, was the computer program most frequently mistaken for having human-like intelligence (50%). In addition to revealing that some humans actually have computer-like intelligence, rather than human-like intelligence, these results suggest that perhaps future Turing test tournaments should include 'Zippy' as an AI contestant.

In sum, the Spivey test demonstrates that computer-like reasoning is, for the most part, just as difficult for humans to display as human-like reasoning is for an AI to display. Importantly, there appears to be no objective reason for bestowing one form of reasoning with the label 'intelligent', or 'capable of thought', and not the other. It is hoped that this work will contribute to the growing movement for kinder and more respectful treatment of non-biological life forms. [4] Future work will conduct the obvious next permutations of the Turing and Spivey tests, which will have an AI program as the judge.

Acknowledgements

These musings were supported by discussions with Daniel Richardson, Melinda Tyler and Bob McMurray, and funding from the Sloan Foundation. The data from this 'experiment', although they were not scientifically collected and are actually mere hypothetical data points, are consistent with a possible world that bears a considerable likeness to our own.

Notes

1. Across all the human participants, the average Turing rating (or human-like intelligence score) was 50%.

2. I considered subjecting the transcripts to a 'content analysis', but then I realized I did not know what the hell a 'content analysis' was.

3. A notable exception to this greater success by computer science majors was a Chinese literature major who 'out-computerised' the MATLAB Command Window. However, it was later discovered that he had smuggled in a MATLAB manual. After receiving a query from a judge, he would rapidly flick through the pages of the manual and reply with an appropriate matrix or 'undefined function' response. The fact that this contestant claimed to have no understanding of the responses he was typing would, for some, be grounds for disqualification. However, his responses were perceived by the judge as competent, and in order to adhere to our own Spivey test rules, we did not disqualify him.

4. This hope is in direct opposition to Loebner's recent recommendation that 'If we want intelligent robots and computers to care for us, to fetch and to carry for us, as I do, then this belief system [that "Humans are gods"] will facilitate the matter' (http://www.loebner.net/Prizef/In-response.html).
References

Chamberlain, W., and Etter, T., 1984, The Policeman's Beard Is Half-constructed: Computer Prose & Poetry (Warner Books).
Elman, J., 1990, Finding structure in time. Cognitive Science, 14: 179-211.
Hauser, M. D., 1996, The Evolution of Communication (Cambridge, MA: MIT Press).
Loebner, H. G., 1999, The Loebner Prize for Artificial Intelligence. Competition held at Flinders University of South Australia. http://www.cs.flinders.edu.au/research/AI/LoebnerPrize
Rosenblatt, F., 1961, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms (Buffalo, NY: Cornell Aeronautical Laboratory).
Rumelhart, D. E., and McClelland, J. L. (eds), 1986, Parallel Distributed Processing: Explorations in the Microstructure of Cognition (Cambridge, MA: MIT Press).
Schusterman, R. J., Thomas, J. A., and Wood, F. G. (eds), 1986, Dolphin Cognition and Behavior: A Comparative Approach (Hillsdale, NJ: Lawrence Erlbaum Associates).
Turing, A. M., 1950, Computing machinery and intelligence. Mind, LIX (236): 433-460.
Weizenbaum, J., 1974, Automating psychotherapy. Communications of the Association for Computing Machinery, 17 (7): 425.
Winograd, T., 1972, Understanding Natural Language (New York: Academic Press).
Zentall, T. R. (ed.), 1993, Animal Cognition: A Tribute to Donald A. Riley (Hillsdale, NJ: Lawrence Erlbaum Associates).