
Digital Criticism Comes of Age

There is a loose historical continuity in themes and concerns running from the origins of “close” reading in the early 20th century through machine translation and computational linguistics in the third quarter and “distant” reading in the present. Distant reading is the only current form of literary criticism that is presenting us with something new in the way that telescopes once presented astronomers with something new. Moreover, it is the only form of criticism that is directly commensurate with the material substance of language. In the long term it will advance in part by recouping and reconstructing earlier work in symbolic computation of natural language.

A Working Paper
William L. Benzon
December 2015

CONTENTS
Computation in Literary Study: The Lay of the Land
Digital Criticism @ 3QD
The Only Game in Town: Digital Criticism Comes of Age
    Distant Reading and Embracing the Other
    Reading and Interpretation as Explanation
    A Mid-1970s Brush with the Other
    The Center is Gone
    Current Prospects: Into the Autonomous Aesthetic
Commensurability, Meaning, and Digital Criticism

1301 Washington Street, Apartment 311
Hoboken, New Jersey 07030

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.

Computation in Literary Study: The Lay of the Land

It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way …
– Charles Dickens

Oh yeah I'll tell you something
I think you'll understand
When I say that something
I wanna hold your hand
– John Lennon and Paul McCartney

I have a somewhat different perspective on digital humanities than do most DH practitioners and most humanists who know of it – and who doesn’t? – but are not practitioners. “Digital Humanities” covers a range of practices which are quite different from one another and are united only by the fact that they are computer intensive. A digital archive of medieval manuscripts is quite different from a 3D graphic re-creation of an ancient temple and both are different from a topic analysis of 19th-century British novels. It is the last that most interests me, and I think of it as an example of computational criticism. So, in the following diagram we see that computational criticism (CC) is a subset of digital humanities (DH):

But it is also a subset of naturalist criticism (NC). The naturalist critic treats literary phenomena as existing in the natural world along with other phenomena such as asteroids, clouds, squid, yeast, thunder, fireflies, and of course human beings.
I’ve been doing naturalist criticism since the mid-1970s when I studied computational linguistics with David Hays while getting my degree in English literature [1]. Hays was in the first generation of researchers in machine translation and coined the term “computational linguistics” in the early 1960s. At that time, and well into the 1980s, researchers created hand-coded symbolic models of linguistic processes: phonology, morphology, syntax, semantics, and pragmatics. In the 1980s, however, that research style was displaced by one that emphasized machine learning of large data sets using statistical models. Those statistical methods are in turn the foundation of much current work in computational criticism. Thus, while I do not myself work with those models, there is a historical connection between them and what I was doing four decades ago. Even though they are quite different from the models I worked with, they are in the same intellectual milieu, one I’ve worked in for most of my career. It’s the world of social and behavioral sciences as practiced under the aegis of computation. It’s a world of naturalistic study.

Today’s computational critics, however, show no interest in those old symbolic models. For one thing, they aren’t of direct use to them [2]. Moreover, since they propose that, in some way, the human mind is itself computational in kind, they may be too hot to handle – though Willard McCarty has been handling it for years, as I discuss in “The Only Game in Town”. In many humanistic circles it’s bad enough that these researchers are using computers for more than word-processing and email, but to seriously propose that the human mind is computational – no, that’s not worth the risk. The horror! The horror! And yet, in the long run, I don’t think the idea can be held at bay. Humanists will have to confront it and when they do, well, I don’t know what will happen, though I myself have been comfortable with the idea for years.
As I was drafting this piece, I got pinged about a post that Ted Underwood just put up pointing out increasing methodological overlap between literary history and sociology [3]. There you’ll find paragraphs like this one:

    Close reading? Well, yes, relative to what was previously possible at scale. Content analysis was originally restricted to predefined keywords and phrases that captured the “manifest meaning of a textual corpus” (2). Other kinds of meaning, implicit in “complexities of phrasing” or “rhetorical forms,” had to be discarded to make text usable as data. But according to the authors, more recent approaches to text analysis “give us the ability to instead consider a textual corpus in its full hermeneutic complexity,” going beyond the level of interpretation Kenneth Burke called “semantic” to one he considered “poetic” (3-4). This may be interpretation on a larger scale than literary scholars are accustomed to, but from the social-scientific side of the border, it looks like a move in the direction of rhetorical complexity.

The closer you get to rhetorical complexity, the more likely that sooner or later you’re going to bump into phenomena at a scale appropriate to those old symbolic models, not to mention the kind of hand-crafted description of individual texts that I’ve been advocating [4].

But that’s a diversion. I was really headed toward a recent article by Yohei Igarashi, Statistical Analysis at the Birth of Close Reading [5]. Here are some sentences from the opening paragraph (p. 485):

    It may be instructive to remember, then, that close and non-close reading, far from crossing paths for the first time recently, encountered one another in the early twentieth century on the terrain of educational “word lists.” The genre of the word list, a list of words usually taking word frequency as its elemental measure, shaped close reading at its founding. In fact, the founder of close reading, I. A. Richards, believed that one such word list, Basic English – a parsimonious but usable version of the English language reduced to only 850 words – was the best tool for teaching students to be better close readers.

So, at its historical inception, “close reading” shares a thematic fellow traveler with “distant reading”: language statistics. And in an endnote Igarashi notes that “machine translation, from its advent at midcentury to the present day, has adopted Basic English’s logics of functionality, performativity, and universalism” (p. 501). That’s where I was with David Hays in the 1970s at Buffalo. As for the notion of a limited vocabulary from which all else could be derived, that was certainly an important one, in one form or another, in a number of those models. Why not? It’s all language, no?

While Igarashi’s article is historical, in recounting a particular history it is also outlining a boundary that encompasses Ogden and Richards in the first quarter of the 20th century, machine translation and computational linguistics in the third quarter, and “distant” reading in the first quarter of the 21st century. That’s the world of phenomena that we’ll be investigating and reconstructing for the foreseeable future. That’s the land before us.

*****

[1] See, e.g., my Cognitive Networks and Literary Semantics. MLN 91: 952-982, 1976. URL: https://www.academia.edu/235111/Cognitive_Networks_and_Literary_Semantics
William Benzon and David Hays, Computational Linguistics and the Humanist. Computers and the Humanities 10: 265-274, 1976. URL: https://www.academia.edu/1334653/Computational_Linguistics_and_the_Humanist

[2] I have a working paper in which I sketch out some connections between those older models and these newer ones, Toward a Computational Historicism: From Literary Networks to the Autonomous Aesthetic (2014) 27 pp. URL: https://www.academia.edu/7776103/Toward_a_Computational_Historicism_From_Literary_Networks_to_the_Autonomous_Aesthetic

[3] Ted Underwood. Emerging conversations between literary history and sociology. The Stone and the Shell, blog post. December 3, 2015. URL: http://tedunderwood.com/2015/12/02/emerging-conversations-between-literary-history-and-sociology/

[4] For example, see my most recent working paper on description, where I discuss the relationship between symbolic models from the 1970s and 1980s and the formal structure of texts: Description 3: The Primacy of Visualization, Working Paper, October 2015. URL: https://www.academia.edu/16835585/Description_3_The_Primacy_of_Visualization

[5] Yohei Igarashi. Statistical Analysis at the Birth of Close Reading. New Literary History, Volume 46, Number 3, Summer 2015, pp. 485-504. DOI: 10.1353/nlh.2015.0023

Digital Criticism @ 3QD

Think of this as a preface to the next section, “The Only Game in Town: Digital Criticism Comes of Age”, for that is in fact what it is. I put this on my personal blog, New Savanna, as an introduction to a much longer post, around the corner and over there, at 3 Quarks Daily, a group blog.

*****

I open with Moretti – natch – then to Willard McCarty’s 2013 Busa Award Lecture, where he talks of embracing the computer as Other. I end with Said on his belief in an autonomous aesthetic realm, despite the difficulties of conceptualizing how it could possibly work.

The thrust of the article, though, is whether or not we can actually get this venture moving, really moving. What are the chances of really embracing the Other? Though I made my peace with the computer years ago, and so am biased, I don’t know the answer to that question. But I’ve made some progress in figuring out what that question entails and that forms the bulk of my essay.

The issue is one that’s been with academic literary study since the early 20th Century. In the 1920s the matter was stated most succinctly by Archibald MacLeish: poems should not mean but be.
In the late 1950s we find ourselves in the “Polemical Introduction” to Northrop Frye’s well-known Anatomy of Criticism (pp. 27-28):

    The reading of literature should, like prayer in the Gospels, step out of the talking world of criticism into the private and secret presence of literature. Otherwise the reading will not be a genuine literary experience, but a mere reflection of critical conventions, memories, and prejudices. The presence of incommunicable experience in the center of criticism will always keep criticism as art, as long as the critic recognizes that criticism comes out of it but cannot be built on it.

The issue came home to me in a rejection letter for my first essay on “Kubla Khan” – which ended up going into Language and Style in 1985 [1] – where the reviewer complained that the essay “ought to argue with itself, to put into question some of the patterns it establishes – or better, perhaps to let the poem talk back.”

What does he mean, “let the poem talk back”? I know very well that the statement isn’t meant to be taken literally. But what’s the non-literal version of the statement? Under what circumstances could a poem do something like talk back? Under face-to-face performance circumstances. To be sure, the poem doesn’t talk, but the poet does. The poet recites the poem, the teller spins the tale, the audience reacts with silence, groans, laughter, remarks, and the poet replies. There the poet/story-teller and audience share the same physical and discursive space and so CAN interact in real time. But criticism really isn’t like that, no matter how much this or that critic wishes otherwise. The issue remains on the table.

Here’s a more recent version: Hans Adler and Sabine Gross, “Adjusting the Frame: Comments on Cognitivism and Literature,” Poetics Today 23:2 (Summer 2002, pp. 195-220), p. 215:

    Literary texts are designed to open up spaces for interpretation: different readers in different contexts weigh elements and fill gaps in different ways that complement the common ground of comprehension that is determined both by the text and by shared assumptions and contextual knowledge. In a sense, the positioning of noncognitivist and cognitivist literary studies reenacts the discussions about predictability/determination versus subjectivity/individuality in reader-response theory in the 1970s and 1980s: how much freedom do readers have in filling in gaps and creating the actual text or interpretation; to what extent do written text and interpretive communities (substitute here: the cognitive apparatus common to all readers) determine individual readings?

Really? Literary texts are designed just so? By whom? I don’t believe it, nor, I suspect, do Adler and Gross. That formulation is just another trope for picturing text and critic in the same frame.

But I don’t discuss any of those examples in the 3QD essay. There I choose four texts from the mid-1970s, texts by Geoffrey Hartman, Eugenio Donato, Jonathan Culler, and Umberto Eco. I think of the 1970s as a turning point, the period when literary criticism turned its back on the Computational Otherness emanating from linguistics and the nascent cognitive sciences, not to mention humanities computing (here I’m thinking of some essays in which Stanley Fish skewers some statistical stylistics).

What will happen this time around? I’ll leave you with a passage from Moretti’s recent interview in Salon [2]:

    There’s a group of students at the Stanford Literary Lab that’s studying the transformation of meter in poetry and a group studying the transformations of suspense in narrative. Both of them are trying to understand what makes certain forms thrive and others not. That’s always the hardest part. It’s also the hardest to make others interested in. I’m not sure, but I suspect that for a long time it was much easier for an astrologer to make people interested in the skies than for an astronomer. It was much nicer to tell stories of the stars holding the destiny of human beings in their hands. Astronomy, with its calculations, seemed cold by comparison. Plenty of people still care about astrology, of course, including myself. I’m a Leo, and of course I believe in astrology [laughing]. But astronomy, just like biology and all the other sciences, has managed to enter the culture and make a lot of people interested in a good explanation, a bold theory, an interesting conjecture. These have become part of what millions of people are interested in. I would hope that literary history moves to that stage as well. That it maintains what has been great in its past, but also becomes something that becomes interesting for nonspecialists because it has some bold ideas about why, for example, the chorus at a certain point disappears from tragedy. Just as we ask why dinosaurs went extinct, we should ask why the chorus went extinct. Of course, I have a young son so I know that the chorus is less interesting than T. Rex and Company, but still: Why did the chorus go extinct?

[1] Articulate Vision: A Structuralist Reading of “Kubla Khan”. Language and Style. Vol. 8, 1985, pp. 329. URL: https://www.academia.edu/8155602/Articulate_Vision_A_Structuralist_Reading_of_Kubla_Khan_

[2] URL: http://www.salon.com/2014/04/23/learning_from_failed_books/

The Only Game in Town: Digital Criticism Comes of Age

Distant Reading and Embracing the Other

As far as I can tell, digital criticism is the only game that's producing anything really new in literary criticism. We’ve got data mining studies that examine 1000s of texts at once. Charts and diagrams are necessary to present results and so have become central objects of thought. And some investigators have all but begun to ask: What IS computation, anyhow?
When a dyed-in-the-wool humanist asks that question, not out of romantic Luddite opposition, but in genuine interest and open-ended curiosity, THAT's going to lead somewhere.

While humanistic computing goes back to the early 1950s when Roberto Busa convinced IBM to fund his work on Thomas Aquinas – the Index Thomisticus [3] came to the web in 2005 – literary computing has been a backroom operation until quite recently. Franco Moretti, a professor of comparative literature at Stanford and proprietor of its Literary Lab [4], is the most prominent proponent of moving humanistic computing to the front office. A recent New York Times article, Distant Reading [5], informs us

    ...the Lit Lab tackles literary problems by scientific means: hypothesis-testing, computational modeling, quantitative analysis. Similar efforts are currently proliferating under the broad rubric of “digital humanities,” but Moretti’s approach is among the more radical. He advocates what he terms “distant reading”: understanding literature not by studying particular texts, but by aggregating and analyzing massive amounts of data.

Traditional literary study is confined to a small body of esteemed works, the so-called canon. Distant reading is the only way to cover all of literature.

But Moretti has also been investigating drama, play by play, by creating diagrams depicting relations among the characters, such as this diagram of Hamlet:

The diagram gives a very abstracted view of the play and so is “distant” in one sense. But it also requires Moretti to attend quite closely to the play, as he sketches the diagrams himself and so must be “close” to the play.

[3] URL: https://en.wikipedia.org/wiki/Roberto_Busa
[4] URL: http://litlab.stanford.edu/
[5] URL: http://www.nytimes.com/2011/06/26/books/review/the-mechanic-muse-what-is-distant-reading.html
His most recent pamphlet, “Operationalizing”: or, the Function of Measurement in Modern Literary Theory (December 2013, PDF [6]), discusses that work and concludes by observing: “Computation has theoretical consequences – possibly, more than any other field of literary study. The time has come, to make them explicit” (p. 9).

If such an examination is to take place the profession must, as Willard McCarty asserted in his 2013 Busa Award Lecture (link at the end), embrace the Otherness of computing:

    I want to grab on to the fear this Otherness provokes and reach through it to the otherness of the techno-scientific tradition from which computing comes. I want to recognize and identify this fear of Otherness, that is the uncanny, as for example, Sigmund Freud, Stanley Cavell, and Masahiro Mori have identified it, to argue that this Otherness is to be sought out and cultivated, not concealed, avoided, or overcome. That its sharp opposition to our somnolence of mind is true friendship.

In a way it is odd that we, or at least the humanists among us, should regard the computer as Other, for it is entirely a creature of our imagination and craft. We made it. And in our own image.

Can the profession even imagine, much less embark on, such a journey?

[6] URL: http://litlab.stanford.edu/LiteraryLabPamphlet6.pdf

Reading and Interpretation as Explanation

To appreciate what’s at stake we need to consider how academic literary critics think of their craft. This passage by Geoffrey Hartman, one of Yale’s so-called “Gang of Four” deconstructive critics, is typical (The Fate of Reading, 1975, p. 271):

    I wonder, finally, whether the very concept of reading is not in jeopardy. Pedagogically, of course, we still respond to those who call for improved reading skills; but to describe most semiological or structural analyses of poetry as a "reading" extends the term almost beyond recognition.

Note first of all that “reading” doesn’t mean quite what it does in ordinary use.
In standard academic usage a written exegesis is called “a reading.” In that passage Hartman is drawing a line in the sand. What he later calls the “modern ‘rithmatics’ – semiotics, linguistics, and technical structuralism” do not qualify as reading, even in the extended professional sense. Nor, it goes without saying, would Moretti’s “distant reading” qualify as reading. What each of these methods does is to objectify the text and thereby “block” the critic’s “identification” with and entry into the text’s world.

I cannot place too much stress on how foundational this sense of reading is to academic literary criticism. For the critic, to “read” a text is to explain it. The hidden meaning thus found is assumed to be the animating force behind the text, the cause of the text. Objectification gets in the way of the critic’s identification with the text and so displaces this “reading” process.

By the time Hartman wrote that essay, the mid-1970s, the interpretative enterprise had become deeply problematic. We can trace the problem to the so-called New Critics, who rose to prominence after World War II with their practice of “close” reading. These critics insisted on the autonomy of the text. Literary texts contain their meaning within themselves and so must be analyzed without reference to authors or socio-historical context. The text stands alone. The problem is that “reading” presupposes some agent behind the text. How then can interpretation proceed when authors and context have been ruled out of court?

As a practical matter, it turns out that as long as critics share common assumptions, they can afford the pretense that texts stand alone. That’s what the New Critics and much of the profession did. But by the 1960s critics began noticing that, hey, we don’t always agree in our readings and there’s no obvious way to reconcile the differences. Critics began to suspect that each in her own way was reading herself into this supposedly autonomous text.
That’s when the French landed in Baltimore – at the (in)famous structuralism conference at Johns Hopkins in 1966 [7] – and literary criticism exploded with ideas and controversy.

[7] URL: http://hub.jhu.edu/magazine/2012/fall/structuralisms-samso

A Mid-1970s Brush with the Other

Hartman published that essay in 1975, roughly the mid-point of this process. There are other texts from this period that illustrate how literary criticism brushed up against objectification but then deflected it. Eugenio Donato’s 1975 review of Lévi-Strauss’s Mythologiques is particularly important. On the one hand, as Alan Liu has pointed out in a recent essay, “The Meaning of the Digital Humanities” (PMLA 128, 2013, 409-423; see video below), Lévi-Strauss’s structuralist anthropology was “a midpoint on the long modern path toward understanding the world as system” (p. 418) – which, we’ll see later on, has brought the critical enterprise to the point of collapse. On the other, Donato was one of the organizers of the 1966 structuralism conference.

Donato’s review appeared in Diacritics (vol. 5, no. 3, p. 2) and was entitled “Lévi-Strauss and the Protocols of Distance.” Notice that trope of distance, which has served to indicate the relationship of critic to text from the New Critics to Moretti. Donato tells us that he isn’t interested in Lévi-Strauss’s accounts of specific myths, that is, his “technical structuralism” in Hartman’s phrase. What interests Donato is that “despite Lévi-Strauss’ repeated protestations to the contrary, the anthropologist is not completely absent from his enterprise.” What interests Donato is the way Lévi-Strauss’s text itself is like a literary text and so can be analyzed as one.

That is typical of deconstructive criticism and of Lévi-Strauss’s reception among literary critics. Beyond reference to binary oppositions, few were interested in his analysis of myths or other ethnographic materials.
None were interested in abstracting and objectifying literary texts in the way Lévi-Strauss had objectified myths through his use of diagrams, tables, and quasi-mathematical formulas.

A year later Umberto Eco published A Theory of Semiotics (1976) in which he used a computational model devised by Ross Quillian in 1968 and posited it as a basic semiotic model. But Eco didn’t use any of the later and more differentiated models developed in the cognitive sciences, nor did any other literary critics. Quillian’s model took the form of a network in which concepts were linked to other concepts. And that’s what Eco liked, the notion of concepts being linked to other concepts in “a process of unlimited semiosis” (p. 122). What had to be done to make such a model actually work, that didn’t interest him.

During this same period Jonathan Culler published Structuralist Poetics (1975), where he observed that linguistic analysis is not hermeneutic (p. 31). On this he agrees with Hartman. He also employed some ideas from Chomskyan linguistics – itself derived from abstract computational considerations – such as the contrast between deep structure and surface structure and the notion of linguistic competence, which became literary competence. But Culler didn’t continue down this path; few did.

The Center is Gone

By the turn of the millennium, however, it became increasingly clear that things were rotten in Criticland. In one of his last essays, Globalizing Literary Study (PMLA, Vol. 116, No. 1, 2001, pp. 64-68), Edward Said notes: “An increasing number of us, I think, feel that there is something basically unworkable or at least drastically changed about the traditional frameworks in which we study literature” (p. 64). He goes on (pp. 64-65):

    I myself have no doubt, for instance, that an autonomous aesthetic realm exists, yet how it exists in relation to history, politics, social structures, and the like, is really difficult to specify. Questions and doubts about all these other relations have eroded the formerly perdurable national and aesthetic frameworks, limits, and boundaries almost completely. The notion neither of author, nor of work, nor of nation is as dependable as it once was, and for that matter the role of imagination, which used to be a central one, along with that of identity has undergone a Copernican transformation in the common understanding of it.

Thought about that autonomous aesthetic realm presupposes stable, and ultimately traditional, conceptions of the self, the nation, of identity, and the imagination. Given those as a vantage point, the critic can interpret texts and thereby explore the autonomous aesthetic. Without those now shattered concepts that autonomous aesthetic realm is but a phantasm of critical desire.

Post-structuralist criticism had dissolved things into vast networks of objects and processes interacting across many different spatial and temporal scales, from the syllables of a haiku dropping into a neural net through the processes of rendering ancient texts into movies made in Hollywood, Bollywood, or “Chinawood” (Hengdian [8], in Zhejiang Province) and shown around the world. Authors became supplanted by various systems. Lacanians revised the Freudian unconscious. Marxists found capitalism and imperialism everywhere. And signs abounded, bewildering networks of signs pointing to signs pointing to signs in enormous tangles of self-referential meaning.

Identity criticism stepped into the breach opened by deconstruction. Feminists, African-Americans, gays and lesbians, Native Americans, Latinos, subaltern post-colonialists and others spoke from their histories, reading their texts against and into the canon.

If it has become so difficult to gain conceptual purchase on the autonomous aesthetic realm, then perhaps we need new conceptual tools.
Computation provides tools that allow us to examine large bodies of texts in new ways, ways we are only beginning to utilize. Computation gives us ways to think about systems on many scales, about how they operate in a coherent way, or fail to do so. It gives us tools to examine such systems, but also to simulate them.

[8] URL: http://en.wikipedia.org/wiki/Hengdian_World_Studio

Current Prospects: Into the Autonomous Aesthetic

And so we arrive back at Moretti’s call for an exploration of the implications of computing for literary criticism. That the profession has no other viable way forward does not imply that it will continue the investigation. It could after all choose to languish in the past. What reason do we have to believe that it will choose to move forward?

The world has changed since the 1970s. On the one hand, the critical approaches that were new and exciting back then have collapsed, as Said has indicated. On the other hand, back then the computer was still a distant and foreign object to most people. That’s now changed. Powerful computers are ubiquitous. Everyone has at least one, if not several – smartphones, remember, are powerful computers.

Even more importantly, we now have a cohort of literary investigators who grew up with computers and for whom objectified access to texts is familiar and comfortable. Geoffrey Hartman was worried about getting, and staying, “close” to the text. But Moretti talks of “distant” reading and so do others. Can it be long before critics figure out that distance is really objectification and that it’s not evil?

We also have a generation of younger scholars who draw on cognitive science and evolutionary psychology, disciplines that didn’t exist when the French landed in Baltimore in 1966. Computation played a direct role in cognitive science by serving as a model for the implementation of mind in matter. And, via evolutionary game theory, computation is the intellectual driver behind evolutionary psychology as well.
Finally, over the last couple of years various digital humanists have sensed that the field lacks a substantial body of theory. And there is a dawning awareness that this theory doesn’t have to be some variation on post-structuralist formations trailing back into the 1960s. Perhaps we need some new kinds of theory. In a recent article, Who You Calling Untheoretical? (Journal of Digital Humanities Vol. 1, No. 1, Winter 2011 [9]), Jean Bauer notes that the database is the theory:

    When we create these systems we bring our theoretical understandings to bear on our digital projects including (but not limited to) decisions about: controlled vocabulary (or the lack thereof), search algorithms, interface design, color palettes, and data structure.

Finally, in order to use the computer as a model for thought one doesn’t have to adopt the view, mistaken in my opinion, that brains are computers. Whatever its limitations, the computer, as idea, as abstract model, is the most explicit model we have for the mind. Or, to be more precise, it is the most explicit model we have for how a mind might be embodied in matter, such as a brain.

That’s a way forward, a rich theoretical opportunity. But it’s not a matter of latching on to a handful of ideas. It’s a matter of long-term investigation into territory that is as yet unexplored. For the nature of computing itself is still fraught with mystery. It is up to us to redefine it for the theoretical investigation and study of the humanities.

*****

[9] URL: http://journalofdigitalhumanities.org/1-1/who-you-calling-untheoretical-by-jean-bauer/

DH2013 Busa Award Lecture by Willard McCarty: http://www.youtube.com/embed/nTHa1rDR680
Meaning of the Digital Humanities - Alan Liu: http://www.youtube.com/embed/IrvUys_STcs

May 5, 2014

Commensurability, Meaning, and Digital Criticism

What do I mean by “commensurate”? Well…psychoanalytic theory is not commensurate with language. Neither is semiotics. Nor is deconstruction.
But digital criticism is, sorta. Cognitive criticism as currently practiced is not commensurate with language either. Let me explain.

In the early 1950s the United States Department of Defense decided to sponsor research in machine translation; they wanted to use computers to translate technical documents in Russian into English. The initial idea/hope was that this would be a fairly straightforward process. You take a sentence in the source language, Russian, identify the appropriate English words for each word in the source text, and then add proper English syntax and voilà! your Russian sentence is translated into English.

Alas, it’s not so simple. But researchers kept plugging away at it until the mid-1960s when, tired of waiting for practical results, the government pulled the plug on funding. That was the end of that.

Almost. The field renamed itself and became computational linguistics and continued research, making slow but steady progress. By the middle of the 1970s government funding began picking up and the DoD sponsored an ambitious project in speech understanding. The goal of the project was for the computer to understand “over 90% of a set of naturally spoken sentences composed from a 1000-word lexicon” [1]. As I recall – I read technical reports from the project as they were issued – the system was hooked to a database of information about warships. So that 1000-word lexicon was about warships. Those spoken sentences were in the form of questions and the system demonstrated its understanding by producing a reasonable answer to the question.

The knowledge embodied in those systems – four research groups worked on the project for five years – is commensurate with language in the perhaps peculiar sense that I’ve got in mind.
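The word-for-word scheme that early machine translation hoped would suffice can be illustrated with a minimal sketch. Everything in it is a hypothetical illustration rather than a reconstruction of any historical system: the three-entry lexicon, the function name word_for_word, and the two test sentences are assumptions made for the example. It performs only the first step, dictionary lookup, and leaves out the "add proper English syntax" step entirely.

```python
# Hypothetical toy lexicon: three Russian word forms, one English gloss each.
LEXICON = {
    "кошка": "cat",    # nominative case: the one doing the seeing
    "собаку": "dog",   # accusative case: the one being seen
    "видит": "sees",
}


def word_for_word(sentence: str) -> str:
    """Replace each source word with its gloss, keeping the source word order."""
    words = sentence.lower().split()
    # Words missing from the lexicon are passed through in brackets, not guessed.
    return " ".join(LEXICON.get(word, f"[{word}]") for word in words)


print(word_for_word("Кошка видит собаку"))  # cat sees dog
print(word_for_word("Собаку видит кошка"))  # dog sees cat
```

Both Russian sentences mean "the cat sees the dog": Russian marks subject and object with case endings, so the word order is free. Lookup alone drops the articles in the first sentence and reverses the meaning of the second, which is one small instance of why the straightforward approach was not so simple.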
In order for those systems to answer questions about naval ships they had to be able to parse speech sounds into phonemes and morphemes, identify the syntactic relations between those morphemes, map the result into lexical semantics and from there hook into the database. And then the process had to run in reverse to produce an answer. To be sure, a 1000-word vocabulary in a strictly limited domain is a severe restriction. But without that restriction, the systems couldn’t function at all.

These days, of course, we have systems with much more impressive performance. IBM’s Watson is one example; Apple’s Siri is another. But let’s set those aside for the moment, for they’re based on a somewhat different technology than that used in those old systems from the Jurassic era of computational linguistics (aka natural language processing). Those old systems were based on explicit theories about how the ear decoded speech sounds, how syntax worked, and semantics too. Taken together those theories supported a system that could take natural language as input and produce appropriate output without any human intervention between the input and the output.

You can’t do that with psychoanalysis, semiotics, deconstruction, or any other theory or methodology employed by literary critics in the interpretation of texts. It’s in that perhaps peculiar sense that the theories with which we approach our work are not commensurate with the raw material, language, of the objects we study, literary texts. About all we can say about the process through which meaning is conveyed between people by a system of physical signs is that it’s complex, we don’t understand it, and it is not always 100% reliable. The people who designed those old systems have a lot more to say about that process, even if all that knowledge has a somewhat limited range of application. They know something about language that we don’t.
The fact that what they know isn’t adequate to the problems we face in examining literary texts should not obscure the fact that they really do know something about language and mind that we don’t.

Now, what about Siri and Watson? The type of research employed in those old speech understanding systems went on for about a decade and was replaced by a somewhat different methodology. This methodology dispensed with those explicit theories and instead employed sophisticated statistical techniques and powerful computers, operating on large data sets. These statistically based learning approaches first appeared in speech recognition and optical character recognition (OCR). The goal in these systems is simply to recognize the input in computer-readable terms. There’s no attempt at understanding, translating into another language, or answering questions. That’s come later.

The big task these days is to combine the two approaches, hand-coded, theory-based knowledge and statistical learning, into a single system. But that’s not where I’m going with this post. Where I’m going is that the statistical techniques employed in digital criticism are of a piece with (and often the same as) the statistical techniques employed in OCR, speech recognition, and more ambitious systems such as Siri and Watson.

The larger point is simply that digital criticism is, in the sense I’m employing the term, commensurate with language in a way that conventional criticism is not. Digital criticism starts with the raw signifiers and that’s it. By analyzing large highly structured collections of raw signifiers (that is, collections of texts) these methods produce descriptions of those collections that give us (that is, human critics) clues about what’s going on in those texts. As far as I can tell, those clues could not be produced in any other way. It’s not as though digital critics are doing things that would be done better with an army of critics that we don’t have.
Even if we had that army reading all those texts, how would they express their understanding of what they read? How could they aggregate their results? No, digital criticism is not a poor substitute for hordes of human readers; it’s something else, something new and different. And those basic methods work only because they can make use of piles of signifieds that they do not understand or have theories about. We’re the ones with the theories and we use them to make sense of what our digital tools reveal, our digital telescopes if you will. To do that we’ve got to learn to think like sociologists, as Andrew Goldstone has remarked [2]:

Basically, I think we should situate quantitative methods in DH (which are currently going under names like “digital methods,” “distant reading,” and “macroanalysis”) in the context of one of the large-scale transformations of literary study since 1970 or so, its steadily growing and now dominant concern with the relation between the cultural, the social, and the political (let’s call it the cultural turn for short, though I don’t mean to identify the transformation of literary scholarship with the roughly contemporary historiographical shift of the same name). This turn is common knowledge, but it’s kind of fun to count it out, as I tried to do in the talk. One of the major challenges of the cultural turn has been the dubious relation between the handful of aesthetically exceptional texts literary scholars have focused their energies on and the large-scale social-historical transformations which have come to be the most important interpretive contexts for those texts. Do these texts tell us, as clues or symptoms, everything we can learn about the systematic relations between society and literature, or, for that matter, about the systematic development of literature considered just as a body of texts?
Haven’t we had good reason, ever since the canon debates, to doubt the coherence and comprehensiveness of the body of texts professional scholars happen to value? The cultural turn itself, then, might motivate us to search for other methods than those developed for interpreting the select body of texts.

That’s one thing. There’s another: When, if ever, will digital criticism approach the kinds of theory-based systems that were developed in computational linguistics back in the 1970s? That’s the kind of work I published in MLN in 1976 and on which my dissertation was based two years later [3]. I don’t have an answer to that question. But I do think that the latest pamphlet out of Stanford’s Literary Lab, On Paragraphs. Scale, Themes, and Narrative Form, is a small step in that direction [4].

*****

[1] Dennis H. Klatt. Review of the ARPA Speech Understanding Project. The Journal of the Acoustical Society of America 62, 1345 (1977). URL: http://dx.doi.org/10.1121/1.381666

[2] Andrew Goldstone. Social Science and Profanity at DH 2014. Andrew Goldstone’s blog, accessed 23 November 2015. URL: http://andrewgoldstone.com/blog/2014/07/26/dh-soc/

[3] William Benzon. Cognitive Networks and Literary Semantics. MLN 91, 1976, pp. 952-982. Download URL: https://www.academia.edu/235111/Cognitive_Networks_and_Literary_Semantics
See also: William Benzon and David Hays. Computational Linguistics and the Humanist. Computers and the Humanities 10, 1976, pp. 265-274. Download URL: https://www.academia.edu/1334653/Computational_Linguistics_and_the_Humanist
William Benzon. Toward a Computational Historicism: From Literary Networks to the Autonomous Aesthetic. Working Paper, May 2014. Download URL: https://www.academia.edu/7776103/Toward_a_Computational_Historicism_From_Literary_Networks_to_the_Autonomous_Aesthetic

[4] Mark Algee-Hewitt, Ryan Heuser, Franco Moretti. On Paragraphs. Scale, Themes, and Narrative Form. Stanford Literary Lab, Pamphlet 10, October 2015. 22 pp.
URL: http://litlab.stanford.edu/LiteraryLabPamphlet10.pdf