Digital Criticism Comes of Age
A Working Paper
William L. Benzon
December 2015
Abstract: There is a loose historical continuity in themes and concerns running
from the origins of “close” reading in the early 20th century through machine
translation and computational linguistics in the third quarter and “distant” reading in
the present. Distant reading is the only current form of literary criticism that is
presenting us with something new in the way that telescopes once presented
astronomers with something new. Moreover it is the only form of criticism that is
directly commensurate with the material substance of language. In the long term it
will advance in part by recouping and reconstructing earlier work in symbolic
computation of natural language.
CONTENTS
Computation in Literary Study: The Lay of the Land
Digital Criticism @ 3QD
The Only Game in Town: Digital Criticism Comes of Age
    Distant Reading and Embracing the Other
    Reading and Interpretation as Explanation
    A Mid-1970s Brush with the Other
    The Center is Gone
    Current Prospects: Into the Autonomous Aesthetic
Commensurability, Meaning, and Digital Criticism
1301 Washington Street, Apartment 311
Hoboken, New Jersey 07030
[email protected]
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
Computation in Literary Study: The Lay of the Land
It was the best of times, it was the worst of times, it was the age of
wisdom, it was the age of foolishness, it was the epoch of belief, it was
the epoch of incredulity, it was the season of Light, it was the season of
Darkness, it was the spring of hope, it was the winter of despair, we had
everything before us, we had nothing before us, we were all going direct
to Heaven, we were all going direct the other way …
– Charles Dickens
Oh yeah I'll tell you something
I think you'll understand
When I say that something
I wanna hold your hand
– John Lennon and Paul McCartney
I have a somewhat different perspective on digital humanities than do most DH practitioners and
most humanists who know of it – and who doesn’t? – but are not practitioners. “Digital Humanities”
covers a range of practices which are quite different from one another and are united only by the fact
that they are computer intensive. A digital archive of medieval manuscripts is quite different from a
3D graphic re-creation of an ancient temple and both are different from a topic analysis of 19th-century British novels. It is the last that most interests me, and I think of it as an example of
computational criticism.
So, in the following diagram we see that computational criticism (CC) is a subset of digital
humanities (DH):

[Figure: diagram showing computational criticism (CC) inside the overlap of digital humanities (DH) and naturalist criticism (NC).]
But it is also a subset of naturalist criticism (NC). The naturalist critic treats literary phenomena as
existing in the natural world along with other phenomena such as asteroids, clouds, squid, yeast,
thunder, fireflies, and of course human beings.
I’ve been doing naturalist criticism since the mid-1970s when I studied computational
linguistics with David Hays while getting my degree in English literature [1]. Hays was in the first
generation of researchers in machine translation and coined the term “computational linguistics” in
the early 1960s. At that time, and well into the 1980s, researchers created hand-coded symbolic
models of linguistic processes: phonology, morphology, syntax, semantics, and pragmatics. In the
1980s, however, that research style was displaced by one that emphasized machine learning of large
data sets using statistical models. Those statistical methods are in turn the foundation of much
current work in computational criticism.
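To make the contrast concrete, here is a toy sketch in Python – my own illustration, not a fragment of any historical system – of the two research styles in miniature: a hand-coded symbolic rule of the sort we wrote back then, and the kind of frequency count on which statistical learning builds. All the word lists and the sample corpus are invented.

```python
import re
from collections import Counter

# Symbolic style, 1960s-1980s: an analyst writes the rule by hand.
def is_noun_phrase(tokens):
    """Hypothetical hand-coded rule: determiner (+ adjective) + noun."""
    determiners = {"a", "an", "the"}
    adjectives = {"ancient", "sacred", "stately"}
    nouns = {"dome", "river", "temple"}
    if len(tokens) == 2:
        return tokens[0] in determiners and tokens[1] in nouns
    if len(tokens) == 3:
        return (tokens[0] in determiners
                and tokens[1] in adjectives
                and tokens[2] in nouns)
    return False

# Statistical style, 1980s on: regularities are estimated from data
# rather than stipulated; larger corpora give better estimates.
def bigram_counts(corpus):
    """Count adjacent word pairs in a corpus."""
    words = re.findall(r"[a-z']+", corpus.lower())
    return Counter(zip(words, words[1:]))

print(is_noun_phrase(["the", "sacred", "river"]))                       # True
print(bigram_counts("the river ran and the river sang").most_common(2))
```

The first style states what the analyst believes about language; the second discovers whatever regularities the data will support. Current computational criticism descends from the second.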
Thus, while I do not myself work with those models, there is a historical connection
between them and what I was doing four decades ago. Even though they are quite different from the
models I worked with, they are in the same intellectual milieu, one I’ve worked in for most of my
career. It’s the world of social and behavioral sciences as practiced under the aegis of computation.
It’s a world of naturalistic study.
Today’s computational critics, however, show no interest in those old symbolic models. For
one thing, they aren’t of direct use to them [2]. Moreover, since they propose that, in some way, the
human mind is itself computational in kind, they may be too hot to handle – though Willard McCarty
has been handling it for years, as I discuss in “The Only Game in Town”. In many humanistic circles
it’s bad enough that these researchers are using computers for more than word-processing and email,
but to seriously propose that the human mind is computational – no, that’s not worth the risk. The
horror! The horror! And yet, in the long run, I don’t think the idea can be held at bay. Humanists will
have to confront it and when they do, well, I don’t know what will happen, though I myself have
been comfortable with the idea for years.
As I was drafting this piece, I got pinged about a post that Ted Underwood just put up
pointing out the increasing methodological overlap between literary history and sociology [3]. There
you’ll find paragraphs like this one:
Close reading? Well, yes, relative to what was previously possible at scale. Content
analysis was originally restricted to predefined keywords and phrases that captured
the “manifest meaning of a textual corpus” (2). Other kinds of meaning, implicit in
“complexities of phrasing” or “rhetorical forms,” had to be discarded to make text
usable as data. But according to the authors, more recent approaches to text analysis
“give us the ability to instead consider a textual corpus in its full hermeneutic
complexity,” going beyond the level of interpretation Kenneth Burke called
“semantic” to one he considered “poetic” (3-4). This may be interpretation on a
larger scale than literary scholars are accustomed to, but from the social-scientific
side of the border, it looks like a move in the direction of rhetorical complexity.
The closer you get to rhetorical complexity, the more likely that sooner or later you’re going to bump
into phenomena at a scale appropriate to those old symbolic models, not to mention the kind of
hand-crafted description of individual texts that I’ve been advocating [4].
But that’s a diversion. I was really headed toward a recent article by Yohei Igarashi, Statistical
Analysis at the Birth of Close Reading [5]. Here are some sentences from the opening paragraph (p.
485):
It may be instructive to remember, then, that close and non-close reading, far from
crossing paths for the first time recently, encountered one another in the early
twentieth century on the terrain of educational “word lists.” The genre of the word
list, a list of words usually taking word frequency as its elemental measure, shaped
close reading at its founding. In fact, the founder of close reading, I. A. Richards,
believed that one such word list, Basic English—a parsimonious but usable version
of the English language reduced to only 850 words—was the best tool for teaching
students to be better close readers.
So, at its historical inception, “close reading” shares a thematic fellow traveler with “distant reading”,
language statistics. And in an endnote Igarashi notes that “machine translation, from its advent at
midcentury to the present day, has adopted Basic English’s logics of functionality, performativity,
and universalism” (p. 501). That’s where I was with David Hays in the 1970s at Buffalo and as for
the notion of a limited vocabulary from which all else could be derived, that was certainly an
important one in a number of those models in one form or another. Why not? It’s all language, no?
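To give a concrete sense of the genre, here is a minimal Python sketch of a frequency-ranked word list. It illustrates only the elemental measure Igarashi mentions – word frequency – and nothing of Basic English itself; the sample text is my own paraphrase of Coleridge.

```python
import re
from collections import Counter

text = """In Xanadu did Kubla Khan a stately pleasure dome decree
where Alph the sacred river ran through caverns measureless to man
down to a sunless sea"""

# Rank the vocabulary of the text by raw frequency of occurrence.
frequencies = Counter(re.findall(r"[a-z]+", text.lower()))
for word, count in frequencies.most_common(8):
    print(f"{count:2d}  {word}")
```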
While Igarashi’s article is historical, in recounting a particular history it is also outlining a
boundary that encompasses Ogden and Richards in the first quarter of the 20th century, machine
translation and computational linguistics in the third quarter, and “distant” reading in the first quarter
of the 21st century. That’s the world of phenomena that we’ll be investigating and reconstructing for
the foreseeable future. That’s the land before us.
*****
[1] See, e.g. my Cognitive Networks and Literary Semantics. MLN 91: 952-982, 1976. URL:
https://www.academia.edu/235111/Cognitive_Networks_and_Literary_Semantics
William Benzon and David Hays, Computational Linguistics and the Humanist. Computers and the
Humanities 10: 265 - 274, 1976. URL:
https://www.academia.edu/1334653/Computational_Linguistics_and_the_Humanist
[2] I have a working paper in which I sketch out some connections between those older models and
these newer ones, Toward a Computational Historicism: From Literary Networks to the Autonomous Aesthetic
(2014) 27 pp. URL:
https://www.academia.edu/7776103/Toward_a_Computational_Historicism_From_Literary_Networks_to_the_Autonomous_Aesthetic
[3] Ted Underwood. Emerging conversations between literary history and sociology. The Stone and
the Shell, blog post. December 3, 2015. URL: http://tedunderwood.com/2015/12/02/emerging-conversations-between-literary-history-and-sociology/
[4] For example, see my most recent working paper on description where I discuss the relationship
between symbolic models from the 1970s and 1980s and the formal structure of texts, Description 3:
The Primacy of Visualization, Working Paper, October 2015. URL:
https://www.academia.edu/16835585/Description_3_The_Primacy_of_Visualization
[5] Yohei Igarashi. Statistical Analysis at the Birth of Close Reading. New Literary History, Volume 46,
Number 3, Summer 2015, pp. 485-504. DOI: 10.1353/nlh.2015.0023
Digital Criticism @ 3QD
Think of this as a preface to the next section, “The Only Game in Town: Digital Criticism Comes of
Age”, for that is in fact what it is. I put this on my personal blog, New Savanna, as an introduction to
a much longer post, around the corner and over there, at 3 Quarks Daily, a group blog.
*****
I open with Moretti – natch – then to Willard McCarty’s 2013 Busa Award Lecture, where he talks of
embracing the computer as Other. I end with Said on his belief in an autonomous aesthetic realm,
despite the difficulties of conceptualizing how it could possibly work. The thrust of the article,
though, is whether or not we can actually get this venture moving, really moving. What are the
chances of really embracing the Other?
Though I made my peace with the computer years ago, and so am biased, I don’t know the
answer to that question. But I’ve made some progress in figuring out what that question entails and
that forms the bulk of my essay.
The issue is one that’s been with academic literary study since the early 20th century. In the
1920s the matter was stated most succinctly by Archibald MacLeish, that poems should not mean but
be. In the late 1950s we find ourselves in the “Polemical Introduction” to Northrop Frye’s well-known Anatomy of Criticism (pp. 27-28):
The reading of literature should, like prayer in the Gospels, step out of the talking
world of criticism into the private and secret presence of literature. Otherwise the
reading will not be a genuine literary experience, but a mere reflection of critical
conventions, memories, and prejudices. The presence of incommunicable
experience in the center of criticism will always keep criticism an art, as long as the
critic recognizes that criticism comes out of it but cannot be built on it.
The issue came home to me in a rejection letter for my first essay on “Kubla Khan” – which ended
up going into Language and Style in 1985¹ – where the reviewer complained that the essay “ought to
argue with itself, to put into question some of the patterns it establishes – or better, perhaps to let the
poem talk back.”
What does he mean, “let the poem talk back”? I know very well that the statement isn’t
meant to be taken literally. But what’s the non-literal version of the statement? Under what
circumstances could a poem do something like talk back?
Under face-to-face performance circumstances. To be sure, the poem doesn’t talk, but the
poet does. The poet recites the poem, the teller spins the tale, the audience reacts with silence,
groans, laughter, remarks, and the poet replies. There the poet/story-teller and audience share the
same physical and discursive space and so CAN interact in real time. But criticism really isn’t like
that, no matter how much this or that critic wishes otherwise.
The issue remains on the table. Here’s a more recent version: Hans Adler and Sabine Gross,
“Adjusting the Frame: Comments on Cognitivism and Literature,” Poetics Today 23:2 (Summer 2002),
pp. 195-220; p. 215:
Literary texts are designed to open up spaces for interpretation: different readers in
different contexts weigh elements and fill gaps in different ways that complement
the common ground of comprehension that is determined both by the text and by
shared assumptions and contextual knowledge. In a sense, the positioning of
noncognitivist and cognitivist literary studies reenacts the discussions about
predictability/determination versus subjectivity/individuality in reader-response
theory in the 1970s and 1980s: how much freedom do readers have in filling in gaps
and creating the actual text or interpretation; to what extent do written text and
interpretive communities (substitute here: the cognitive apparatus common to all
readers) determine individual readings?
Really? Literary texts are designed just so? By whom? I don’t believe it, nor, I suspect, do Adler and
Gross. That formulation is just another trope for picturing text and critic in the same frame.
But I don’t discuss any of those examples in the 3QD essay. There I choose four texts from the
mid-1970s, texts by Geoffrey Hartman, Eugenio Donato, Jonathan Culler, and Umberto Eco. I think
of the 1970s as a turning point, the period when literary criticism turned its back on the
Computational Otherness emanating from linguistics and the nascent cognitive sciences, not to
mention humanities computing (here I’m thinking of some essays in which Stanley Fish skewers
some statistical stylistics).
What will happen this time around?
I’ll leave you with a passage from Moretti’s recent interview in Salon:²
There’s a group of students at the Stanford Literary Lab that’s studying the transformation
of meter in poetry and a group studying the transformations of suspense in narrative. Both
of them are trying to understand what makes certain forms thrive and others not. That’s
always the hardest part.
It’s also the hardest to make others interested in. I’m not sure, but I suspect that for a long
time it was much easier for an astrologer to make people interested in the skies than for an
astronomer. It was much nicer to tell stories of the stars holding the destiny of human
beings in their hands. Astronomy, with its calculations, seemed cold by comparison. Plenty
of people still care about astrology, of course, including myself. I’m a Leo, and of course I
believe in astrology [laughing]. But astronomy, just like biology and all the other sciences,
has managed to enter the culture and make a lot of people interested in a good explanation,
a bold theory, an interesting conjecture. These have become part of what millions of
people are interested in.
I would hope that literary history moves to that stage as well. That it maintains what has
been great in its past, but also becomes something that becomes interesting for
nonspecialists because it has some bold ideas about why, for example, the chorus at a
certain point disappears from tragedy. Just as we ask why dinosaurs went extinct, we
should ask why the chorus went extinct. Of course, I have a young son so I know that the
chorus is less interesting than T. Rex and Company, but still: Why did the chorus go
extinct?
1. Articulate Vision: A Structuralist Reading of “Kubla Khan”. Language and Style, Vol. 8, 1985, pp. 3-29. URL: https://www.academia.edu/8155602/Articulate_Vision_A_Structuralist_Reading_of_Kubla_Khan_
2. URL: http://www.salon.com/2014/04/23/learning_from_failed_books/
The Only Game in Town: Digital Criticism Comes of Age
Distant Reading and Embracing the Other
As far as I can tell, digital criticism is the only game that's producing anything really new in literary
criticism. We’ve got data-mining studies that examine thousands of texts at once. Charts and diagrams are
necessary to present results and so have become central objects of thought. And some investigators
have all but begun to ask: What IS computation, anyhow? When a dyed-in-the-wool humanist asks that
question, not out of romantic Luddite opposition, but in genuine interest and open-ended curiosity,
THAT's going to lead somewhere.
While humanistic computing goes back to the early 1950s when Roberto Busa convinced
IBM to fund his work on Thomas Aquinas – the Index Thomisticus³ came to the web in 2005 – literary
computing has been a backroom operation until quite recently. Franco Moretti, a professor of
comparative literature at Stanford and proprietor of its Literary Lab,⁴ is the most prominent
proponent of moving humanistic computing to the front office. A recent New York Times article,
Distant Reading,⁵ informs us:
...the Lit Lab tackles literary problems by scientific means: hypothesis-testing,
computational modeling, quantitative analysis. Similar efforts are currently
proliferating under the broad rubric of “digital humanities,” but Moretti’s approach
is among the more radical. He advocates what he terms “distant reading”:
understanding literature not by studying particular texts, but by aggregating and
analyzing massive amounts of data.
Traditional literary study is confined to a small body of esteemed works, the so-called canon. Distant
reading is the only way to cover all of literature.
But Moretti has also been investigating drama, play by play, by creating diagrams depicting
relations among the characters, such as this diagram of Hamlet:
[Figure: Moretti’s network diagram of Hamlet, with characters as nodes and lines linking characters who interact.]

3. URL: https://en.wikipedia.org/wiki/Roberto_Busa
4. URL: http://litlab.stanford.edu/
5. URL: http://www.nytimes.com/2011/06/26/books/review/the-mechanic-muse-what-is-distant-reading.html
The diagram gives a very abstracted view of the play and so is “distant” in one sense. But it also
requires Moretti to attend quite closely to the play, as he sketches the diagrams himself and so must
be “close” to the play.
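For the curious, here is a rough Python sketch of how such a network can be derived. Moretti’s own Hamlet diagrams are built from who speaks with whom; for simplicity this toy version links any two characters who appear in the same scene, and the scene rosters are illustrative fragments, not an encoding of the play.

```python
from itertools import combinations

# Toy scene rosters (illustrative fragments only).
scenes = [
    {"Hamlet", "Horatio", "Ghost"},
    {"Claudius", "Gertrude", "Hamlet", "Polonius"},
    {"Hamlet", "Ophelia", "Polonius"},
    {"Claudius", "Laertes"},
]

# An edge joins two characters who share a scene.
edges = set()
for scene in scenes:
    edges.update(combinations(sorted(scene), 2))

for a, b in sorted(edges):
    print(f"{a} -- {b}")
```

Even this crude construction makes the point: the diagram is an objectification of the play, yet producing it requires close attention to the text.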
His most recent pamphlet, “Operationalizing”: or, the Function of Measurement in Modern Literary
Theory (December 2013, PDF⁶), discusses that work and concludes by observing: “Computation has
theoretical consequences—possibly, more than any other field of literary study. The time has come,
to make them explicit” (p. 9).
If such an examination is to take place the profession must, as Willard McCarty asserted in
his 2013 Busa Award Lecture (link at the end), embrace the Otherness of computing:
I want to grab on to the fear this Otherness provokes and reach through it to the
otherness of the techno-scientific tradition from which computing comes. I want to
recognize and identify this fear of Otherness, that is the uncanny, as for example,
Sigmund Freud, Stanley Cavell, and Masahiro Mori have identified it, to argue that
this Otherness is to be sought out and cultivated, not concealed, avoided, or
overcome. That its sharp opposition to our somnolence of mind is true friendship.
In a way it is odd that we, or at least the humanists among us, should regard the computer as Other,
for it is entirely a creature of our imagination and craft. We made it. And in our own image.
Can the profession even imagine, much less embark on, such a journey?
Reading and Interpretation as Explanation
To appreciate what’s at stake we need to consider how academic literary critics think of their craft.
This passage by Geoffrey Hartman, one of Yale’s so-called “Gang of Four” deconstructive critics, is
typical (The Fate of Reading, 1975, p. 271):
6. URL: http://litlab.stanford.edu/LiteraryLabPamphlet6.pdf
I wonder, finally, whether the very concept of reading is not in jeopardy.
Pedagogically, of course, we still respond to those who call for improved reading
skills; but to describe most semiological or structural analyses of poetry as a
"reading" extends the term almost beyond recognition.
Note first of all that “reading” doesn’t mean quite what it does in ordinary use. In standard academic
usage a written exegesis is called “a reading.”
In that passage Hartman is drawing a line in the sand. What he later calls the “modern
‘rithmatics’—semiotics, linguistics, and technical structuralism” do not qualify as reading, even in the
extended professional sense. Nor, it goes without saying, would Moretti’s “distant reading” qualify as
reading. What each of these methods does is to objectify the text and thereby “block” the critic’s
“identification” with and entry into the text’s world.
I cannot place too much stress on how foundational this sense of reading is to academic
literary criticism. For the critic, to “read” a text is to explain it. The hidden meaning thus found is
assumed to be the animating force behind the text, the cause of the text. Objectification gets in the way
of the critic’s identification with the text and so displaces this “reading” process.
By the time Hartman wrote that essay, the mid-1970s, the interpretative enterprise had
become deeply problematic. We can trace the problem to the so-called New Critics, who rose to
prominence after World War II with their practice of “close” reading. These critics insisted on the
autonomy of the text. Literary texts contain their meaning within themselves and so must be
analyzed without reference to authors or socio-historical context. The text stands alone.
The problem is that “reading” presupposes some agent behind the text. How then can
interpretation proceed when authors and context have been ruled out of court? As a practical matter,
it turns out that as long as critics share common assumptions, they can afford the pretense that texts
stand alone. That’s what the New Critics and much of the profession did.
But by the 1960s critics began noticing that, hey, we don’t always agree in our readings and there’s no
obvious way to reconcile the differences. Critics began to suspect that each in her own way was reading
herself into this supposedly autonomous text. That’s when the French landed in Baltimore – at the
(in)famous structuralism conference at Johns Hopkins in 1966⁷ – and literary criticism exploded with
ideas and controversy.
A Mid-1970s Brush with the Other
Hartman published that essay in 1975, roughly the mid-point of this process. There are other texts
from this period that illustrate how literary criticism brushed up against objectification but then
deflected it.
Eugenio Donato’s 1975 review of Lévi-Strauss’s Mythologiques is particularly important. On
the one hand, as Alan Liu has pointed out in a recent essay, “The Meaning of the Digital
Humanities” (PMLA 128, 2013, pp. 409-423; see video below), Lévi-Strauss’s structuralist anthropology
was “a midpoint on the long modern path toward understanding the world as system” (p. 418) –
which, we’ll see later on, has brought the critical enterprise to the point of collapse. On the other,
Donato was one of the organizers of the 1966 structuralism conference.
Donato’s review appeared in Diacritics (vol. 5, no. 3, p. 2) and was entitled “Lévi-Strauss and
the Protocols of Distance.” Notice that trope of distance, which has served to indicate the
relationship of critic to text from the New Critics to Moretti. Donato tells us that he isn’t interested
in Lévi-Strauss’s accounts of specific myths, that is, his “technical structuralism” in Hartman’s
phrase. What interests Donato is that “despite Lévi-Strauss’ repeated protestations to the contrary,
the anthropologist is not completely absent from his enterprise.”
What interests Donato is the way Lévi-Strauss’s text itself is like a literary text and so can be
analyzed as one. That is typical of deconstructive criticism and of Lévi-Strauss’s reception among
7. URL: http://hub.jhu.edu/magazine/2012/fall/structuralisms-samso
literary critics. Beyond reference to binary oppositions few were interested in his analysis of myths or
other ethnographic materials. None were interested in abstracting and objectifying literary texts in the
way Lévi-Strauss had objectified myths through his use of diagrams, tables, and quasi-mathematical
formulas.
A year later Umberto Eco published A Theory of Semiotics (1976) in which he used a
computational model devised by Ross Quillian in 1968 and posited it as a basic semiotic model. But
Eco didn’t use any of the later and more differentiated models developed in the cognitive sciences,
nor did any other literary critics. Quillian’s model took the form of a network in which concepts were
linked to other concepts. And that’s what Eco liked, the notion of concepts being linked to other
concepts in “a process of unlimited semiosis” (p. 122). What had to be done to make such a model
actually work, that didn’t interest him.
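Here, for illustration, is a minimal Python sketch of a Quillian-style network: concepts linked to other concepts, nothing more. The links are my own toy examples, and “unlimited semiosis” appears as the ease with which one can keep following links outward without a natural stopping point.

```python
# Invented concept-to-concept links in the spirit of Quillian's network.
network = {
    "rose":   ["flower", "red", "love"],
    "flower": ["plant", "petal"],
    "love":   ["emotion", "rose"],
    "red":    ["color", "blood"],
}

def spread(start, steps):
    """Follow links outward from a concept for a fixed number of steps."""
    frontier, seen = {start}, {start}
    for _ in range(steps):
        frontier = {n for c in frontier for n in network.get(c, [])} - seen
        seen |= frontier
    return seen

print(sorted(spread("rose", 1)))  # immediate neighbors
print(sorted(spread("rose", 2)))  # the semiosis keeps going
```

What didn’t interest Eco is everything that would make such a network actually compute something rather than merely picture linkage.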
During this same period Jonathan Culler published Structuralist Poetics (1975), where he
observed that linguistic analysis is not hermeneutic (p. 31). On this he agrees with Hartman. He also
employed some ideas from Chomskyan linguistics – itself derived from abstract computational
considerations, such as the contrast between deep structure and surface structure and the notion of
linguistic competence, which became literary competence. But Culler didn’t continue down this path;
few did.
The Center is Gone
By the turn of the millennium, however, it became increasingly clear that things were rotten in
Criticland. In one of his last essays, Globalizing Literary Study (PMLA, Vol. 116, No. 1, 2001, pp.
64-68), Edward Said notes: “An increasing number of us, I think, feel that there is something
basically unworkable or at least drastically changed about the traditional frameworks in which we
study literature” (p. 64). He goes on (pp. 64-65):
I myself have no doubt, for instance, that an autonomous aesthetic realm exists, yet
how it exists in relation to history, politics, social structures, and the like, is really
difficult to specify. Questions and doubts about all these other relations have eroded
the formerly perdurable national and aesthetic frameworks, limits, and boundaries
almost completely. The notion neither of author, nor of work, nor of nation is as
dependable as it once was, and for that matter the role of imagination, which used
to be a central one, along with that of identity has undergone a Copernican
transformation in the common understanding of it.
Thought about that autonomous aesthetic realm presupposes stable, and ultimately traditional,
conceptions of the self, the nation, of identity, and the imagination. Given those as a vantage point,
the critic can interpret texts and thereby explore the autonomous aesthetic. Without those now
shattered concepts that autonomous aesthetic realm is but a phantasm of critical desire.
Post-structuralist criticism had dissolved things into vast networks of objects and processes
interacting across many different spatial and temporal scales, from the syllables of a haiku dropping
into a neural net through the processes of rendering ancient texts into movies made in Hollywood,
Bollywood, or “Chinawood” (Hengdian,⁸ in Zhejiang Province) and shown around the world.
Authors were supplanted by various systems. Lacanians revised the Freudian
unconscious. Marxists found capitalism and imperialism everywhere. And signs abounded,
bewildering networks of signs pointing to signs pointing to signs in enormous tangles of self-referential meaning. Identity criticism stepped into the breach opened by deconstruction. Feminists,
African-Americans, gays and lesbians, Native Americans, Latinos, subaltern post-colonialists and
others spoke from their histories, reading their texts against and into the canon.
If it has become so difficult to gain conceptual purchase on the autonomous aesthetic realm,
then perhaps we need new conceptual tools. Computation provides tools that allow us to examine
large bodies of texts in new ways, ways we are only beginning to utilize. Computation gives us ways
to think about systems on many scales, about how they operate in a coherent way, or fail to do so. It
gives us tools to examine such systems, but also to simulate them.
8. URL: http://en.wikipedia.org/wiki/Hengdian_World_Studio
Current Prospects: Into the Autonomous Aesthetic
And so we arrive back at Moretti’s call for an exploration of the implications of computing for
literary criticism. That the profession has no other viable way forward does not imply that it will
continue the investigation. It could after all choose to languish in the past.
What reason do we have to believe that it will choose to move forward?
The world has changed since the 1970s. On the one hand, the critical approaches that were
new and exciting back then have collapsed, as Said has indicated. On the other hand, back then the
computer was still a distant and foreign object to most people. That’s now changed. Powerful
computers are ubiquitous. Everyone has at least one, if not several – smartphones, remember, are
powerful computers.
Even more importantly, we now have a cohort of literary investigators who grew up with
computers and for whom objectified access to texts is familiar and comfortable. Geoffrey Hartman
was worried about getting, and staying, “close” to the text. But Moretti talks of “distant” reading and
so do others. Can it be long before critics figure out that distance is really objectification and that it’s
not evil?
We also have a generation of younger scholars who draw on cognitive science and
evolutionary psychology, disciplines that didn’t exist when the French landed in Baltimore in 1966.
Computation played a direct role in cognitive science by serving as a model for the implementation
of mind in matter. And, via evolutionary game theory, computation is the intellectual driver behind
evolutionary psychology as well.
Finally, over the last couple of years various digital humanists have sensed that the field lacks
a substantial body of theory. And there is a dawning awareness that this theory doesn’t have to be
some variation on post-structuralist formations trailing back into the 1960s. Perhaps we need some
new kinds of theory.
In a recent article, Who You Calling Untheoretical? (Journal of Digital Humanities, Vol. 1, No. 1,
Winter 2011⁹), Jean Bauer notes that the database is the theory:
When we create these systems we bring our theoretical understandings to bear on
our digital projects including (but not limited to) decisions about: controlled
vocabulary (or the lack thereof), search algorithms, interface design, color palettes,
and data structure.
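A toy example may make Bauer’s point vivid. The schema below is invented for illustration – it is not from her article – but even at this scale the design decisions are theoretical commitments: one author per text, a closed list of genres.

```python
from dataclasses import dataclass

# A controlled vocabulary is already a theory of what kinds there are.
GENRES = {"lyric", "epistle", "satire"}

@dataclass
class Document:
    title: str
    author: str   # exactly one author per text: a contestable assumption
    year: int
    genre: str

    def __post_init__(self):
        # Enforcing the vocabulary is an interpretive act, not a neutral one.
        if self.genre not in GENRES:
            raise ValueError(f"unknown genre: {self.genre}")

print(Document("Kubla Khan", "Coleridge", 1816, "lyric"))
```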
Finally, in order to use the computer as a model for thought one doesn’t have to adopt the view,
mistaken in my opinion, that brains are computers. Whatever its limitations, the computer, as idea, as
abstract model, is the most explicit model we have for the mind. Or, to be more precise, it is the most
explicit model we have for how a mind might be embodied in matter, such as a brain.
That’s a way forward, a rich theoretical opportunity. But it’s not a matter of latching on to a
handful of ideas. It’s a matter of long-term investigation into territory that is as yet unexplored.
For the nature of computing itself is still fraught with mystery. It is up to us to redefine it for
the theoretical investigation and study of the humanities.
*****
9. URL: http://journalofdigitalhumanities.org/1-1/who-you-calling-untheoretical-by-jean-bauer/
DH2013 Busa Award Lecture by Willard McCarty:
http://www.youtube.com/embed/nTHa1rDR680
Meaning of the Digital Humanities - Alan Liu:
http://www.youtube.com/embed/IrvUys_STcs
May 5, 2014
Commensurability, Meaning, and Digital Criticism
What do I mean by “commensurate”? Well…psychoanalytic theory is not commensurate with
language. Neither is semiotics. Nor is deconstruction. But digital criticism is, sorta. Cognitive criticism
as currently practiced is not commensurate with language either.
Let me explain.
In the early 1950s the United States Department of Defense decided to sponsor research in
machine translation; they wanted to use computers to translate technical documents in Russian into
English. The initial idea/hope was that this would be a fairly straightforward process. You take a
sentence in the source language, Russian, identify the appropriate English words for each word in the
source text, and then add proper English syntax and voilà! your Russian sentence is translated into
English.
Alas, it’s not so simple. But researchers kept plugging away at it until the mid-1960s when,
tired of waiting for practical results, the government pulled the plug on funding. That was the end of
that.
Almost. The field renamed itself and became computational linguistics and continued
research, making slow but steady progress. By the middle of the 1970s government funding began
picking up and the DoD sponsored an ambitious project in speech understanding. The goal of the
project was for the computer to understand “over 90% of a set of naturally spoken sentences
composed from a 1000‐word lexicon” [1]. As I recall – I read technical reports from the project as
they were issued – the system was hooked to a database of information about warships. So that 1000-word lexicon was about warships. Those spoken sentences were in the form of questions and the
system demonstrated its understanding by producing a reasonable answer to the question.
The knowledge embodied in those systems – four research groups worked on the project for
five years – is commensurate with language in the perhaps peculiar sense that I’ve got in mind. In
order for those systems to answer questions about naval ships they had to be able to parse speech
sounds into phonemes and morphemes, identify the syntactic relations between those morphemes,
map the result into lexical semantics and from there hook into the database. And then the process
had to run in reverse to produce an answer. To be sure, a 1000-word vocabulary in a strictly limited
domain is a severe restriction. But without that restriction, the systems couldn’t function at all.
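To suggest the shape of such a system – and it is only a suggestion; every table and rule below is invented, and the actual ARPA systems were vastly more elaborate – here is a schematic Python sketch of the pipeline just described: language in one end, a database lookup in the middle, an answer out the other, with no human intervention in between.

```python
# Toy "database" of ship facts (everything here is invented).
SHIPS = {"kitty hawk": {"class": "carrier", "speed in knots": 33}}

def parse(question):
    """Stand-in for the phonological, morphological, and syntactic stages:
    find a known ship name and decide which attribute is being asked about."""
    q = question.lower()
    for ship in SHIPS:
        if ship in q:
            attribute = "speed in knots" if "fast" in q else "class"
            return ship, attribute
    return None

def answer(question):
    """Map the parse into the database, then run the path in reverse to
    generate an English reply."""
    parsed = parse(question)
    if parsed is None:
        return "I don't know."
    ship, attribute = parsed
    return f"The {ship.title()} has {attribute}: {SHIPS[ship][attribute]}."

print(answer("How fast is the Kitty Hawk?"))
```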
These days, of course, we have systems with much more impressive performance. IBM’s
Watson is one example; Apple’s Siri is another. But let’s set those aside for the moment, for they’re
based on a somewhat different technology than that used in those old systems from the Jurassic era
of computational linguistics (aka natural language processing).
Those old systems were based on explicit theories about how the ear decoded speech
sounds, how syntax worked, and semantics too. Taken together those theories supported a system
that could take natural language as input and produce appropriate output without any human
intervention between the input and the output. You can’t do that with psychoanalysis, semiotics,
deconstruction, or any other theory or methodology employed by literary critics in the interpretation
of texts. It’s in that perhaps peculiar sense that the theories with which we approach our work are
not commensurate with the raw material, language, of the objects we study, literary texts.
About all we can say about the process through which meaning is conveyed between people
by a system of physical signs is that it’s complex, we don’t understand it, and it is not always 100%
reliable. The people who designed those old systems have a lot more to say about that process, even
if all that knowledge has a somewhat limited range of application. They know something about
language that we don’t. The fact that what they know isn’t adequate to the problems we face in
examining literary texts should not obscure the fact that they really do know something about
language and mind that we don’t.
Now, what about Siri and Watson? The type of research employed in those old speech
understanding systems went on for about a decade and was replaced by a somewhat different
methodology. This methodology dispensed with those explicit theories and instead employed
sophisticated statistical techniques and powerful computers, operating on large data sets. These
statistically based learning approaches first appeared in speech recognition and optical character
recognition (OCR). The goal in these systems is simply to recognize the input in computer-readable
terms. There’s no attempt at understanding, translating into another language, or answering
questions. That’s come later.
The big task these days is to combine the two approaches – hand-coded theory-based
knowledge and statistical learning – into a single system. But that’s not where I’m going with this
post. Where I’m going is that the statistical techniques employed in digital criticism are of a piece
(and often the same as) the statistical techniques employed in OCR, speech recognition, and more
ambitious systems such as Siri and Watson. The larger point is simply that digital criticism is, in the
sense I’m employing the term, commensurate with language in a way that conventional criticism is
not.
Digital criticism starts with the raw signifiers and that’s it. By analyzing large highly
structured collections of raw signifiers (that is, collections of texts) these methods produce
descriptions of those collections that give us (that is, human critics) clues about what’s going on in
those texts. As far as I can tell, those clues could not be produced in any other way. It’s not as
though digital critics are doing things that would be done better with an army of critics that we don’t
have. Even if we had that army reading all those texts, how would they express their understanding
of what they read? How could they aggregate their results? No, digital criticism is not a poor
substitute for hordes of human readers; it’s something else, something new and different.
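Here, for flavor, is a deliberately tiny Python sketch of that starting point: each text reduced to counts of its raw signifiers, and texts compared by overlap, with no access to meaning anywhere in the process. Real distant reading works on corpora of thousands of texts with far richer models; the three snippets of text are mine.

```python
import math
import re
from collections import Counter

texts = {
    "A": "the sacred river ran through caverns to a sunless sea",
    "B": "the river ran past the caverns to the sea",
    "C": "i wanna hold your hand",
}

def vector(text):
    """Reduce a text to counts of its word forms - raw signifiers only."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(u, v):
    """Overlap between two count vectors, blind to meaning."""
    dot = sum(u[w] * v[w] for w in u)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v)

va, vb, vc = vector(texts["A"]), vector(texts["B"]), vector(texts["C"])
print(f"A~B: {cosine(va, vb):.2f}   A~C: {cosine(va, vc):.2f}")
```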
And those basic methods work only because they can make use of piles of signifiers that
they do not understand or have theories about. We’re the ones with the theories and we use them to
make sense of what our digital tools reveal, our digital telescopes if you will. To do that we’ve got to
learn to think like sociologists, as Andrew Goldstone has remarked [2]:
Basically, I think we should situate quantitative methods in DH (which are currently
going under names like “digital methods,” “distant reading,” and “macroanalysis”)
in the context of one of the large-scale transformations of literary study since 1970
or so, its steadily growing and now dominant concern with the relation between the
cultural, the social, and the political (let’s call it the cultural turn for short, though I
don’t mean to identify the transformation of literary scholarship with the roughly
contemporary historiographical shift of the same name). This turn is common
knowledge, but it’s kind of fun to count it out, as I tried to do in the talk.
One of the major challenges of the cultural turn has been the dubious relation
between the handful of aesthetically exceptional texts literary scholars have focused
their energies on and the large-scale social-historical transformations which have
come to be the most important interpretive contexts for those texts. Do these texts
tell us, as clues or symptoms, everything we can learn about the systematic relations
between society and literature, or, for that matter, about the systematic development
of literature considered just as a body of texts? Haven’t we had good reason, ever
since the canon debates, to doubt the coherence and comprehensiveness of the
body of texts professional scholars happen to value? The cultural turn itself, then,
might motivate us to search for other methods than those developed for
interpreting the select body of texts.
That’s one thing.
There’s another: When, if ever, will digital criticism approach the kinds of theory-based
systems that were developed in computational linguistics back in the 1970s? That’s the kind of work I
published in MLN in 1976 and on which my dissertation was based two years later [3]. I don’t have
an answer to that question. But I do think that the latest pamphlet out of Stanford’s Literary Lab, On
Paragraphs. Scale, Themes, and Narrative Form, is a small step in that direction [4].
*****
[1] Dennis H. Klatt. Review of the ARPA Speech Understanding Project. The Journal of the Acoustical
Society of America. 62, 1345 (1977); http://dx.doi.org/10.1121/1.381666
[2] Andrew Goldstone, Social Science and Profanity at DH 2014, Andrew Goldstone’s blog, accessed
23 November 2015. URL: http://andrewgoldstone.com/blog/2014/07/26/dh-soc/
[3] Cognitive Networks and Literary Semantics. MLN 91, 1976, pp. 952-982. Download URL:
https://www.academia.edu/235111/Cognitive_Networks_and_Literary_Semantics
See also, William Benzon and David Hays, Computational Linguistics and the Humanist. Computers
and the Humanities 10, 1976, pp. 265-274. Download URL:
https://www.academia.edu/1334653/Computational_Linguistics_and_the_Humanist
William Benzon. Toward a Computational Historicism: From Literary Networks to the Autonomous Aesthetic.
Working Paper, May 2014. Download URL:
https://www.academia.edu/7776103/Toward_a_Computational_Historicism_From_Literary_Networks_to_the_Autonomous_Aesthetic
[4] Mark Algee-Hewitt, Ryan Heuser, Franco Moretti. On Paragraphs. Scale, Themes, and Narrative Form.
Stanford Literary Lab, Pamphlet 10, October 2015. 22 pp. URL:
http://litlab.stanford.edu/LiteraryLabPamphlet10.pdf