reality bytes

[Today, I gave an opening plenary talk at the 53rd Annual RBMS preconference in San Diego. RBMS is a conference for people professionally interested in rare books and manuscripts. Here’s the text. But first—I want to make clear that the views it expresses are mine alone. They may not reflect those of my co-workers at the University of Virginia, and my employers had no prior knowledge that I’d be giving such a talk. I didn’t have much warning, myself. I re-wrote it late into the night on Monday, before joining (for a couple of hours, anyway) the crowd in the dark outside our beautiful Rotunda—a night documented here.]

At the University of Virginia Library, we begin our regular directors’ meetings with a round of “hot topics”—a chance to make pressing announcements or insert late-breaking news into the agenda for the day. Now, readers of such obscure periodicals as the New York Times, the Washington Post, and the Chronicle of Higher Education may have noticed that UVa is having… kind of a rough week. So when my colleagues and I gathered most recently, I had a fairly good guess at what our meeting’s “hot topic” might be. Instead, the first hand to be raised was that of our Director of Facilities Management, who made an earnest and concerned report: at least—two rats!—had been sighted!—in the grass outside, not terribly far from our wonderful Special Collections Library.

The question, my friends, was obvious. Were these rats coming—or going?

When I sat down to draft this morning’s presentation, I found it very difficult to disentangle what I had intended to say to you from what I felt newly compelled to say. I had my title. As a more physical-collections-focused companion piece to Matt Kirschenbaum’s “Bit by Bit,” how could this talk not be called “Reality Bytes”? But I meant, at first, for it to have a narrower scope: to be purely about the shape and trajectory of the most bookish side of what has come to be called the digital humanities. I’d discuss how rapidly-advancing analytical and presentational technology might impact our thinking about bibliographical research, paleography, and special collections librarianship. Just as Matt would cover the born-digital archive, I had planned to talk about new opportunities to be found in the changing relationship of scholars and students and humanities software developers to their historical, paper-based archives and research collections.

I was going to razzle-dazzle you with demos and slides. I threw them out.

Last week, the president of my public university was unexpectedly and unceremoniously sacked, by a board of politically-appointed businessmen and women, none of whom have significant experience in higher education, still less in the public humanities or in cultural heritage. Despite respectful demands for greater transparency and concerted protests by faculty governing bodies, staff councils, and student leadership at UVa, explanations for the ouster remained vague, citing only the fast pace of technological change and the explosion of freely-accessible digital content and low-cost online learning. These were said to result—for a great university founded and designed by Thomas Jefferson—in an “existential threat.” In other words, the explanation for this move framed democratizing technologies of access as a threat to the very existence of an institution of cultural memory, research, and higher learning.

I’ll get at the access part in a minute. But first, you know what? Existential threats don’t scare us. We’re librarians.

They’re what we mitigate and ward against every day of the week, from the micro- to the macro-scale. We pay protective attention to individual books and manuscripts—whose continued existence is ensured through careful conservation and restorative work in preservation laboratories. We pay protective attention to our charges at the collections level, where we cultivate and nurture whole sets of like and disparate objects through principles of coherence. For some collections, we strive toward well-researched, acquisitive completeness. In other areas of the library, collections are gardens that grow hearty as we sow and weed—and the challenge is less to the physical well-being of the material we steward, than to our capacity to provide access to hybrid print-and-digital collections of modern scholarship: that is, secondary articles and monographs. This capacity often comes down to our ability and our willingness to afford the monopolistic prices set on these resources—and to speak up for ourselves and understand the agency (and sometimes the complicity) of libraries in this moment of great transition. It is a moment Jerome McGann has called the shutting down of our “operating system” of print-based scholarly communication.

Those of us who work with rare books and manuscripts are more buffered from utter change than many of our colleagues on the teaching faculty and elsewhere in the library. It’s tempting and certainly comforting in times of upheaval to see ourselves as the small, still center of the academy, where a kind of monasticism endures. But we are no less responsible (and I would argue, given our particular expertise, considerably more so) to understand and engage with the digital scholarly revolution, and to help contextualize and channel its energies. McGann writes, in the most recent issue of Profession, that “book culture will not go extinct: human memory is too closely bound to it. But no one any longer thinks that scholarship, our ongoing research and professional communication, can be organized and sustained through print resources.” What’s the place, then, of book culture in a world so differently-organized? If you don’t have an answer for that, who will?

Merrilee, our moderator, in her wisdom, suggested that I not skimp on time devoted to a fundamental question: “What are the digital humanities, and why are libraries investing in them?” Now, there are as many definitions of DH as there are digital humanities practitioners. But the very fact that I could not say, “…as there are digital humanities scholars,” is a key avenue into the question, “What is DH?”

The digital humanities are not generally seen as a discipline, but as an inter-discipline. They are less commonly codified into an academic department; instead, most maintain an interdepartmental place in the university setting. DH constitutes a broad and inclusive community of practice, made up of student and faculty researchers from every traditional humanities discipline and from several sub-disciplines that have only begun to come into their own under the banner of the digital. But more: it is a social as well as scholarly community—a community that includes: librarians and archivists; museum workers, curators, and collectors; performers, publishers, and public servants; computer programmers and systems administrators; software developers and designers; and interested faculty and staff from fields outside the humanities, like computer and information science, or even the bio-sciences, keen to test their research methods and techniques against the cultural record. It’s also a refreshingly welcoming and egalitarian and deeply, sweetly sincere community. To participate, you have only to show up, and to share what you’re tinkering with.

You may sometimes hear a “this is brand-spanking-new” rhetoric surrounding the digital humanities. That comes most often not from people engaged with DH in practice, but rather from those wanting to inject some well-meaning boosterism into the conversation, or dampen it with warnings against newfangledness and Philistinism. If you’re involved in DH, you can safely ignore any “get off my lawn!” type of pronouncement, in favor of just buckling down and producing good work. (No adventure was ever begun by obeying that directive.) If, however, you hear the first kind of pronouncement—that digital humanities revealed its wondrousness when it sprang, sui generis, as “the next big thing,” from the head of the 2009 MLA convention—well, you might pause to educate.

It’s true that we’ve been talking about “digital humanities” for less than a decade. But when I was acquiring my graduate training in the field, in the late 1990s, it was already well-established and highly international in character—and it was decidedly termed “humanities computing.” The name shift came about, in part, with the publication of a major 2004 anthology, the Blackwell’s Companion to Digital Humanities. That book’s title was chosen to indicate the breadth of the field it covered, and to position humanities computing as being about more than “mere digitization.” And it was not long, in this country, before the NEH formalized its strong support for the digital humanities by establishing an office bearing the name. (And when there’s funding for a thing, that’s what we’re going to call the thing.)

But we can go back farther. The oldest professional societies in DH, the international Association for Computers and the Humanities (which I represent as president), and Europe’s Association for Literary and Linguistic Computing (of which I am a member), are 34 and 38 years old, respectively. I am also 38 years old. So even the professionalization of DH is no spring chicken. But the practice itself, of applying computational techniques to the study of texts, images, human artifacts and cultural objects, dates back much further. The project most widely-accepted as seminal to humanities computing came about in the era of punch-cards—in a conversation between an academic (the Jesuit Fr. Roberto Busa) and a tech pioneer (Thomas Watson, the founder of IBM). Busa’s computer-assisted lemmatization of the work of St. Thomas Aquinas was begun in 1946. This makes the digital humanities exactly as old as my father. He recently retired and can be found this week… at Disneyland.

It was in the early to mid-1990s that libraries in the United States began to foster, create, and house digital humanities centers. Almost every now-great DH lab or center was developed in some formal relationship to a library, museum, or archive. And it’s no surprise—the first and still predominant strain of North American digital humanities scholarship stems from work in bibliography and textual criticism, with the creation of great editorial projects and the digitization and assembly of large-scale thematic research collections. I speak of the Rossetti Archive, the Whitman Archive, the Blake Archive, and their many descendants. As DH has exploded into areas like new media studies, geographical information science, statistical analysis, augmented reality, and information visualization and 3d-modeling, libraries are right there, too—often providing the digitized historical or born-digital content on which scholars operate, sometimes setting research agendas of their own, and always sharing space, technology resources, and staff expertise. Libraries provide the crucial social and technological infrastructure for digital humanities research, and—as the long-established commons and shared laboratory for the humanities—are primary sites in which DH community is enacted and its discoveries are made.

Sure. Whatever. Sounds great. What happens there with special collections content? At the most basic level, we see libraries creating the equivalent of straightforward snapshots of various kinds—the simplest digital surrogates for their print and manuscript materials, accompanied by a minimal amount of metadata. Midway in technical complexity, and generally directed by or undertaken in collaboration with scholars, are those new-model, interactive editions and research collections, where the intellectual content of the physical archive (or perhaps of multiple archives) has been explicitly shaped by the interpretive knowledge-representation methods and structures through which it was encoded. Here, you might think of TEI-marked texts with annotations and scholarly apparatus, packaged with purpose-built stylesheets for rendering to a variety of display devices. Of course no transformations—even the simple snapshots I mentioned—are naïve transformations: we know that each bears traces of its makers’ hands—but digital scholarly editions foreground editorial intervention even when they are most diplomatic.

The third category of digitized content created by libraries is comprised of the most heavily re-mediated material—books or papers or images or objects that have been completely re-formatted—made into a new kind of data, modeled and delivered in a markedly divergent way from the information design of the objects in which those data were first embodied. Consider an antiquarian map that has been scanned, cropped, geo-referenced (or rubber-sheeted to fit a modern street grid), and is now offered as a formal web service providing tiled raster (or even vectorized) imagery for analysis against numerical data sources in a GIS, a geographical information system. This is no longer really an historical map. Still less (a design problem yet unsolved) is it evidently a leaf from an atlas. But neither is it a diminishment of those forms. It’s just something different: rich and strange.

We can attend to it differently.

The protective attention that we, who care about the physical forms of the book, pay to individual objects and to carefully curated groups of objects must extend to our digitization practices. Just as we maintain climate-controlled stacks for our precious physical collections, we create dark archives for their digital counterparts—rarely-opened repositories geared toward long-term safe-keeping.

Dark archives swallow files up, and through acts of faith and forensics we anticipate their Second Coming. But I want to argue that it’s our greater responsibility to fill our dark archives with light—right now. We must develop digitization standards and best practices with an eye not just toward archival integrity and long-term preservation, but toward the provision of persistent, ready access for our users—the continual migration of digital and digitized works to new discovery and delivery platforms and interfaces. We’re smartest when we ready these objects, not only for long-term use, but always for near-term use—for use today, tomorrow, and for the sequence of tomorrows that stretch out from there. This is because—although digital and digitized content may be equally rare and frequently as unique as anything in our cool, closed stacks—most of it is unlike traditional special collections material in one important regard. Are you ready? If you remember nothing else I say here (and please forget the rats) remember this.

In contrast to physical documents and artifacts, where the best-preserved specimens are the ones that time and good housekeeping forgot, the more a digital object is handled and manipulated and shared and even kicked around, the longer it will endure. The harder they work, the longer they last. Poor Richard’s Almanack lends us a metaphor to write in indelible ink. When it comes to persistent access to digital collections, “Sloth, like rust, consumes faster than labor wears, while the used key is always bright.”

Therefore, we—and I speak here about the community of bibliographically-minded scholars and librarians and archivists and digital humanists working in concert—we seek to remediate fragile or otherwise inaccessible book and manuscript objects into digital forms that are meant to be used—and used (maybe, a little reductively) in two ways, which I’ll discuss. But first: we do this because public access is tactical preservation.

One mode of digitization often happens en masse, and is meant to scale up to very large—and, importantly, combined or federated—collections. Its goal is to enable the kind of text-mining, serendipitous browsing and targeted search, linking of open data, visualization, topic modeling, and other forms of analysis that scholars wish to perform across vast numbers of texts or images at once. We call this “distant reading.”

You can think of distant reading as the near-opposite of close reading, or attention to one text at a time—although it’s important to note that it often serves as a lens through which we can identify objects that merit close and individual attention. Still, in mass digitization (whether or not the work is conducted by an advertising corporation like Google), the intellectual content of a collection more or less completely trumps its materiality and any attempt to capture that materiality with precision or (dare I say) love.

The creation of such “big data” is often, and often rightly, lamented as a lossy process. Lossy-ness (besides being the re-nouning of an adjective that probably should have stayed a noun in the first place) is a descriptor of any digital conversion that results, in the transfer from one format to another, in the often-imperceptible falling-away of little bits of information. We engage in a game of loss when we save a family photo at a slightly lower resolution, or higher degree of compression, in order to make it easier to email to Aunt Matilda. Obviously, the primary act of digitizing special collections content is lossy in a far grosser way than the conversion of an archival TIFF image to a delivery-sized JPEG. We have no lesser authorities than Walter Benjamin and Tom Tanselle to inform us that a digital surrogate is never a full substitute, and indeed no-one claims it can be. But it’s important to acknowledge the startling scholarly gains we might make through re-mediation—even when it’s blunt.

For instance, library staff and graduate students participating in the Praxis Program at our Scholars’ Lab back home have designed a tool called Prism. It’s an instrument for what we’ve termed “crowdsourcing interpretation”—the notion that, if a printed text can first be digitized and then engineered to capture the traces of many readers’ engagement with it, we can begin not only to visualize and better understand the interplay of readers with that text—an interesting problem in itself—but we can also apply computational linguistic techniques like sentiment analysis to the crowd’s assessments and interpretations of the text at the phrase-level. This might allow us to lift out similarly-structured, or similar-feeling passages from mass-digitized corpora—HathiTrust, or Google Books—perhaps leading us to discover a neglected work of literature or an un-examined historical text. So, you see how we can move from the micro- to the macro-scale, and back again.

The other mode of digital re-mediation—the one closest to my heart—stays micro, and may feel especially familiar to those of us who concentrate on rare materials. Here, we seek ever more sustainable ways to digitize unique objects—less as a set of uniform “big data” points to be mined, and more with deep respect to the particularity of those objects: to their individual contours, descriptors, variants, and traces of use. I noticed that just this weekend, at a digital humanities un-conference at George Mason University, a group of participants led by Sarah Werner of the Folger and Suzanne Fischer of the Henry Ford christened this closely interpretive—and in many ways entirely traditional—approach #smalldata. There was another session proposal there, too, by Trevor Owens of the Library of Congress, on digital “thinginess.” (Gods bless the geeks of THATCamp!) These are sentiments entirely aligned with a project we are launching in just two short weeks at the Scholars’ Lab: a framework for geo-temporal interpretation of archival collections. It’s a tool for placing documents in time and space, with an emphasis not on algorithmically-generated maps and data-driven info-viz, but rather on the hand-craftedness of scholarly or curatorial interpretation. It’s called Neatline. Guys, you can draw on maps! (And do other things, but that part’s ~~transgressive~~ fun.)

Now, we can certainly practice a style of bibliographic description and analysis, or close, material reading, or thing-y small-data interpretation, on objects that have been digitized by someone else, or which are born digital. (And Matt will be discussing that shortly.) But we might pause to consider the goal of a small-data approach to active digitization and digital scholarly editing—in practice as well as in products. What would this work feel like? Be like? I’d suggest it would be geared less toward conscious transformation of forms, or toward the mindful archiving of our source materials, and more toward capturing and conveying as much of their physicality as possible, with an aim to transmit and share expressions of that physicality in ways that aid analysis—and that are frankly celebratory. I want more digital humanities R&D on textual materiality!

We see this, for example, in the sensitive transcription and markup and the startling spectroscopy that has recently been done on the Archimedes Palimpsest. And we’ll see more of it in materials libraries and fab labs. It’s expensive (for now). It’s time-consuming (probably forever). It requires a level of devotion verging on obsession.

Let us hope it is the beginning of a trend.

Our digital facsimiles are pale surrogates at present, but we’re on the brink of amazing advances. On-demand 3d printing, or the fabrication of modeled historical artifacts. Augmented reality interfaces like pop-up books. Tactile touch-screens with textures that extrude from the glass and respond both to your fingers and to the images underneath. None of this is science fiction. All of it is consumer-market stuff, either here now or coming more or less immediately to a library near you.

That last part (but only the last part) may be a little too sanguine. These advances (which are coming) will demand your active engagement if they are to enter special collections libraries in a meaningful way and enable, for others, the kind of attention to small and wonderful things that many of you have paid throughout your careers.

Big data and small-data digitization. Both of these modes of critical remediation of rare and unique content align with what—we hope—will remain the larger missions of our institutions. Through them, we can position librarians and archivists to enable and co-create, with scholars, new research applications for old books. Through them, we also protect and elevate the status and visibility of our collections in such a way as to promote their continued centrality to the public humanities, arts, and sciences. In other words, what I earlier called “protective attention” is certainly attention to digital and physical preservation, lest we lose the things we hold most dear. But most of all it is attention to keeping our keys bright: to access and to use of digitized rare materials (the theme of this plenary session)—at multiple, simultaneous scales, and both immediately and far, far into the future.

Against librarianship lies oblivion. And, to loop back to the current unpleasantness at Virginia, I want to say: Dear Board of Visitors, don’t tell us about “existential threats.” Let us tell you. (And let us tell you how we ward against and overcome them, every day.)

So you see that I found I wanted to offer, this morning at RBMS, more than a sense of what is already happening in humanities computing and what is made possible by digital methods and techniques. I tried to do that, in capsule form. But more: I wanted to frame these possibilities in terms of our collective obligation—scholars and librarians and archivists and administrators together—to be stewards, not only of the documentary record at the moment of its most fundamental phase change in half a millennium, but of the broad and noble mission of higher education, and of the continuing place of the material traces of our cultural inheritance within it.

Engagement with a single human artifact, in the palm of your hand, is the fundamental act of humanities scholarship. If the digital age is an age of abundance, let us teach attentiveness. If, as I strongly believe, the library is to become a laboratory and a maker-space for the humanities, let’s put on our lab-coats. Let’s head back to shop class.

Digital humanities can be forward-looking only by looking back. The extent to which we can have an effective prospect on the future depends on our continued ability to do retrospective work. And this means not only preserving our collections and thinking carefully about the ways that we re-mediate them, but it also means understanding what it is to make and build and transmit and share. What, in fact, it means to transmit knowledge by making and building.

What kind of cyber-infrastructure do you build when you know and understand textual transmission and the history of the book? What kind of annotative and graphetic interfaces do you make when you have a deep and nuanced understanding of paleography in a world that barely puts pen to paper anymore? Or to turn that question on its head (or over to the Dark Side)—what kinds of machinery do you make when you do not?

We make things because that’s how we understand. We make things because that’s how we pass them on, and because everything we have was passed on to us as a made object. We make things in digital humanities because that’s how we interpret and conserve our inheritance. Because that’s how we can make it all anew.

This is a week when I’ve seen a lot of stuff torn down and taken away. It’s also a week in which I’ve felt very grateful to work in a set of allied fields (textual scholarship and librarianship and the digital humanities) that operate by making crucial things accessible, by paying careful and protective attention to what might otherwise be lost, and by building and building it all, over and over again, anew.