I recently collaborated on a project a little outside the ordinary for me: a case study for a chapter in a forthcoming textbook for, well, cops and spooks. (Cue performative outrage and sub-tweeting about the digital humanities’ complicity in our modern surveillance state–which I will address in a moment.) The book is the infelicitously-titled Application of Big Data for National Security: A Practitioner’s Guide to Emerging Technologies, edited by Babak Akhgar et al. These are circles alien to me, but in which my chapter’s co-author, Gregory Saathoff, frequently moves.
I write about the project here for two reasons–seemingly different, but in fact closely aligned. The first is that I successfully and quite easily negotiated alterations to my author’s contract with Elsevier (my own little valentine) that made it possible for me to reconcile placing the chapter in a Butterworth-Heinemann book with my deeply-held open access values. (I remain, in terms of journal publishing, a Cost of Knowledge signatory, pledging not to publish in or contribute editing and reviewing time to Elsevier journals until their business practices become less damaging to academic libraries and the public good.) I thought it might be helpful for others to know how I undertook this negotiation, and why open access publishing is usually even easier for me. The other reason for this post has to do with the content and message of the book chapter, and its relation to recent debates in the digital humanities. This, too, relates to problems of openness, audience, and the public impact of humanities scholarship.
First, on scholarly publishing. I wanted this piece–like the rest of my writing–to be reasonably open, not shut: that is, to be available in some format to all interested readers, regardless of their ability to pay. I typically accomplish this by posting, even before I have placed a given essay or article, a version of it on my blog, which is governed by the minimally-restrictive CC-BY license I’ve described elsewhere. (In short: a Creative Commons Attribution license allows non-commercial or commercial re-use of my writing, in whole or in part, asking only that I be credited for the work.)
To be perfectly honest, most of the time I post such things–transcripts of talks I’ve given, cranky little rants like this one–without the slightest intention to publish them elsewhere, and am later invited, because I have made them freely available online, to place them in journals and edited collections. Those cleaned-up versions invariably wend their way into Libra, UVa’s institutional repository. In the present situation, because our “big data” case study was a co-authored work, and because I was brought onto the project after my collaborator had already agreed to contribute it as a textbook chapter, it didn’t occur to me to pre-emptively publish a version online. I foggily assumed that Elsevier’s increasingly enlightened journal policies, which do allow for posting of pre-prints, would apply to book chapters as well.
Last week, shortly before the book went to press, I received what I considered to be a ridiculous author’s contract. This document not only asked me to give up copyright to the piece, but prohibited deposit of a draft or pre-print in our IR and effectively asked me to sign away my own fair use rights to the content. If I signed this contract, agreeing (for instance) not to distribute copies of more than 10% of any version of the piece, even to my own students without asking that they purchase access, I would–in my I-am-not-a-lawyer view–have fewer defensible rights to use the work in my teaching than would a random colleague in a classroom next door.
I came back to Elsevier with a slightly modified version of the SPARC addendum, to which I added some solid language from Case Western Reserve University’s library. After a bit of expected pushback and negotiation, aided by a personal phone conversation with the book’s helpful editorial project manager, in which I could answer questions, do a little evangelizing, and convey what was most important to me (the striking of the 10% rule and the ability to deposit a pre-print without embargo or password-protection), we came to an agreement. I signed a modified version of the Elsevier contract, dropped a draft in the repo, and collegially shared the publisher’s feedback on language used in Libra with our scholarly communications librarian.
This is the most energy I’ve ever expended futzing with publication contracts–although I did once benefit from someone else’s futzing. All told, I probably spent a grand total of 90 minutes making sure that an open access version of the case study was available to the world and preserved in perpetuity. One caveat: as a non-tenure-track faculty librarian, my work is not typically evaluated in terms of journal impact factors–so this is not necessarily blanket advice I’d give to a junior scholar seeking a conventional career in a conservative department or discipline. However (speaking of complicity), I will point out that systems of academic evaluation are systems within the academy’s control.
Now, a few words about the case study itself: how it came about, why I anticipate that this work may be ill-received in some quarters, and why I thought it important to undertake.
Besides being a close colleague from faculty governance at UVa (and the person to whom I recently handed over the General Faculty Council chair’s gavel), my co-author is a forensic psychiatrist on our research faculty, and a consultant for entities like the FBI and, on matters of prisoner mental health and radicalization, the Federal Bureau of Prisons. Greg Saathoff directs a think-tank based in UVa’s School of Medicine, the Critical Incident Analysis Group. CIAG focuses on the societal impact of moments of (mostly mass) violence: acts of terrorism, foreign and domestic, and other arresting, convulsive, heartbreaking happenings with potentially destabilizing implications for democracy and everyday freedoms. (Think–if you can bear to; I have a hard time–of school shootings and the like.) CIAG is a multi-disciplinary consultancy and convener of conversations, whose projects aim to “distill current knowledge, providing an opportunity to identify and build productive networks and policies that enhance resilience without diminishing our liberties.”
Five years ago, Greg chaired the formal Expert Behavioral Analysis Panel (EBAP) commissioned by the government to review psychological and other records pertaining to the primary suspect in the “Amerithrax” anthrax mailings, an event that briefly paralyzed the US government in the wake of the September 11th attacks of 2001. These mailings of envelopes of deadly spores to media outlets and politicians–causing five deaths and at least seventeen non-fatal cases of inhalational anthrax–are now commonly thought to have been a result of “insider threat.” Insider threat is whistle-blowing’s Mr. Hyde: the use, by someone with classified knowledge or access to dangerous materials, of that knowledge and access to spread darkness rather than light. But it was a complex and highly controversial case, not at all (to stay with our theme) open-and-shut. The man ultimately determined by the Department of Justice to have been the mailings’ perpetrator (Dr. Bruce Ivins, who worked at a USAMRIID lab to develop anthrax vaccines sometimes cited as a contributor to Gulf War Syndrome) committed suicide while under suspicion. The investigation itself has been subject to repeated, thorough and costly scientific reviews. When my colleague was asked to contribute an EBAP-related case study for the forthcoming law enforcement textbook–reflecting on how contemporary, rapidly-expanding possibilities for “big data” analysis might have changed the work of a panel like the one he had convened–he came to me.
I was as surprised as you are. I am not a text-miner by trade. I have written more about the continuing value of hand-crafted “small data” in our digital age than about computational analysis of massive data sets. And, of course, I recognized that this is stuff a goodly number of my colleagues in the liberal arts would see as sullying, just to touch. I’m certain of this not only because I keep my ears open to academic discourse about the digital humanities, but because I have twice been urged–with no apparent sense of irony–to alter the conference code of conduct drafted by an “inclusivity working group” I chaired, in a way that would discourage or prevent attendance at DH conferences by people working for government agencies or the military-related private sector, or to require them to disclose their employment status publicly. I’ve also recently read a published essay in which the mere attendance of English professors at a DARPA-sponsored academic workshop on theories and neuroscience of narrative is cited as damning evidence of the “dark side” of literary studies’ digital turn.
Such impulses prompt me to ask: how isolated from internal conversations driving the development of the American security state should we humanists remain? How shut to them? How irrelevant must we keep our viewpoints and hard-won humanities knowledge, in order to stand clean before our peers?
As Greg and I talked through the particulars of the Amerithrax case, I realized that key hermeneutic methods and base assumptions of the humanities–fundamental to the interpretive, ludic, and experimental way the DH community has long approached techniques like sentiment analysis, stylometry and authorship attribution, visualization, and data analysis at scale–could be made to speak to a dangerous nexus of behavioral analysis and criminal investigation in our “big data” age. Even scholars’ and archivists’ basic, completely naturalized understandings of humanistic data as subjective, selected, partial, ambiguous, and always particularly situated in space and time were important to communicate to the audience for a textbook like this. I was also attracted to the opportunity to encourage in law enforcement classrooms and among policy-makers a conversation (as we put it in the case study) not “about what is possible, but about what is advisable”–to prompt trainees and national security advisors to discuss professional ethics and the consequences of over-reliance on distant reading, when lives and liberties are at stake.
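For a concrete sense of what I mean by interpretive contingency in these techniques, here is a deliberately toy authorship-attribution sketch of my own devising–it appears nowhere in the chapter, and the marker-word list, tokenizer, and distance measure below are all illustrative assumptions, in the general spirit of Burrows’ Delta:

```python
# A toy authorship-attribution sketch: rank candidate authors by how
# closely their function-word frequencies match those of a questioned
# document. Every step below is a human decision point, not a given:
# which marker words to count, how to tokenize, how to measure distance.
import re
from statistics import mean, stdev

# The marker-word list itself is an interpretive choice.
MARKERS = ["the", "of", "and", "to", "in", "that", "it", "was"]

def profile(text):
    """Relative frequency of each marker word in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1
    return [words.count(w) / total for w in MARKERS]

def rank_candidates(candidates, questioned):
    """Return (delta, name) pairs; a smaller delta means a closer style."""
    profiles = {name: profile(text) for name, text in candidates.items()}
    target = profile(questioned)
    deltas = {name: [] for name in profiles}
    for i in range(len(MARKERS)):
        column = [p[i] for p in profiles.values()]
        sigma = (stdev(column) if len(column) > 1 else 0.0) or 1.0
        for name, p in profiles.items():
            deltas[name].append(abs(p[i] - target[i]) / sigma)
    return sorted((mean(d), name) for name, d in deltas.items())

# Hypothetical usage, with writing samples you would have to supply:
# ranking = rank_candidates({"Candidate A": text_a, "Candidate B": text_b},
#                           questioned_document)
```

Even in a toy like this, nothing is push-button: change the marker words or swap the distance measure, and the ranking of “likely authors” can shift. That chain of decision-points, not the final number, is what we ask investigators to keep in view.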
Our case study–while trying to tell the story of the Ivins EBAP analysis from declassified records–touches on issues as diverse as the assumption of privacy by civilian military employees and the impact of email monitoring on scientific communication, the USA PATRIOT Act’s “chilling effect on the day-to-day information-seeking behavior of average citizens,” and long-held humanities understandings of what we call “the relationship between algorithm and interpretation.” We conclude that:
It is important for law enforcement investigators to understand that big data analysis in crime-solving and behavioral analysis is rife with decision-making and contingency. Its conclusions can be dependent upon the subjective standpoints of those who assemble data sets, design the processes by which they are analyzed, and interpret the results. In other words, such techniques provide no push-button answers—only arrangements of information that must be interpreted in almost a literary sense and which, in fact, themselves depend on a chain of previous decision-points, interdependencies, moments of expert intuition, and close, interpretive readings (Chessick, 1990).
and see data mining as:
an aid to interpretation of selected and processed (therefore, in some sense, pre-interpreted) datasets. It can be a crucial means of focusing investigators’ attention—but is never a substitute for close and critical reading of sources, or for psychological and behavioral analysis.
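To make the “pre-interpreted” point concrete for readers of this post (the illustration and its invented toy documents are mine, not the chapter’s), watch how a single upstream preprocessing decision changes what the very same word-count routine appears to reveal:

```python
# Four tiny invented "documents" and one upstream decision: whether to
# drop common function words before counting. The counting algorithm is
# identical in both runs; only the analyst's preprocessing choice
# differs, and so do the "findings."
from collections import Counter
import re

docs = [
    "The report was sent to the lab on Monday.",
    "On Monday the lab returned the report.",
    "Access logs for the lab were incomplete.",
    "The logs show who had access, and when.",
]

STOPWORDS = {"the", "to", "on", "for", "were", "and", "who", "had", "when", "was", "show"}

def top_terms(texts, drop_stopwords):
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if drop_stopwords and word in STOPWORDS:
                continue
            counts[word] += 1
    return counts.most_common(3)

print(top_terms(docs, drop_stopwords=False))  # dominated by function words
print(top_terms(docs, drop_stopwords=True))   # suddenly "about" the lab and its logs
```

The algorithm never changes; the analyst’s stopword list, chosen before any code runs, decides what the data seem to say.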
We attempt to convey Johanna Drucker’s warning that:
technologies of display borrowed from the natural and social sciences can render those who study humanistic datasets “ready and eager to suspend critical judgment in a rush to visualization.” Drucker holds that all human-generated data must instead be understood as capta—not as something rationally observed, neutrally presented, and given, but rather as that which is taken, in the form of subjective representations demonstrating and demanding the “situated, partial, and constitutive character of knowledge production, the recognition that knowledge is constructed” by people with inherent, inescapable agendas or biases, blind-spots, and points of view (Drucker, 2011).
Finally, in considering algorithmic data analysis and visualization techniques as “an aid to elucidation,” rather than a substitute for close reading, we find that:
we can usefully bring to bear lessons learned from the application of computing to interpretive problems in humanities scholarship. These range from the impact of implicit assumptions and biases on research questions and the assembly of datasets (Sculley & Pasanek, 2008) to the reminder that subjective and objective concerns must be kept in useful tension in text analysis, data mining, and visualization (Clement, 2013). A comprehensive review by the Council on Library and Information Resources, of eight large-scale digital humanities projects funded under an international “Digging into Data Challenge” scheme in 2009 and 2011, found that “humanistic inquiry,” like human behavior, is “freeform, fluid, and exploratory; not easily translatable into a computationally reproducible set of actions.” This review identified a characteristic need that data-driven projects in the humanities share with the application of data analytics to investigations of insider threat: the need to address inevitable gaps “between automated computational analysis and interpretive reasoning” that can “make allowances for doubt, uncertainty, and/or multiple possibilities” (Williford & Henry, 2012).
In the end, I’m not sure how successful the chapter will be in bringing humanities values and concerns to a law enforcement audience (just as this blog post may well miss its different mark). Preliminary reviews by investigators involved in the case and familiar with the state of discourse around “big data” analysis in law enforcement have been positive, but the case study represents a rather new format and readership for me. I was open to the attempt.
You can freely download a peer-reviewed pre-print version of “Interpretation and Insider Threat: Re-reading the Anthrax Mailings of 2001 Through a ‘Big Data’ Lens,” from UVa’s open-access institutional repository.