Blog

Talking myself through word embeddings, part 1

Last week, I had the very real pleasure of attending a Word Vectors for the Thoughtful Humanist workshop. This workshop was hosted by Northeastern’s Women Writers Project (WWP) and was sponsored by a grant from the NEH’s Office of Digital Humanities. The principal instructors throughout the week were Julia Flanders and Sarah Connell, respectively the WWP Director and Assistant Director.

I have a lot that I could say about the workshop, and I hope to collect some thoughts in some short blog posts over the coming days. The idea is that if I try to write some short thoughts rather than say everything, I might end up saying something. But I’ve got to start somewhere, and where I want to start is trying to explain word vectors to myself. I expect that this exercise will prove useful to retain some of what I learned last week, as well as prepare me to share this methodology with my students, when I teach it in January 2022, and my colleagues in BYU’s Office of Digital Humanities, when I teach them later this summer.

Put very simply, word vectors are a means to represent linguistic data in multi-dimensional space and calculate their similarity. The algorithm (normally word2vec or GloVe) looks at a word and its neighbors within a window that the researcher sets (e.g., 5 words to either side of the key word). Each token (individual word) is converted into a type (which is to say that “marriage” only appears once in the model) and then placed in vector space. These placements are essentially random at first, but as the window moves across a document and the model “reads” a word and the other words within the window, it makes small adjustments to where these words lie in the multidimensional space. (How many dimensions? That is another setting the researcher chooses: typically a few hundred, which is far fewer than the number of types within the document.) Words that are within the window get adjusted within the vector space so that they are “closer” to one another. And with the magic of negative sampling, a certain number of words that are not within the current window get adjusted a bit so they are “farther” from the word the algorithm is currently looking at. Run this process across every term in the document and then do it as many times as possible, and you end up with a vector space in which all words are represented, and the words that are used near each other end up being “close” in this high dimensional space, at least as measured by cosine similarity. (Please, please do not ask me to explain cosine similarity. I last took math in 12th grade, to my DH-loving, everlasting shame.)
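To make two pieces of that paragraph more concrete, here is a minimal Python sketch, not the actual word2vec code. It shows how a window turns a run of tokens into (key word, neighbor) pairs, and how cosine similarity compares two vectors by the angle between them rather than by their length. The function names `context_pairs` and `cosine_similarity` and the sample phrase are my own inventions for illustration.

```python
import math


def context_pairs(tokens, window=5):
    """Return (key_word, neighbor) pairs for every neighbor within `window` words."""
    pairs = []
    for i, key in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own neighbor
                pairs.append((key, tokens[j]))
    return pairs


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical sample text; a window of 1 keeps the output short.
tokens = "thou art a poxy knave".split()
pairs = context_pairs(tokens, window=1)
print(pairs)

# Vectors pointing the same way score near 1; vectors at right angles score 0.
print(cosine_similarity([1, 2], [2, 4]))
print(cosine_similarity([1, 0], [0, 1]))
```

A real model would feed pairs like these into the training step that nudges vectors closer together or farther apart. The sketch stops short of that step on purpose.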

But what’s really, potentially magical about this approach is not just that words like “Brian” and “Croxall” end up close to each other in vector space because they tend to appear close to one another. Instead, it’s the fact that this method places words that are used in similar discursive spaces close to one another. For example, if Shakespeare calls someone a “poxy knave” and calls someone else a “poxy blackguard,” not only will “knave” and “blackguard” be near “poxy,” but they will be nearer each other. What’s even more magical is that the types that make up “syphilitic fool” will also end up being near “poxy,” “knave,” and “blackguard” because they are part of, again, similar discursive spaces.
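The “similar discursive spaces” idea can be sketched without training anything. The Python below is a stand-in, not word2vec: it simply counts which words appear within a window of each word and compares those counts with cosine similarity. The three sentences are made up for the example, and `cooccurrence_vectors` and `cosine` are hypothetical helper names.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(c1[w] * c2[w] for w in c1 if w in c2)
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


# Invented sentences: "knave" and "blackguard" never appear together,
# but they keep the same company ("a poxy ...").
sentences = [
    "thou art a poxy knave",
    "he is a poxy blackguard",
    "she sang a merry song",
]
vectors = cooccurrence_vectors(sentences, window=2)

knave_blackguard = cosine(vectors["knave"], vectors["blackguard"])
knave_song = cosine(vectors["knave"], vectors["song"])
print(knave_blackguard, knave_song)
```

In this toy corpus “knave” scores as more similar to “blackguard” than to “song,” even though the first two words never share a sentence. That is the intuition behind the claim above, though a trained model learns dense vectors rather than raw counts.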

So, there you have it: an initial pass at explaining what word vectors are. In the spirit of a long-ago post by Chris Forster, please tell me where I’m wrong.

I want to acknowledge that my thinking has been informed not just by the workshop but by some of what we were asked to read to prepare. In particular, I learned a lot from the following: two posts by Ryan Heuser (“Word Vectors in the Eighteenth Century” part 1 and part 2), one by Ben Schmidt, one by Gabriel Recchia (responding to Heuser), and one by Laura Johnson (one of the team at the workshop).

Revenge of the Blog

The trouble with blogging is that it’s great if you do it regularly and it’s…far less great when you don’t do it as regularly. For years, I had a goal of posting to my blog once per month, and I managed to hit that. But I got busier as my kids got older and my work got more demanding.

I’m proud to say that I’ve made some real strides in the last five or six years toward achieving a better work/life balance. I no longer spend most of my evenings writing and reading. Am I less productive for it? In the realm of blogging, definitely. In the rest of my life, I think not.

Nevertheless, I’ve missed the regular appointment with this space that required me to say something about what I’ve been working on or thinking about. I’ve admired the work of Kathleen Fitzpatrick—for many reasons—but for the last few years, I’ve appreciated how she’s taken blogging back from 1k-plus posts to the shorter thoughts that predominated in the first half of the 2000s. I’m going to try to do the same, not letting the perfect be the enemy of the finished. 

Here’s hoping!

Do You Teach DH? Please Take Our Survey!

Diane Jakacki and I are excited to announce the launch of a research study: “Who Teaches When We Teach Digital Humanities?” With this study, we hope to learn more about 

  1. the training and preparation of those who teach digital humanities
  2. the for-credit and informal teaching that DH teachers do

“I teach DH!” you say. “How can I participate in this study??” We’re glad (really glad) you asked. You can participate by taking our short survey.

“But wait,” you say. “You both work at universities in the United States. I teach in [fill in any place on Earth]. Do you still want me to participate?” Absolutely. We hope to have as much global participation as possible. Wherever you teach, in whatever capacity you teach, please take the survey.

The survey should take approximately 10 minutes for you to complete. Your participation in the survey will be anonymous; to this end, we have ensured that our survey instrument will not record your IP address. You will not, we are afraid, be compensated for your participation in the study. 

This study grew out of our work on Debates in Digital Humanities Pedagogy, and we look forward to publishing some of what we learn in that venue, as well as in conference presentations.

The survey will be open until 1 September 2019. Please take it soon, and please share this announcement with any and all you think may be interested.

The “Next Big Thing” Ten Years Later: Digital Humanities at MLA 2019

This past December, as the semester was winding down, I got a message from Anna Chang, Head of Communications for the Modern Language Association. She inquired whether I would consider writing 500 words or so that summarized the different digital humanities sessions at the 2019 MLA Convention in Chicago for the MLA Newsletter.

[Image: Cat typing at a keyboard, with the phrase "This is relevant to my interests" superimposed on the image.]

This was, as the Internet says, relevant. I happily took on the assignment and shortly after the Convention had to figure out how to condense everything I had seen into as few words as possible. I overshot the 500 words by more than 50%, but the MLA were good sports about it.

The piece appeared last week in the Spring 2019 MLA Newsletter, and I’m sharing it here with the permission of the MLA.


Almost a decade has passed since, during the 2009 MLA convention, William Pannapacker wrote in The Chronicle of Higher Education that “the digital humanities seem like the first ‘next big thing’ in a long time” (“MLA”). He later wrote that he regretted that claim because it had become “a basis for a rhetoric that presents [digital humanities] as some passing fad” (“Pannapacker”). If the 2019 convention is any indication, it should be clear that digital humanities is neither a fad nor passing. Over the course of this year’s convention, I attended a wide range of sessions—many of them standing room only—that focused on the digital either as methodology or as object of inquiry, and I missed as many again since I could not be in two places at once.

Perhaps the most exciting session that I attended featured a roundtable of early-career scholars focused on “critical computation”: the use of quantitative methods to address issues of race, gender, or other aspects of social difference (MLA session 417). The lightning-talk format of this session meant that each panelist spoke for only five minutes, giving me just enough of a sense of their project to have several questions for each of them. Individual presenters discussed, among other topics, the use of sentiment analysis to analyze feeling in poetry from the Black Arts Movement (Ethan Reed); the declining percentage of women authors over the last 150 years, as represented in 104,000 volumes of the HathiTrust collection (Sabrina Lee); cataloging references to the female body through pronouns in early modern poetry (Whitney Sperrazza); and an examination of how women authors are represented in anthologies of literary journalism (Jonathan Fitzgerald). The final speaker, Kenton Rambsy, reported on his work on a data set of 101 anthologies that contain short fiction by black writers. His paper, which, like many from the session, has been deposited in CORE, demonstrates that “anthology editors shaped the landscape of African American literature by repeatedly publishing approximately 30 stories by a core group of [seven] writers.” Seeing the effect of digital humanities training on the research of young scholars was almost as inspiring as the questions of diversity to which they are turning their attention.

Those who worry that digital humanities research ignores the material could have found much to consider in a session called “Bookish Transactions: Publishing, Media, and Materialism,” which pointed to a particular concern with the codex (session 481). Opening the discussion, Lee Konstantinou observed that more attention has been paid to the effect MFA writing programs have had on literary production than to the consolidation of the publishing industry since the 1960s. Matthew Kirschenbaum suggested that we know more about how books were made in Gutenberg’s time than about how they are made today. He then discussed his visit to a production plant for a major commercial printer in Kendallville, Indiana, tracing the entry point of book manuscripts into the building through fiber-optic cables and their departure on pallets along the building’s rail spur. After N. Katherine Hayles spoke about the production of scholarly monographs, the remaining three panelists gave overviews of large-scale, computational work on the publishing industry. Laura McGrath focused on the role of literary agents, Richard Jean So on race and publishing, and Dan Sinykin on conglomeration and neoliberalism. The talks highlighted the shared history of digital humanities and book history scholars and suggested, as Kirschenbaum put it, that all scholars should consider the intersection of social justice and supply chain in the question “Who is making your book?”

Multiple panels at the convention focused on the subject of digital humanities pedagogy. Two panels that I organized with Diane Jakacki (sessions 89 and 639) took as their starting point that one’s answer to the question “What is digital humanities?” is most clearly articulated in decisions in the classroom. Panelists pointed to the emphasis in digital humanities pedagogy on process and connected it to the writing classroom (Grant Glass), addressed the utility of digital humanities training at a regional polytechnic university (Mitchell Ogden), discussed the difficulty of digital humanities training within multilingual programs like East Asian studies (Molly Des Jardin), and considered the perils and imperatives of professionalizing graduate students in digital humanities seminars (Lindsay Thomas).

I attended other sessions that discussed everything from digital scholarly editions (session 245) to the far-reaching impact of NEH-sponsored summer seminars on literature and technology (session 350) and missed sessions that covered critical approaches to augmented and virtual reality (session 155), the rights and responsibilities of collaboration (session 487), and the relation between design and fiction (session 635). Such a wealth of offerings highlights that digital humanities is not, in fact, a fad. At the same time, however, the last ten years have made clear that not all humanities scholarship need be digital. Digital humanities is just one method among many that we can use to understand the products and producers of modern languages. In the end, it’s the opportunity to attend, back to back, a session that close-reads Hemingway’s relation to his boyhood environs in Oak Park and another that distant-reads his entire corpus that brings me to the MLA convention every year.

Works Cited

Pannapacker, William. “The MLA and the Digital Humanities.” The Chronicle of Higher Education, 28 Dec. 2009, web.archive.org/web/20150908020431/http://chronicle.com/blogPost/The-MLAthe-Digital/19468/.

—. “Pannapacker at MLA: The Come-to-DH Moment.” The Chronicle of Higher Education, 1 Jan. 2012, www.chronicle.com/blogs/brainstorm/pannapacker-at-the-mla-2-the-come-to-dh-moment/42811.

Rambsy, Kenton. “African American Short Fiction and Data Driven Humanities.” MLA Annual Convention, 5 Jan. 2019, Chicago. Humanities Commons, hcommons.org/deposits/item/hc:22239/.

Announcing Debates in DH Pedagogy!

Diane Jakacki and I are thrilled to announce a call for abstracts for a forthcoming edited volume, Debates in Digital Humanities Pedagogy. The book will appear in the Debates in the Digital Humanities series from the University of Minnesota Press, edited by Matthew K. Gold and Lauren F. Klein.

Over the last decade, Digital Humanities (DH) has reinvigorated discussions of pedagogy in the academy. Unconferences on DH pedagogy and blogs about teaching with digital methods in the humanities classroom have led to extensive discussions about approaches to teaching at annual disciplinary conferences. At the same time, conversations and debates about teaching digital humanities (whether to undergraduates, to graduate students, or to the faculty themselves) have led to more and more people becoming involved in the field, each of them coming from a different subject and bringing their own perspectives and praxes with them to the teaching of DH. We have arrived at a moment when institutions are formally integrating DH into the curriculum and granting degrees; we are creating minors, majors, and even graduate certificates in DH; all of this while many of us are still new to the experience of (teaching) DH. This calls for another round of discussion of DH pedagogy, or a discussion of pedagogy in a new key.

These students—and the ways in which we teach them—are a very real expression of what each of us as instructors believes digital humanities to be. As our students and our colleagues continue to ask us “What is digital humanities?” we have the opportunity to answer their questions in terms of how we teach digital humanities.

Read more at the full CFP here, including the deadline to submit abstracts.