Results
You can find the results for the project here.
Playlist
Want to influence the playlist? Add something here. But add only one or two tracks each, please.
Rationale
In Nabokov’s Favorite Word is Mauve, Ben Blatt suggests that Hemingway would find his “mathematical analysis equal parts illuminating and outlandish.” But he also suggests that “it’s just as outlandish to focus on a small sample and never look at the whole picture.” We’ve read some Vonnegut—a whole 100 pages—so what right do we have to say that we know anything about his themes or style? How could we go about correcting this? After all, we only have one week of the semester remaining! More importantly, what could we learn if we actually managed to pull this off?
Strap in for one last crazy digital humanities experiment!
Part of this project will involve us once again confronting the friction of formats. Computational approaches require computable information, and it turns out that print books aren’t. Or at least, they aren’t yet. This project will present you with a second chance to think about what it takes to get material from one format into another.
The Nitty Gritty
Collectively, we will begin building a dataset of all of Kurt Vonnegut’s stories and novels. Individually, you will be scanning a number of pages from one or two of his books and then processing those scans with optical character recognition (OCR) software. You will finish this by Monday, December 13 at 11:59pm. Then you’ll get a chance to play with the data during the final exam.
Scanning
It turns out that books can be kind of hard to scan since there’s all that pesky page-turning to deal with. We’re going to simplify things by making these books less book-like and removing their bindings.
- You will scan the pages that you are assigned, turning the pages into PDFs. If you have selections from multiple books, please scan those pages in separate chunks, creating different files.
- There are three printers around the JKB that have automatic feeders and that can scan both sides of the page. You can find them in 3003C, 4073 East, and 4016 West. There’s also a printer in the Office of Digital Humanities in 1163J JFSB. I estimate that it will take you 5 minutes to do the scanning. The scanners will email you the file.
- Before scanning, you should fan the edge of your pages to make sure none are stuck to each other. Then, you will need to make four changes to the default settings on the printers:
  - The default is to only scan one side of the page. You need to make sure you scan 2 sides.
  - Set the scanner to work in grayscale rather than color.
  - The default on the scanners is 300 dpi, but you should set it to 200 dpi, as it will result in a smaller file.
  - The default file type on the scanners is a Compact PDF. This will produce unusable output. Instead, you need to change the file type to PDF. (You do not need to do this if you’re using the copier in 1163J JFSB.)
- If for some reason you aren’t receiving the file, there’s a strong chance that it was too large to email. In that case, break your bundle of pages into two or three chunks and repeat the process for each.
- When you have received the file, check it to make sure you scanned both sides of the pages.
- Then rename your PDF to bookabbreviation_pages_lastname; for example, CC_265-287_croxall.pdf. If you’re working with different books, make sure you save these as different files.
- Upload a copy of the PDF(s) to this Google Drive folder.
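If you’d like to double-check your file names before uploading, the naming rule above can be sketched in a few lines of Python. This is only an illustration, not part of the assignment: the function name `check_scan_filename` is hypothetical, and the pattern assumes that page ranges are written as start-end in arabic or lowercase roman numerals, as in the example CC_265-287_croxall.pdf.

```python
import re

# Abbreviations copied from the "Book Abbreviations" list in this assignment.
BOOK_ABBREVIATIONS = {
    "BoC", "CC", "DD", "G", "GBYMR", "JB", "MN", "PP", "SHF", "SoT", "SS",
}

# Pattern for bookabbreviation_pages_lastname.pdf.
# Assumption: pages look like "265-287" or "v-xiii" (start-end).
FILENAME_PATTERN = re.compile(
    r"^(?P<book>[A-Za-z]+)"
    r"_(?P<pages>[0-9ivxlc]+-[0-9ivxlc]+)"
    r"_(?P<lastname>[A-Za-z-]+)\.pdf$"
)


def check_scan_filename(filename: str) -> bool:
    """Return True if the name follows the convention and uses a known book."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return False
    return match.group("book") in BOOK_ABBREVIATIONS


if __name__ == "__main__":
    for name in ["CC_265-287_croxall.pdf", "scan001.pdf"]:
        print(name, check_scan_filename(name))
```

A name that fails the check is not necessarily wrong (the pattern is stricter than the instructions), so treat a False as a prompt to look again rather than a verdict.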
OCR
- To do the OCR work, you will need to bring your PDF(s) to one of several computers in the JFSB that have Prizmo installed on them.
  - All of the iMacs in the labs on the main floor of the JFSB (1131) have Prizmo installed. This lab is open Monday – Saturday, from the morning until late in the evening.
  - All of the iMacs in our classroom should have Prizmo installed. You will be able to use it during our class periods next week (during which we will not be meeting). Additionally, I’ve reserved the room at the following times:
    - Friday, 3 December from 9 am – 1 pm and from 2 pm – 7 pm
    - Friday, 10 December from 9 am – 5 pm
- When opening Prizmo, choose “New Document…” and then drag-and-drop your PDF onto the window.
- Select all of the pages in your file by clicking on a single page image in the leftmost bar and choosing Edit > Select All (⌘A).
- Then click “Recognize” in Prizmo’s upper-right corner.
- Sit back and relax as Prizmo processes all of your text.
- As you look through each page of the text, do the following four things:
  - First, make sure that only the body of the text is selected on the page, removing the page numbers and/or page headers. You may have to rearrange the regions that Prizmo has automatically selected.
  - Second, make sure you don’t have dots in the text. If you do, it means you’re working on a copy of Prizmo that hasn’t been registered. Please email me.
  - Third, check for misspellings and correct them. They will have red underlines; if the word is spelled correctly and will repeat regularly throughout your pages (like a character’s name, a place name, or a foreign word), you can right-click and tell Prizmo to ignore that word moving forward.
  - Fourth, check for random characters such as numbers or strange punctuation marks and correct them. Prizmo won’t catch these with red underlines most of the time. Just glance around the page. You’ll be surprised how good the human eye is at seeing misplaced characters.
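Your eyes are the main tool for that last check, but a rough second pass can be sketched in Python if you happen to have the recognized text as plain text. This is a hypothetical helper and not a feature of Prizmo: the function name `flag_suspicious_lines` and the two heuristics (a digit touching a letter, and symbols that rarely appear in a novel’s body text) are my own assumptions, and the sample text is only illustrative.

```python
import re

# Hypothetical heuristics for stray OCR characters:
#   1. a digit directly next to a letter, as in "c0me"
#   2. symbols that seldom appear in the body text of a novel
SUSPICIOUS = re.compile(r"[A-Za-z]\d|\d[A-Za-z]|[|\\^~_<>{}\[\]#@*=+]")


def flag_suspicious_lines(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that match either heuristic."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        if SUSPICIOUS.search(line):
            flagged.append((number, line))
    return flagged


if __name__ == "__main__":
    sample = (
        "So it goes.\n"
        "Billy Pilgrim has c0me unstuck in time.\n"
        "Listen: | Poo-tee-weet?\n"
    )
    for number, line in flag_suspicious_lines(sample):
        print(f"line {number}: {line}")
```

Expect false positives (ordinals like "2nd", or symbols the author really used), and expect misses too. The sketch only narrows down where to look; it does not replace glancing around the page.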
Final Exam
During the final (Thursday, 16 December from 3-6pm), you will work in groups to analyze our newly created Vonnegut corpus using the various text analysis tools provided by Voyant, which was developed by Stéfan Sinclair and Geoffrey Rockwell. You will use Voyant’s tools to help you practice digital humanities. Put differently, you’ll identify patterns and then interpret them.
Working in groups, you will not read Vonnegut using at least two of the different tools in Voyant. Each group will have one tool assigned to them; you will be free to pick the other. Tools that I am known to favor include TextualArc, Mandala, Microsearch, TermsRadio, Topics, Phrases, and Bubblelines.
By the end of the final, your group will collectively write a Google Doc of at least 200 words per person. (So if you have three people in your group, you’ll write 600 words minimum, and so forth.) Make sure that you share your document with b [dot] croxall [at] gmail [dot] com. Your document will discuss patterns you’ve found and how you interpret them to build on our understanding of Vonnegut. Your Doc should include images that you’ve created in Voyant. You will have approximately 2h15m of the final to do your exploration and complete the writing. The final 45 minutes of the exam will be devoted to each group sharing what they’ve found, as well as a few final words from me.
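The word minimum is simple arithmetic: 200 words multiplied by the number of people in your group. A minimal sketch in Python, assuming that words are counted by splitting on whitespace (Google Docs may count slightly differently) and using hypothetical function names:

```python
WORDS_PER_PERSON = 200  # minimum stated in this assignment


def minimum_words(group_size: int) -> int:
    """Minimum length of the group's document, in words."""
    if group_size < 1:
        raise ValueError("a group needs at least one member")
    return WORDS_PER_PERSON * group_size


def meets_minimum(document_text: str, group_size: int) -> bool:
    """Check a draft against the minimum.

    Assumption: a word is any run of non-whitespace characters.
    """
    return len(document_text.split()) >= minimum_words(group_size)


if __name__ == "__main__":
    print(minimum_words(3))
```

For a group of three, `minimum_words(3)` gives 600, which matches the example above.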
Of course, it’s important to recognize that we might not learn anything earthshattering—or even anything—by taking this approach. That’s okay. We are, to a certain extent, just screwing around. As Rockwell puts it in his essay “What is Text Analysis, Really?”: “Playful experimentation is a pragmatic approach of trying something, seeing if you obtain interesting results” (214, emphasis added). We’re out to have fun and see if we find anything interesting along the way.
Then we’ll all high-five each other and ride off into the sunset.
Final Exam Groups
Group 1 (Correlations)
- Maria
- Winthrop
- Ashlin
Group 2 (TextualArc)
- Elizabeth Bodily
- Brooke
- Amanda
Group 3 (Mandala)
- Chloe
- Estelle
- Dane
Group 4 (WordTree)
- Jenni
- Eliza
- Allie
Group 5 (Microsearch)
- Elizabeth Bennett
- Chase
- Ashley
Book Assignments
Name | Book(s) | Pages |
Chloe | PP | 1-200 |
Maria | PP / SoT | 201-341 / 1-60 |
Elizabeth Bennett | SoT | 61-260 |
Elizabeth Bodily | SoT / MN | 261-326 / v-xiii, 1-124 |
Estelle | MN / CC | 125-268 / 1-60 |
Chase | CC | 61-264 |
Brooke | GBYMR | 1-200 |
Winthrop | GBYMR / SHF | 201-275 / 1-126 |
Ashlin | SHF / BoC | 127-275 / 1-52 |
Dane | BoC | 53-254 |
Ashley | BoC / SS | 255-302 / 1-152 |
Jenni | SS / JB | 153-274 / 1-78 |
Eliza | JB | 79-280 |
Allie | DD | ix-xiv, 1-196 |
Amanda | DD / G | 197-271 / 1-126 |
Croxall | CC / JB / G | 265-287 / 281-306 / 127-324 |
Book Abbreviations
BoC = Breakfast of Champions
CC = Cat’s Cradle
DD = Deadeye Dick
G = Galápagos
GBYMR = God Bless You, Mr. Rosewater
JB = Jailbird
MN = Mother Night
PP = Player Piano
SHF = Slaughterhouse-Five
SoT = Sirens of Titan
SS = Slapstick
Grading
This project, which will include the final, is worth 15% of your grade in the class. Half of those 150 points are related to your completing the scanning and OCR work on your assigned pages (again, you should not go over 5 hours). The other 75 points will be awarded based on your and your group’s work during the final exam. Remember, as an experimental class project, you are not being graded on what you and your group find about Vonnegut’s work. After all, we simply don’t know what we’ll find, if anything.
Instead, you’ll be graded on (1) whether you accomplish all the parts of the assignment (pass/fail); (2) how engaged you are with the work; and (3) how well you apply the method of screwing around / pattern recognition/interpretation we’ve been embracing throughout the semester.
Credits
This assignment was designed by Brian Croxall, originally in 2014 and with Hemingway in mind, and is licensed with a Creative Commons BY (CC BY 4.0) license. Special props to Stewart Varner for telling me to stop thinking about Whitman; David Mimno and Ted Underwood for encouragement; and Paul Fyfe and Jason B. Jones for an idea that I gleefully ripped off. None of this would be possible without the fantastic resource of Voyant Tools, which Stéfan Sinclair and Geoffrey Rockwell have developed for years amid constant pestering from me for new features.