Results

You can find the results for the project here.

Playlist

Want to influence the playlist? Add something here. But add only one or two tracks each, please.

Rationale

In Nabokov’s Favorite Word is Mauve, Ben Blatt suggests that Hemingway would find his “mathematical analysis equal parts illuminating and outlandish.” But he also suggests that “it’s just as outlandish to focus on a small sample and never look at the whole picture.” We’ve read some Vonnegut—a whole 100 pages—so what right do we have to say that we know anything about his themes or style? How could we go about correcting this? After all, we only have one week of the semester remaining! More importantly, what could we learn if we actually managed to pull this off?

Strap in for one last crazy digital humanities experiment!

Part of this project will involve us once again confronting the friction of formats. Computational approaches require computable information, and it turns out that print books aren’t. Or at least, they aren’t yet. This project will present you with a second chance to think about what it takes to get material from one format into another.

The Nitty Gritty

Collectively, we will begin building a dataset of all of Kurt Vonnegut’s stories and novels. Individually, you will be scanning a number of pages from one or two of his books and then processing those scans with optical character recognition (OCR) software. You will finish this by Monday, December 13 at 11:59pm. Then you’ll get a chance to play with the data during the final exam.

Scanning

It turns out that books can be kind of hard to scan since there’s all that pesky page-turning to deal with. We’re going to simplify things by making these books less book-like and removing their bindings.

  • You will scan the pages that you are assigned, turning the pages into PDFs. If you have selections from multiple books, please scan those pages in separate chunks, creating different files.
  • There are three printers around the JKB that have automatic feeders and that can scan both sides of the page. You can find them in 3003C, 4073 East, and 4016 West. There’s also a printer in the Office of Digital Humanities in 1163J JFSB. I estimate that it will take you 5 minutes to do the scanning. The scanners will email you the file.
  • Before scanning, you should fan the edge of your pages to make sure none are stuck to each other. Then, you will need to make four changes to the default settings on the printers:
    • The default is to only scan one side of the page. You need to make sure you scan 2 sides.
    • Set the scanner to work in grayscale rather than color.
    • The default on the scanners is 300 dpi, but you should set it to 200 dpi, as it will result in a smaller file.
    • The default file type on the scanners is a Compact PDF. This will produce unusable output. Instead, you need to change the file type to PDF. (You do not need to do this if you’re using the copier in 1163J JFSB.)
  • If for some reason you aren’t receiving the file, there’s a strong chance that it was too large to email. In that case, break your bundle of pages into two or three chunks and repeat the process for each.
  • When you have received the file, check it to make sure you scanned both sides of the pages. 
  • Then rename your PDF to bookabbreviation_pages_lastname; for example, CC_265-287_croxall.pdf. If you’re working with different books, make sure you save these as different files. 
  • Upload a copy of the PDF(s) to this Google Drive folder.
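
If you would rather not type the file name by hand, the naming pattern above can be sketched in a few lines of Python. This is only an illustration: the function name scan_filename is made up for this example, the abbreviations are copied from the Book Abbreviations list in this assignment, and the pattern for page ranges (digits or lowercase Roman numerals on either side of a hyphen) is an assumption about what your ranges look like.

```python
import re

# Abbreviations copied from the "Book Abbreviations" list in this assignment.
BOOK_ABBREVIATIONS = {
    "BoC", "CC", "DD", "G", "GBYMR", "JB", "MN", "PP", "SHF", "SoT", "SS",
}


def scan_filename(abbreviation: str, pages: str, last_name: str,
                  extension: str = "pdf") -> str:
    """Build a name like CC_265-287_croxall.pdf (hypothetical helper)."""
    if abbreviation not in BOOK_ABBREVIATIONS:
        raise ValueError(f"Unknown book abbreviation: {abbreviation}")
    # Assumed shape of a page range: "265-287", or front matter like "v-xiii".
    if not re.fullmatch(r"[0-9ivxlc]+-[0-9ivxlc]+", pages):
        raise ValueError(f"Unexpected page range: {pages}")
    return f"{abbreviation}_{pages}_{last_name.lower()}.{extension}"


print(scan_filename("CC", "265-287", "Croxall"))  # CC_265-287_croxall.pdf
```

Passing extension="txt" gives the matching name for the text file you will export after the OCR step.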

OCR

  • To do the OCR work, you will need to bring your PDF(s) to one of several computers in the JFSB that have Prizmo installed on them.
    • All of the iMacs in the labs on the main floor of the JFSB (1131) have Prizmo installed. This lab is open Monday – Saturday, from the morning until late in the evening.
    • All of the iMacs in our classroom should have Prizmo installed. You will be able to use it during our class periods next week—during which we will not be meeting. Additionally, I’ve reserved the room at the following times:
      • Friday, 3 December from 9 am – 1 pm and from 2-7 pm
      • Friday, 10 December from 9 am – 5 pm
  • When opening Prizmo, choose “New Document…” and then drag-and-drop your PDF onto the window.
  • Select all of the pages in your file by clicking on a single page image in the leftmost bar and choosing Edit > Select All (⌘A).
  • Then click “Recognize” in Prizmo’s upper-right corner.
  • Sit back and relax as Prizmo processes all of your text.
  • As you look through each page of the text, do the following four things:
    • First, make sure that only the body of the text is selected on the page, removing the page numbers and/or page headers. You may have to rearrange the regions that Prizmo has automatically selected.
    • Second, make sure you don’t have dots in the text. If you do, it means you’re working on a copy of Prizmo that hasn’t been registered. Please email me.
    • Third, check for misspellings and correct them. They will have red underlines; if the word is spelled correctly and will repeat regularly throughout your pages (like a character’s name, a place name, or a foreign word), you can right-click and tell Prizmo to ignore that word moving forward.
    • Fourth, check for random characters such as numbers or strange punctuation marks and correct them. Prizmo won’t catch these with red underlines most of the time. Just glance around the page. You’ll be surprised how good the human eye is at seeing misplaced characters.
  • Remember: I’m not asking you to read through each and every single word.
  • If you can’t finish all of this OCR work at once, you can save the work as a Prizmo file (.pzdoc). Make sure you save it to a flash drive or upload it to something like Box, Dropbox, or Google Drive.
  • When you’ve finished all of your pages, choose File > Export… and set the Format to “Regular Text”. Make sure you include all pages. Click “Export to File…” and then save the file.
  • Name your file bookabbreviation_pages_lastname; for example, CC_265-287_croxall.txt. Then email me the file. In the email, please let me know if you finished all of your pages or not.
  • If you have pages from more than one book, you will have to go through this process with your two different files. But don’t worry, I made the number of pages equal for everyone. Well, except for me. I got extra.
  • Important: Please do not spend any more than five hours on the OCR, even if you don’t finish. Just let me know in your email.
  • You need to get me all of your files no later than 11:59pm on Monday, 13 December. This will give me the chance to compile everything in time for the final.
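
The hunt for random characters described above is meant to be done by eye in Prizmo, but once you have exported your text file, a short script can give you a second pass. What follows is a minimal sketch, not part of the required workflow. It assumes that digits and a handful of symbols are rare in the body text of a novel, so any line containing one deserves another look. The function name and the character list are made up for this example, and the sample text is invented.

```python
import re

# Assumption: digits and these symbols rarely belong in a novel's body text.
# Adjust the character list if your book legitimately uses any of them.
SUSPICIOUS = re.compile(r"[0-9|\\^~_<>{}\[\]#*=+@]")


def flag_suspicious_lines(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that contain a suspicious character."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        if SUSPICIOUS.search(line):
            flagged.append((number, line))
    return flagged


# Invented sample with two typical OCR slips on its second line.
sample = "So it goes.\nS0 it g|oes.\nListen:"
for number, line in flag_suspicious_lines(sample):
    print(number, line)  # 2 S0 it g|oes.
```

To run it on your own export, read the file into a string first, for example with pathlib's Path("CC_265-287_croxall.txt").read_text(encoding="utf-8"), and pass that string to the function. A flagged line is not necessarily wrong (chapter numbers and dates will show up too), so treat the output as a list of places to glance at.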

Final Exam

    During the final (Thursday, 16 December from 3-6pm), you will work in groups to analyze our newly created Vonnegut corpus using the various text analysis tools provided by Voyant, which was developed by Stéfan Sinclair and Geoffrey Rockwell. You will use Voyant’s tools to help you practice digital humanities. Put differently, you’ll identify patterns and then interpret them.

    Working in groups, you will not read Vonnegut using at least two of the different tools in Voyant. Each group will have one tool assigned to them; you will be free to pick the other. Tools that I am known to favor include TextualArc, Mandala, Microsearch, TermsRadio, Topics, Phrases, and Bubblelines.

    By the end of the final, your group will collectively write a 200-word per-person (minimum) Google Doc. (So if you have three people in your group, you’ll write 600 words minimum, and so forth.) Make sure that you share your document with b [dot] croxall [at] gmail [dot] com. Your document will discuss patterns you’ve found and how you interpret them to build on our understanding of Vonnegut. Your Doc should include images that you’ve created in Voyant. You will have approximately 2h15m of the final to do your exploration and complete the writing. The final 45 minutes of the exam will be devoted to each group sharing what they’ve found, as well as a few final words from me.

    Of course, it’s important to recognize that we might not learn anything earthshattering—or even anything—by taking this approach. That’s okay. We are, to a certain extent, just screwing around. As Rockwell puts it in his essay “What is Text Analysis, Really?”: “Playful experimentation is a pragmatic approach of trying something, seeing if you obtain interesting results” (214, emphasis added). We’re out to have fun and see if we find anything interesting along the way.

    Then we’ll all high-five each other and ride off into the sunset.

    Final Exam Groups

    Group 1 (Correlations)
    • Maria
    • Winthrop
    • Ashlin
    Group 2 (TextualArc)
    • Elizabeth Bodily
    • Brooke
    • Amanda
    Group 3 (Mandala)
    • Chloe
    • Estelle
    • Dane
    Group 4 (WordTree)
    • Jenni
    • Eliza
    • Allie
    Group 5 (Microsearch)
    • Elizabeth Bennett
    • Chase
    • Ashley

    Book Assignments

    Name               Book(s)       Pages
    Chloe              PP            1-200
    Maria              PP / SoT      201-341 / 1-60
    Elizabeth Bennett  SoT           61-260
    Elizabeth Bodily   SoT / MN      261-326 / v-xiii, 1-124
    Estelle            MN / CC       125-268 / 1-60
    Chase              CC            61-264
    Brooke             GBYMR         1-200
    Winthrop           GBYMR / SHF   201-275 / 1-126
    Ashlin             SHF / BoC     127-275 / 1-52
    Dane               BoC           53-254
    Ashley             BoC / SS      255-302 / 1-152
    Jenni              SS / JB       153-274 / 1-78
    Eliza              JB            79-280
    Allie              DD            ix-xiv, 1-196
    Amanda             DD / G        197-271 / 1-126
    Croxall            CC / JB / G   265-287 / 281-306 / 127-324
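
If you want to know how many pages a row in the assignments above adds up to, here is a rough Python sketch. It assumes every range is inclusive (so 1-200 is 200 pages) and that front matter is numbered with lowercase Roman numerals, as in v-xiii. The function names are made up for this example.

```python
ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100}


def roman_to_int(numeral: str) -> int:
    """Convert a lowercase Roman numeral such as 'xiii' to an integer."""
    total = 0
    # Pair each symbol with the one after it; a trailing space marks the end.
    for current, following in zip(numeral, numeral[1:] + " "):
        value = ROMAN[current]
        # Subtractive notation: a smaller symbol before a larger one is subtracted.
        if following in ROMAN and ROMAN[following] > value:
            total -= value
        else:
            total += value
    return total


def page_number(token: str) -> int:
    """Read a page label that is either Arabic ('124') or Roman ('xiii')."""
    return int(token) if token.isdigit() else roman_to_int(token.lower())


def count_pages(ranges: str) -> int:
    """Count pages in a string such as '261-326 / v-xiii, 1-124'."""
    total = 0
    for chunk in ranges.replace("/", ",").split(","):
        start, end = (page_number(part.strip()) for part in chunk.split("-"))
        total += end - start + 1  # inclusive range
    return total


print(count_pages("1-200"))                    # 200
print(count_pages("261-326 / v-xiii, 1-124"))  # 199
```

The count only tells you how many numbered pages fall in your ranges; it knows nothing about blank pages or chapter breaks, so your actual stack of paper may differ slightly.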

    Book Abbreviations

    BoC = Breakfast of Champions
    CC = Cat’s Cradle
    DD = Deadeye Dick
    G = Galápagos
    GBYMR = God Bless You, Mr. Rosewater
    JB = Jailbird
    MN = Mother Night
    PP = Player Piano
    SHF = Slaughterhouse-Five
    SoT = Sirens of Titan
    SS = Slapstick

    Grading

    This project, which will include the final, is worth 15% of your grade in the class. Half of those 150 points are related to your completing the scanning and OCR work on your assigned pages (again, you should not go over 5 hours). The other 75 points will be awarded based on your and your group’s work during the final exam. Remember, as an experimental class project, you are not being graded on what you and your group find about Vonnegut’s work. After all, we simply don’t know what we’ll find, if anything.

    Instead, you’ll be graded on (1) whether you accomplish all the parts of the assignment (pass/fail); (2) how engaged you are with the work; and (3) how well you apply the method of screwing around / pattern recognition/interpretation we’ve been embracing throughout the semester.

    Credits

    This assignment was designed by Brian Croxall, originally in 2014 and with Hemingway in mind, and is licensed with a Creative Commons BY (CC BY 4.0) license. Special props to Stewart Varner for telling me to stop thinking about Whitman; David Mimno and Ted Underwood for encouragement; and Paul Fyfe and Jason B. Jones for an idea that I gleefully ripped off. None of this would be possible without the fantastic resource of Voyant Tools, which Stéfan Sinclair and Geoffrey Rockwell have developed for years amid constant pestering from me for new features.