Comparing Corpora in Voyant Tools

I’m at the Digital Humanities 2012 Conference and attended a Voyant Tools workshop with Stéfan Sinclair and Geoff Rockwell. I’ve enjoyed playing with Voyant many times over the last year, particularly in my Introduction to Digital Humanities class, where we distant-read the work of Carol Ann Duffy.

Voyant’s a great set of tools for doing quick and easy text analysis, and there are lots of little bits and pieces. I just learned how to compare two corpora against each other, and I wanted to share it right away (AKA so I don’t forget).

Step One: Upload One Corpus

media_1342515358985.png

This part is pretty easy. Go to http://voyant-tools.org/ and upload the first corpus.

Step Two: Find the corpus ID

Now you need to find the ID for your corpus. You do this by starting the export process. Click on the disk icon in the upper right corner.

media_1342589224532.png

This will open the “Save” interface, where you can get a direct URL for your corpus and the skin you’re currently using to explore it. But what you need to get is the number at the tail end of the URL.

media_1342589414110.png

Copy this number (the corpus ID) to your clipboard.
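If you’d rather pull the ID out of a saved URL with a script than copy it by hand, here’s a minimal Python sketch. The URL shape is my assumption: I’m guessing the ID travels in a "corpus" query parameter, with a fallback that just grabs whatever run of digits and dots ends the URL. The example URL and the function name corpus_id_from_url are made up for illustration.

```python
import re
from urllib.parse import urlparse, parse_qs


def corpus_id_from_url(url):
    """Return the corpus ID from a Voyant export URL, or None if none is found."""
    query = parse_qs(urlparse(url).query)
    # Assumption: the ID is carried in a "corpus" query parameter.
    if "corpus" in query:
        return query["corpus"][0]
    # Fallback: take the run of digits and dots at the very end of the URL.
    match = re.search(r"([\d.]+)$", url)
    return match.group(1) if match else None


# Made-up URL, for illustration only.
example = "http://voyant-tools.org/?corpus=1234567890.1234"
print(corpus_id_from_url(example))
```

Either way, the point is the same: the only thing you need to carry over to the next steps is that ID.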

Step Three: Upload Another Corpus

Now you need to upload a second corpus that you want to test against the first. So go back to http://voyant-tools.org/.

Step Four: Adding “Difference”

Once you’re in the interface of your second corpus, expand the “Words in the Entire Corpus” pane in the lower left corner by clicking on the title bar.

media_1342589790456.png

Here you can see a list of the most common words in your second corpus.

media_1342590017753.png

When you hover over any of the column headers, you’ll see a drop-down arrow that gives you more options.

media_1342590158960.png

Click on the arrow, choose the columns option, and enable “Difference.”

media_1342590264642.png

This provides a new column in the “Words in the Entire Corpus” pane.

media_1342590465055.png

But since it’s currently empty, we need to do one last step.

Step Five: Enabling “Difference”

Click on the gear icon in the upper-right corner of the pane.

media_1342590618713.png

The next pop-up will give you the option to add a stop words list to your corpus. You can also add a comparison corpus.

media_1342590713113.png

Here is where you’ll paste the corpus ID that you saved to your clipboard earlier.

media_1342590943095.png

Click “OK” and you can now see how the most common words in your second corpus compare to the ones in your first.

media_1342591023877.png
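To get a feel for what a comparison like this is doing, here’s a rough Python sketch. I don’t know exactly how Voyant calculates its “Difference” column, so treat this as one plausible reading rather than Voyant’s method: for each word in the second corpus, subtract its relative frequency in the first corpus from its relative frequency in the second. The function names and the simple tokenizer are my own assumptions.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Map each word to its share of all words in the text."""
    # Assumption: a crude tokenizer of lowercase letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}


def frequency_differences(second, first):
    """For each word in `second`, relative frequency there minus in `first`."""
    second_freqs = relative_frequencies(second)
    first_freqs = relative_frequencies(first)
    # Words missing from the first corpus count as frequency 0 there.
    return {w: f - first_freqs.get(w, 0.0) for w, f in second_freqs.items()}


diffs = frequency_differences("the cat the dog", "the cat cat cat")
# List words from most over-represented in the second corpus to least.
for word, diff in sorted(diffs.items(), key=lambda item: item[1], reverse=True):
    print(word, round(diff, 3))
```

A positive number means the word is relatively more common in the second corpus, and a negative one means it’s more common in the first.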

That’s it. Not immediately intuitive, but also not difficult at all once you know how to do it. In some ways, you could say the same about Voyant itself, I suppose. But it’s totally worth learning.
