Big Data for Historians

by admin

Franco Moretti’s article “Graphs, Maps, and Trees: Abstract Models for Literary History” shows how most histories of the novel have focused on a handful of individual books rather than on the novel as a genre. This is probably a matter of scale. Reading every novel ever published in order to write a comprehensive history of the genre would be an overwhelming, almost impossible task. This is where big data helps: it lets historians examine large-scale trends in ways that are impractical without computers. Moretti shows how the history of the novel can be studied numerically and graphically using big data techniques.

Admittedly, it is a different kind of history, and it may be only a starting point for historians. But I think what it brings to the table is the ability to raise new questions and to reveal trends (Moretti’s “cycles”) that might previously have been overlooked.

I see two big challenges for historians who want to use big data. First, unless you are working with limited census data or mining a database such as Google Books, you need considerable technical skill, and you will often have to create your own data set. While looking for a big data site to add to the Zotero group this week, I realized that there are not many historical big data sets available yet. Many existing big data projects focus on collecting today’s social media and blog content for tomorrow’s historians. (Collecting big data is an important task that historians should think about and be involved in, but that, I think, raises a separate set of issues and concerns.)

Second, the question of close versus distant reading is, I think, an important one. Distant reading, a term coined by Moretti, means studying a trend such as the history of the novel through large tables and charts of data rather than by reading every novel ever written. For historians, this raises questions about how effective and useful such a strategy is for studying historical trends. The danger, I think, is oversimplifying a trend and overlooking the contingency and complexity that most histories account for.