Mapping Gymnasiums in Boston, 1914-1924 (A Visualization for Clio 3)

Screenshot of Mapping Gymnasiums in Boston, 1914-1929

In the 1890s Josiah Quincy was elected Mayor of Boston. During his term as mayor he implemented a new system of municipal baths and gymnasiums designed to encourage moral behavior and hygiene, foster community, and promote exercise among the city's residents. These gymnasiums, although open to all residents, were placed largely in immigrant-heavy neighborhoods and reflected a focus on physical culture reminiscent of life in Eastern Europe. This visualization shows the location of each municipal and private gymnasium in Boston between 1914 and 1925 and provides some information about the ward in which each gymnasium was located. Clicking on a public gymnasium (teal circle) or a private gymnasium (dark blue circle) pulls up relevant information about that ward. The size of each public gymnasium's circle corresponds to its total attendance in a given year, and clicking on a public gymnasium populates a pie chart with the breakdown of class attendance at that gym.

An image of the shapefile overlaid on top of the historical map in QGIS.

The data for this visualization came from three sources. First, the map itself was created using the JavaScript visualization library D3. The shapefile used to generate the map was created from a historic map digitized and made available through the Boston Public Library. Using QGIS, the map was georeferenced and then traced to create a shapefile containing all of the wards in Boston; for more information on that process, see my blog post. Second, the locations of the public gymnasiums, as well as the attendance data, came from the Report of the Boston Parks Department, and the private gymnasium locations were pulled from Boston city directories. Lastly, the demographic data about each ward draws from sampled 1910 census data available through the Integrated Public Use Microdata Series (IPUMS). The correlation between the location of a gymnasium and the demographics of its ward is, admittedly, a jump: the demographics of the ward where a gymnasium is located are not necessarily indicative of who may have attended that gym. This visualization is, however, a starting point for a future research project.
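To give a sense of how the D3 piece fits together, here is a minimal sketch of the general approach, written against D3 v3 (the version current when this project was built). The file names (boston_wards.geojson, gyms.json) and field names (lon, lat, attendance) are illustrative placeholders rather than the project's actual data; the real code is on GitHub, linked at the end of this post.

// Minimal sketch: draw ward boundaries from GeoJSON and size gym circles by attendance.
// File and field names are placeholders; see the GitHub repository for the real code.
var width = 800,
    height = 600;

var projection = d3.geo.mercator()
    .center([-71.06, 42.32])   // roughly central Boston
    .scale(90000)
    .translate([width / 2, height / 2]);

var path = d3.geo.path().projection(projection);

var svg = d3.select("body").append("svg")
    .attr("width", width)
    .attr("height", height);

d3.json("boston_wards.geojson", function(error, wards) {
  if (error) throw error;

  // One path element per ward polygon
  svg.selectAll(".ward")
      .data(wards.features)
    .enter().append("path")
      .attr("class", "ward")
      .attr("d", path);

  d3.json("gyms.json", function(error, gyms) {
    if (error) throw error;

    // Circle area scales with total attendance
    var radius = d3.scale.sqrt()
        .domain([0, d3.max(gyms, function(d) { return d.attendance; })])
        .range([0, 20]);

    svg.selectAll(".gym")
        .data(gyms)
      .enter().append("circle")
        .attr("class", "gym")
        .attr("cx", function(d) { return projection([d.lon, d.lat])[0]; })
        .attr("cy", function(d) { return projection([d.lon, d.lat])[1]; })
        .attr("r", function(d) { return radius(d.attendance); });
  });
});

The one design choice worth noting is the square-root scale: it makes circle area, rather than radius, proportional to attendance, so a gym with four times the attendance reads as roughly four times the area.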


This visualization does two very basic things that were useful for this project and for future research. First, I was able to get a sense of where the municipal gymnasiums were located compared to the private gymnasiums and conclude that they were, for the most part, located in the wards with high immigrant populations. Privately owned, membership-based gymnasiums were located in middle-class neighborhoods rather than in the working-class districts of central Boston. This was useful as I continued my research because it clearly contradicted much of the rhetoric coming out of the Parks Department and the discourse about gymnasiums among those who supported these institutions. It was often claimed that the gymnasiums were for the betterment of the community and were democratic institutions. While Bostonians may have believed this, the location of the gymnasiums suggests that there was a clear class distinction in physical culture in Boston. Furthermore, this map, along with descriptions in newspapers, suggests that the aims of the public and private institutions were quite different. The public gymnasiums were thought to help clean up the streets and promote proper hygiene. In contrast, the private gymnasiums, often referred to as "athletic" or "exercise" clubs, were social gathering places for men, where politics and business were conducted alongside exercise.

Several historians have discussed the public bath and gymnasium movement in Boston. They often focus on the municipal institutions that were established by reformers in immigrant districts. Marilyn Thornton Williams, author of Washing "the Great Unwashed," argues that there was a shift around the turn of the century from municipal institutions focused on hygiene to institutions that focused on recreation by combining baths and gyms. However, as this visualization suggests, there was also a range of private institutions that focused on recreation and defined the role of the gymnasium differently, putting an emphasis on creating an environment for socializing as well as athletics. The membership rates of these institutions, although varied, also suggest that they catered to a different group than the municipal gymnasiums.

In addition to assembling this visualization, I read a variety of sources about the role of the gymnasium in the modern city as well as gymnasiums' construction, layout, and location. After all of this, I am left with more questions about Boston's gymnasiums and a potential future direction. Who actually attended these gymnasiums, and what was their reasoning for doing so? Aside from the publicity about Boston's gymnasiums and reports about the ideal function of gymnasiums within the modern city, what were the spaces used for? Were these, as newspaper articles suggest, spaces for socialization and community? Or were they focused on the physical development of the body at a time when the industrial city, particularly mechanization, was causing concern about efficiency and the active body? How often were demonstrations held, and who attended? Announcements and floor plans suggest that the body became a spectacle within these gymnasiums and that members of the public would gather to observe women "taking exercise."

Lastly, was there a correlation between the location of gymnasiums and public space in Boston? This is a question that could be answered fairly easily by adjusting the shapefile to display the locations of parks and playgrounds, and perhaps even streetcar lines. This visualization, along with some primary source research, helped me gain a better understanding of where the gymnasiums were located and of the layout of the city. The insight provided by the visualization has sparked new questions and future directions for research.

Second, the visualization is a template for my future research. I hope eventually to make a trip to the Harvard University Libraries, which hold Dudley Allen Sargent's papers. Sargent was the director of physical training at Harvard and was deeply involved in advocating for physical culture in Boston. His papers contain index cards from gymnasiums in Boston and all over the East Coast where he visited and observed the members. From what I can tell, the cards include detailed information about those who attended the gymnasiums, and I think this data would make for a far more interesting and useful visualization. Now that I have a shapefile (the map base) of Boston's wards, the location of the gymnasiums, and the total attendance in each year, plugging in data about individuals should be relatively easy and should provide more insight into who actually attended those gymnasiums.

The visualization is available here. All of my code is on GitHub.

Dat Tutorial

What is Dat?

Dat is a set of tools used to "build and share data pipelines." It was created with the goal of bringing "to data a style of collaboration similar to what git brings to source code." The project, developed by Max Ogden, is designed to allow not only easy data sharing but also row-by-row version tracking. The five key features of Dat's tools (as described on Dat's GitHub page) are:

  • making data syncable (much like Git or GitHub)
  • data sets can be very large (billions of rows or terabytes in size) and/or updated frequently (real time data)
  • data can be either tabular (rows & cells) or blobs (large files and/or unstructured)
  • plugin APIs to connect dat to any existing database/format/language/storage backends
  • built with automated workflows in mind

This tutorial will walk through the basics of using Dat to create and manage a data set. We'll begin with a simple dataset; then we'll import the data into a new dat store, view the data, make changes, add new data, and view version history. We'll be working with Dat on our local machine, but keep in mind that Dat can also be deployed on a server to facilitate data sharing. Starting with just the basics of Dat showcases the tool's potential and will hopefully be helpful for anybody creating large datasets by hand for class.

Dat for Digital Historians

I think Dat has the potential to be useful for historians in several ways. First, it is useful for verifying and keeping track of the history of historical data sets. Because Dat allows you to track version history, it's easy to go through and verify the data. Second, Dat is designed to allow data syncing. So, in theory, if you were collaborating with someone on creating a data set, it would be possible to sync only the data that has changed since the last time you synced. I think this has the potential to make collaborating on and creating large datasets an easier and more verifiable practice. Max Ogden originally designed Dat to be useful for governments and large organizations that often make data available on their websites in one massive CSV file. When that file changes, you have to go back in, redownload the CSV, figure out what's new, and redo any formatting, reshaping, or calculations. The goal of Dat is to make that process easier by syncing only the data that has changed. Historical data doesn't change very often, but if you're collecting data from various locations or sources, I think Dat could be very useful. If you're interested, you can read more about the history of Dat and Ogden's vision here.

Get the Data

I've uploaded a GitHub repository with some data for this tutorial. You can clone the repository to follow along with the data I've provided, or you can use your own data. To clone the data, run this in the terminal:

git clone


Dat is a Node module and needs to be installed using the Node Package Manager (npm) from the command line, so you'll need to have Node installed first. More details on installing the proper version of Node.js can be found on the Dat GitHub repository. Once Node is in place, you can install Dat with this command:

npm install dat -g

If you get an error, you may need to use sudo before the command.

Ways to Use Dat

There are three main interfaces or APIs for Dat:

  • the command line interface (CLI)
  • the REST API (which also serves the browser-based GUI)
  • the JavaScript API

We'll cover the first two of these interfaces in this tutorial. The command line interface allows users to stream data into the dat store from CSV, TSV, or JSON files. The REST API provides access to the Dat graphical user interface (GUI), where you can update data, attach files, and view various versions; using this API you can also query the data store and generate CSV/JSON copies of your data. The JavaScript API is also useful but requires you to write your own code to interact with the database. I'll include links to the examples in the Dat documentation and leave that API for you to experiment with.

Creating a Dat, Adding & Viewing Data

Initializing a Dat Data Store

Dat's command line interface is similar to Git's. To begin, we need to initialize a new data store using the dat init command. First, let's navigate into the directory we cloned from GitHub and then run dat init:

cd DatTutorial
dat init

Dat will prompt you for a name, a description, and the publisher of the data. Once you've provided a value for each, Dat will create a new hidden directory called '.dat'. This directory will house all of the data you stream into it. You can confirm that dat init worked by running ls -a in your terminal to view all the hidden files.

Adding Data

Dat will accept data in two forms: CSV and line-separated JSON. We'll use CSV data since that's the most common way historians store data. To get a CSV into dat we need to stream it in from the command line. You can stream your CSV in using:

dat import --csv BostonGyms.csv

or

cat BostonGyms.csv | dat import --csv

Both commands do the same thing, just in slightly different ways. The second command uses a pipe to direct the output of cat BostonGyms.csv into dat. When data is added, Dat assigns each row of the CSV a key and a version number. The key is a unique identifier that we can use to reference the row later; the version number is automatically updated every time a change is made to that row.

Viewing Data in the Dat GUI

Now that our data is stored in dat, it would be useful to be able to view it. To view the data we can launch the Dat GUI by running dat listen in the terminal. This command returns a message that says "Listening on port xxxx" (in my case port 6461; use whatever port is printed in your terminal). This means that Dat has started a new local web server on our computer, and if we navigate to localhost:6461 in a web browser we'll see the Dat GUI. This interface, built on the LevelUp database, allows us to interact with the data as we would a database.

Dat's Graphical User Interface (GUI)

In this interface we can update any cell by clicking on the green edit button that appears when hovering over it. If we make a change to any cell and click update, Dat will store a new version of that row and change the version number to 2. Notice that the row’s version number changes but the key doesn’t.

Editing an individual cell


Clicking on the icon in the far-left cell brings up a row summary with each column name, its corresponding value, and an option to upload a Blob. A "Blob" is simply an attached file. Here we could upload a file that relates to the data in a particular row. Dat will store the metadata for the file, and any changes to it, in the database, but it will not store changes to the contents of the file.

The "Inspect Row" dialog box. Here you can view a summary of the data in the row and attach a Blob (a file).

Viewing Data via the REST API

In addition to the dat-editor GUI, data in Dat can also be explored through the REST API. The API exists whenever the local web server is running (dat listen), and the views can be found at localhost:6461/api. Try going to:

localhost:6461/api This page returns JSON metadata about the dat store, which is particularly useful if you're accessing data that isn't yours. The API will list:

  • the name of the dat store
  • the version
  • how many changes have been made
  • how many rows
  • the approximate size of the store

localhost:6461/api/rows This page will return a list of the rows and all of the data associated with them. This is useful if you need to access the key of a particular item or if you want to pull the JSON into another program.

localhost:6461/api/rows/:key Appending a particular row's key to the end of the URL will return the data in that row.
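Because these endpoints simply return JSON over HTTP, you can also pull them into your own scripts without touching Dat's JavaScript API. Here is a minimal Node sketch, assuming the server started by dat listen is running on port 6461 as above (substitute whatever port your terminal reports); the exact shape of the response is an assumption noted in the comments, so check your own /api/rows output first.

var http = require("http");

// Fetch the row listing from the local server started with `dat listen`.
// Port 6461 is the one used earlier in this tutorial; use whatever port yours reports.
http.get("http://localhost:6461/api/rows", function(res) {
  var body = "";
  res.on("data", function(chunk) { body += chunk; });
  res.on("end", function() {
    var result = JSON.parse(body);
    // Assumption: the listing exposes an array of rows, each carrying the key and
    // version described above. Inspect the raw JSON in your browser if the shape differs.
    var rows = result.rows || result;
    rows.forEach(function(row) {
      console.log(row.key, row.version);
    });
  });
}).on("error", function(err) {
  console.error("Request failed: " + err.message);
});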

This is just a brief overview of the REST API. More details can be found on the Dat GitHub Wiki.

Back in the command line, we can exit the local server by quitting the process with Control-C.

Adding More Data

Once we've got data into Dat, we may want to add an additional row or rows. This is easy to do from the dat CLI. We can add a new row by writing JSON that references the column names and their corresponding values. That might look something like this:

echo '{"name": "Dover-street Bath-house", "month": "March", "Year": 1913, "males":10150, "females": 2591}' | dat import --json

That's useful, but it might be better to import more than one row at once. We can do this by streaming in a file of line-separated JSON. The file BostonGyms.json contains some data we haven't added yet. The JSON looks like this:

Data formatted as line-separated JSON

To pipe it into our dat database we can run: cat BostonGyms.json | dat import --json

The new rows will show up in our database. We can see them by running dat listen and going to the last page.
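If your own data starts out as a CSV and you would rather hand Dat line-separated JSON, a few lines of Node can do the conversion. This is a bare-bones sketch that assumes a simple, well-behaved CSV (no quoted fields containing commas or embedded newlines); the output file name is just an example.

var fs = require("fs");

// Convert a simple CSV into line-separated JSON: one JSON object per line.
// Assumes the first line holds the headers and no field contains a comma or newline.
var lines = fs.readFileSync("BostonGyms.csv", "utf8").trim().split(/\r?\n/);
var headers = lines[0].split(",");

var output = lines.slice(1).map(function(line) {
  var values = line.split(",");
  var row = {};
  headers.forEach(function(header, i) {
    row[header.trim()] = (values[i] || "").trim();
  });
  return JSON.stringify(row);
}).join("\n");

fs.writeFileSync("BostonGyms-converted.json", output + "\n");

The resulting file can then be piped into dat exactly as above: cat BostonGyms-converted.json | dat import --json.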

The Javascript Interface

Dat's command line and REST APIs are useful and allow you to add, change, and view data quickly and easily. However, Dat also offers a JavaScript API, which can be extremely useful as well. There are several excellent examples of how to use it in the Dat GitHub repository.

Conclusions: Dat and Reproducible Research

Dat is a powerful tool for data sharing, collaboration, and version control. We've only used it on our local machine, but Dat provides tools for uploading data to a server and creating pipelines through which large datasets can be easily shared, updated, and managed. For digital historians practicing reproducible research, Dat makes it easy to share data with others. Dat is only in its alpha release, but with funding from the Sloan Foundation the tools will continue to be built out and could be extremely beneficial for researchers in all disciplines.

Reflections on the Spring Semester and Year 1 as a Digital History Fellow

It seems like just yesterday that we walked into the Center for History and New Media a bit unsure about what our first year as DH Fellows would entail. Looking back, it has been an extremely rewarding and valuable experience. Last fall we blogged about our rotations in the Education and Public Projects divisions. In the spring we moved to Research for seven weeks, where we worked on a programming project for THATCamp and on the PressForward project before moving on to a seminar about the history of CHNM. I want to use this blog post to reflect on the spring semester and look back at the year as a whole.

Our first stop during the spring semester was the Research division. We began our seven weeks by taking on a topic modeling project, which aimed to mine all the posts from the individual THATCamp websites and blog about the process. As we used the Programming Historian to learn Python (or at least attempted to), we thought a lot about tools and the scholarly research process. We discussed Zotero as a tool and the values and community behind THATCamp as a training network for the digital humanities. Although we struggled with the programming aspect of this assignment and managed to miss important concepts behind topic modeling, the assignment gave us some insight into the kinds of challenges and opportunities topic modeling holds. From this project I learned firsthand the importance of understanding the black box behind digital humanities tools. After finishing our topic modeling project we moved on to the PressForward project. We spent a week working as Editors-at-Large and helped second-year fellow Amanda Morton with her Editor-in-Chief duties. We also spent time researching altmetrics, thinking about scholarly gray literature and how to measure the reception of scholarly work on the internet.

At the end of the three rotations we were left with a very clear understanding of each division: its current and past projects, the audiences it creates for, and the overlap between the divisions. We then began a seminar with Stephen Robertson that explored the history of RRCHNM. In this seminar we tried to understand how RRCHNM developed over the years into its current state and how it fits into the larger history of the digital humanities. Beginning with an overview of what a digital humanities center is and how it's defined, we collaboratively looked at all 150 centers in the United States and tried to get a sense of the different models that exist and how many actually fit the definition of a digital humanities "center" as defined by Zurich. What we realized is that the Center for History and New Media stands out from other digital humanities centers because of its unique attachment to the History Department, but also because of the origins of the center and Roy Rosenzweig's vision.

After we defined just what a center was and looked at the different models, we started to look at the origins of RRCHNM and try to create a genealogy of the different projects and trace the development of the center. Each of the first year fellows took a different major project and traced its history through grant documents and reports. I read up on Zotero in its different iterations and learned a lot about how Zotero was originally conceived as well as how it has grown, expanded, and changed since 2004.

I think one of the things that has been immensely useful for the first-year fellows is the way much of our work at the center was paralleled by our coursework. In the PhD program at GMU we're required to take a two-course sequence in digital history. The first course focuses on the theory of digital history, and the second is largely a web design course that introduces us to the basics of HTML and CSS. Oftentimes the topics in Clio I related directly to what we were doing at the center, and the dual exposure allowed us to see firsthand the application of things we had discussed in Clio.

At the suggestion of Spencer Roberts, the fellows decided to begin a Digital History Support Space in the fall. The support space offers "advice, guidance, and assistance for students doing digital history projects." Every Monday from noon to 5pm (and sometimes even on weekends) we met with students taking the Clio courses, offered advice about and brainstormed potential projects, helped to debug code, and offered a space to work where help was available if needed. We were able to draw on experience from the center and offer advice about what kinds of tools are available and where resources might be found. We weren't experts, but working with the other students in our Clio classes was equally beneficial; it left me with a better understanding of the issues, topics, and tools discussed in our classes. As many of the PhD students move on to Clio III: Programming for Historians with Lincoln Mullen this fall, I'm looking forward to continuing the Support Space.

The fellowship has been structured so that each element builds on the last, providing us with experience and an understanding of digital history, digital humanities, and the debates, methodologies, and histories of the discipline. This fall I'll be working in the Research division on the PressForward project, helping to manage both Digital Humanities Now and the Journal of Digital Humanities. Our first year as fellows has gone by extremely fast, but I'm looking forward to beginning a new year and moving into the role of mentor to the new group of DH Fellows.

Digital History Minor Field, Part 5: Topic Modeling

This week our readings focused on topic modeling. I've been looking forward to this week for some time, partly because I plan to use topic modeling in my dissertation but also because I was eager to understand some of the theoretical underpinnings of the methodology.

During the Digital History Fellows' rotation in the Research division last spring, we were asked to do a topic modeling project on all of the blog posts from the various THATCamps. We jumped in, attempted to learn Python, managed to download every blog post from each THATCamp (well over 200), and then ran that data through MALLET for topic modeling. To make a very long story short, we managed to mess up the topic modeling and, consequently, our results, for a variety of reasons. We encountered a number of challenges, but the results were skewed mostly because we didn't understand the black box that is MALLET. Although the project turned out to be a mess, I came to the readings this week with a bit of background on topic modeling and have gained a new perspective on the mistakes made during that project. (I think the project also holds some lessons about failure in the digital humanities, but I'll come back to that in another blog post.)

The articles we read this week all discussed the theoretical and mathematical underpinnings of topic modeling, specifically LDA topic modeling with MALLET. The Winter 2012 issue of the Journal of Digital Humanities focuses on "The Digital Humanities Contribution to Topic Modeling" and features discussions of the methodology as a concept, its application, and critiques of the tool. In "Topic Modeling and Digital Humanities," David Blei explains the mathematics behind MALLET and provides an explanation of how topics are derived and what they represent. Topic modeling, he explains, discovers a set of recurring themes in a corpus and "the degree to which each document exhibits those topics." LDA, the algorithm behind MALLET, makes two crucial assumptions. First, it assumes there are a fixed number of patterns of word use that occur together in documents (topics). Second, it assumes that each document in the corpus exhibits those topics, at least to some degree.1 Topics are really "a probability distribution over terms," but they look like "topics" because "terms that frequently occur together tend to be about the same subject." [Blei, ibid] The results can be analyzed either by looking at a subset of texts based on the combination of topics they exhibit or by looking at the words of the texts themselves and restricting attention to the words within a given topic; the latter factors out other topics from each text and focuses on the relationships among the words in the topic of interest. Blei argues that "some of the more important questions in topic modeling have to do with how we use the output of the algorithm," and the next several articles take up that question.
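To make Blei's two assumptions slightly more concrete: in the standard LDA formulation (textbook notation, not anything specific to the JDH issue), each topic is a probability distribution over the vocabulary, each document gets its own mixture of topics, and every word is generated by first drawing a topic from the document's mixture and then drawing a word from that topic:

\theta_d \sim \mathrm{Dirichlet}(\alpha), \qquad z_{d,n} \sim \mathrm{Multinomial}(\theta_d), \qquad w_{d,n} \sim \mathrm{Multinomial}(\beta_{z_{d,n}})

Here \theta_d is document d's topic proportions, z_{d,n} is the topic assigned to the nth word of that document, and \beta_k is topic k's distribution over terms; the software's job is to work backwards from the observed words to estimates of these hidden quantities.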

In her article "Topic Modeling and Figurative Language," Lisa Rhody discusses how she applied topic modeling to a corpus of poems and used the results to look at figurative language. Using topic modeling on figurative language, Rhody argues, "yields precisely the kind of results that literary scholars might hope for — models of language that, having taken form, are at the same moment at odds with the laws of their creation." Rhody's analysis stresses the need to combine both close and distant reading in order to interpret the results of the topic modeling algorithm. Her focus on figurative language necessitates "a methodology that deals with language at the level of word and document" and can be used to "identify latent patterns in poetic discourse."

In "Words Alone: Dismantling Topic Models in the Humanities," Benjamin Schmidt offers a valuable critique of topic modeling and warns that "simplifying topic models for humanists who will not (and should not) study the underlying algorithms creates an enormous potential for groundless — or even misleading — 'insights'." He warns that topics shouldn't (and can't) be studied without looking at the word counts that build them. The topics, he argues, "are messy, ambiguous, and elusive." Through a study of geographical data that he topic modeled, he offers two ways to reintegrate words into topic models. First, he suggests that there is a significant issue with relying on the first words that appear in a topic to label it. Second, he shows the danger in visualizations such as plotting topic frequencies over time and assuming "topic stability across different sorts of documents." Schmidt's article calls for a better understanding of the ways topic models are created, and he cautions humanists against taking topics at face value.

Ted Underwood's piece, "Theorizing Research Practices We Forgot to Theorize Twenty Years Ago," makes, I think, another important contribution to the discussion of algorithmic text analysis by examining keyword searching. Underwood discusses the practice of using keyword searches during the research process and argues that in choosing certain search terms we are already making a "tacit hypothesis about the literary significance of a symbol," and our findings are often deemed significant if we get enough results. However, as he explains, this practice is much closer to data mining with Bayesian algorithms than it is to "old-school" bibliographical searches. Underwood argues that "full-text search is not a finding aid analogous to the card catalog." Search, however, is limited in that it only returns exactly what you asked for. Most historians probably wouldn't immediately think of this as an issue; however, as Underwood explains:

the deeper problem is that by sorting sources in order of relevance to your query, it also tends to filter out all the alternative theses you didn't bring. Search is a form of data mining, but a strangely focused form that only shows you what you already know to expect.

By not understanding how our search engines work, we're potentially missing sources and skewing our results based on our search terms. Underwood continues by arguing that topic modeling, and Bayesian algorithms in particular, can provide "reasoning about interpretation that can help us approach large collections in a more principled way."2 Rather than the scholar trying to search a corpus by choosing keywords that describe what they are looking for, topic models help to remove some of those assumptions by telling the scholar what words were relevant in a particular time. For example, in my research I know the terms used to describe physical culture shift over time. Topic modeling has the potential to alleviate some of the presumptions I would make when limiting myself to a keyword search.

What Schmidt's, Underwood's, and Rhody's articles all point to is the danger of not understanding the black box. Looking back on our own topic modeling project, we were completely guilty of this. We never looked at individual word counts (or even knew you could ask MALLET to produce a document containing word counts), and we took the topics at face value without digging deeper. We even plotted the topics over time. Oops. I think this hits at a larger issue in the digital humanities that has come up recently in several places: DHers have worked to create tools that allow humanists to apply computational analysis to humanistic questions, but there has been a gap in the documentation that sets a high bar for entry. Underwood's article is an important reminder that we need to think critically about the technology we're using to navigate through source material, whether that's a ProQuest database or a corpus of our own materials. We've often accepted keyword searches without thinking twice, and we should pause to understand how the "black box" works. Additionally, Schmidt's article offers a cautionary tale and provides a valuable example of how to critically engage with and analyze the results of topic modeling.

The readings for this week's meeting were extremely useful. I'm hoping to use topic modeling to look at a span of about fifty years of columns and editorials about physical culture for my dissertation, and these readings provided excellent context and things to consider as I begin to play with MALLET's outputs. We're wrapping up the readings portion of our course and have only one more meeting before we move into the practical portion of the course in Clio Wired III (Programming for Historians). I can't wait to learn some D3.js, with the hope of being able to manipulate and visualize topic-modeled data and to build some visualization prototypes based on my research.

  1. Blei, Topic Modeling and Digital Humanities
  2. Underwood, 4

Minor Field Readings, Part 4: Space

I'm a few weeks behind with this blog post about our minor field readings meeting on space in digital history. For this meeting we read a variety of works that discussed Geographic Information Systems (GIS), spatial history, and the possibilities and complexities of presenting historical material geographically using digital technologies and methodologies.

GIS has been around for years now, yet historians have never embraced it the way geographers have. Partly, as many of the authors point out, this is because of the software's need for exact data rather than fuzzy or general information: geographic coordinates must be exact, which poses a problem for historians who don't have exact data. Further, the software doesn't handle change over time, a necessity for historians, well. In The Spatial Humanities: GIS and the Future of Humanities Scholarship, Bodenhamer et al. discuss the limitation that GIS has "privileged a certain way of knowing the world, one that valued authority, definition, and certainty over complexity, ambiguity, multiplicity, and contingency…."1 This edited volume, which brings together influential voices in the field of GIS, calls for and discusses the ways GIS must move beyond the quantitative to the qualitative in order to appeal and be of use to historians. Meanwhile, the emergence of the semantic web, Google Maps and other APIs, and geolocated mobile devices has appealed more to historians looking to geolocate their work.

The works that discuss GIS put mapping and some of the larger theoretical questions about mapping into context for us, but I found the several digital history projects that we looked at and read about more useful. In his work on Digital Harlem, one of the first digital history projects to utilize Google Maps, Stephen Robertson discusses what the technology allowed him to do that traditional methods couldn't. Spatial history allows historians to map large quantities of data onto relatively small spaces and to integrate different kinds of data on one platform, allowing them to visualize complexities and correlations on a scale not possible without digital technology.

In addition to Robertson's article on Digital Harlem, we also read about several other digital history projects that use mapping in slightly different ways. In Mapping Texts, the authors created a platform that allows users to interact with the data both quantitatively and qualitatively by combining mapping and visualization. The platform's two views, "Mapping Newspaper Quality" and "Mapping Language Patterns," allow the user to interact with digitized newspaper archives in a new way. The mapping portion of the interface, Torget states, "facilitates grouping, discovering, analyzing, and making sense of patterns with the dataset." In describing the uses of the visualizations included in the Mapping Texts interface, Torget also taps into the potential that mapping holds for historians. While this project is an attempt to reshape the way we interact with archives and to make searching these datasets more effective, it is also an excellent example of the appeal of combining mapping with other types of visualization.

We also read Cameron Blevins's article, "Space, Nation, and the Triumph of Region: A View of the World from Houston," which appeared in the latest issue of the Journal of American History. Blevins uses named entity recognition on the digitized Houston Daily Post corpus and asks, "How did newspapers construct space in an age of nationalizing forces?" One of the many things I like about Blevins's article is his clear description of his methodology and his articulation of the promise of digital history. He states:

Technology opens potentially transformative avenues for historical discovery, but without a stronger appetite for experimentation those opportunities will go unrealized. The future of the discipline rests in large part on integrating new methods with conventional ones to redefine the limits and possibilities of how we understand the past.

Blevins's research uses named entity recognition to analyze hundreds of millions of words in the Houston Daily Post. By representing the places discussed within the newspaper from 1894 to 1901, he argues that the HDP "produced regionally distinctive space in an age of national forces" (126). Blevins's article is a prime example of the way spatial history, and mapping in particular, can be combined with visualization and used as a research technique that allows historians to study sources on a scale that is impossible without digital technologies and methodologies.

It is hard to believe we've almost finished our minor field readings. Next week we move on to topic modeling, a topic I've been looking forward to discussing all summer. As we move away from our discussion and readings about space, I'm continually struck by the idea that we are in a moment where visualization has emerged as the way we want to interact with and ingest large amounts of data. While historians tend to be more comfortable with text, spatial history has the potential to allow for explorations of complexity, scale, and contingency in a way that other representations, particularly narrative, can't. Mapping is also useful as a visualization of more complex computational analysis such as named entity recognition. Plotting numerical or other types of data onto a map allows us to interact with the data in a form that we're comfortable with and that is easily understood: maps. Digital mapping can function both as a research tool that allows exploration and as a tool for representing, visualizing, and understanding data derived from computational analysis.

  1. Bodenhamer et al., ix

Minor Field Readings, Part 3: Changing Theories of History

This week's readings on "Changing Theories of History" discussed the ways digital methods are changing how historians practice, or "do," history. Several authors discussed how digital research methods change the scale of materials that we can draw upon while researching a topic. Others discussed the possibility of using visualizations, rather than narrative, to communicate historical arguments.

In their article "The Return of the Longue Durée: An Anglo-American Perspective," Armitage and Guldi discuss the origins of the longue durée and the shift away from histories that seek to tell a grand macro-narrative toward the small, focused micro-histories of the 1980s. Braudel, the original advocate of the longue durée, argued that histories written in cycles of ten or twenty years don't capture the "deeper regularities and continuities underlying the processes of change."1 Beginning in the 1980s, most historians moved away from approaches that focused on large time scales and felt a need to specialize because of the influx of graduate students to the profession. As a result, micro-history flourished and represented a shift toward "generalization about the aggregate to micro-politics and the successes or failures of particular battles within the larger class struggles."2 Since the 1980s, historians have continued to write histories focused on manageable time spans, narrow topics, or individual locations.

Digital tools and methods, however, allow historians to once again approach topics from a macro, or longue durée, perspective. Recent historians have already begun studying longer time periods: transnational, environmental, and "big" histories, among others, are representative of this shift back to the macro. Armitage and Guldi make a convincing argument that this shift is due to the rise in the number of archives and tools that are available digitally. They argue:

New tools that expand the individual historian’s ability to synthesize such large amounts of information open the door to moral impulses that already exist elsewhere in the discipline of history, impulses to examine the horizon of possible conversations about governance over the longue duree. 3

Armitage and Guldi argue that digital methodologies allow historians to take on a far larger scale and wider scope in their work than is practical or possible with traditional historical methods.

Lara Putnam also discusses scale, though she approaches the promise and possibilities of digital history from a slightly different perspective than Armitage and Guldi.4 Putnam argues that discussions of the promise of digital history and new technologies are often focused on computational tools and big-data algorithms. However, she points out that the number of historians actually using these tools is relatively small, and that we often overlook the impact that the internet, digitized archives, books, and finding aids, and keyword searching have had on the research practices of all historians. As I was reading her article, I was struck by the fact that I have no idea what it means to do historical research without the internet. The idea of having to write to an archive and not being able to just look at finding aids online is so foreign to me. I grew up with the internet, and while it has developed significantly since I first started using it in the 1990s, my research practices revolve around my computer and the internet. My first step on a research project normally takes place in Google, which is a luxury and a relatively new development. Putnam argues that we often take steps like these for granted without realizing how revolutionary the digitization of finding aids, keyword search, and even email has been for historical research. The process of finding secondary literature, locating archives, and searching for primary sources all takes place online (at least to start) rather than in the stacks of the library.

However, Putnam also discusses hesitations and criticisms of the overwhelming amount of material available online and of keyword searching.  I think knowing a little bit about how the search algorithms behind keyword searching work is crucial as our archives continue to grow.  How can we be sure we aren’t missing anything if we don’t know how the tools we’re using work?

Putnam also argues that the availability of digitized primary sources changes the scale on which we can make historical arguments. In the pre-digital era, doing transnational research required a significant amount of resources, which made it rather impractical for many, especially grad students. With the advent of the digital archive, however, Putnam argues that we should now be able to expand our historical studies not just to the national level but to the global level.

The other theme throughout this week's readings is how the digital is (or could be) changing the ways we communicate our historical arguments. David J. Staley's book Computers, Visualization, and History: How New Technology Will Transform Our Understanding of the Past explores the possibilities and implications of using visualizations as "a vehicle of scholarly thought and communication."5 Staley argues that visualizations are not simply illustrations meant to break up the text of a dense narrative. Rather, he defines visualization as the "organization of meaningful information in a two- or three-dimensional spatial form intended to further a systematic inquiry…visualization stands on its own as the primary carrier of the information, not simply as a supplement or illustration to a written account."6 He also distinguishes between abstract and representational visualizations. While he encourages thinking about ways other than narrative to present historical arguments, he acknowledges that visualization is not necessarily better than narrative, only better at certain things. He discusses the differences between thinking visually and thinking linearly (as in a narrative): when thinking linearly the mind connects elements as in a chain, whereas in visual thought the mind connects elements as in a web. Furthermore, while writing "emphasizes sequence, dimensionality and linear chains," visualization "emphasizes simultaneity, structure, and association." So, Staley asks, why write narrative if visualization is, at times, better at communicating historical trends and developments? He concludes that, for the most part, it's because that is what our profession accepts, it's what is comfortable to us, and it's what is recognized (at least currently) as scholarly work. However, he predicts that this is likely to change as digital scholarship and methodologies continue to develop and require visualization to communicate the complex arguments they will produce.

Staley isn't the first author we've read who has questioned narrative. Hayden White has written several pieces about the construction of narrative and has argued that historians place far too much value on the idea of narrative without thinking critically about it. White asserts that notions "basic to modern discussions of both history & fiction, presuppose a notion of reality in which 'the true' is identified with 'the real' only insofar as it can be shown to possess the character of narrativity."7 Every event, White argues, is susceptible to at least two narrations of its occurrence. The fact that a document even exists means it has already passed through the narrative of whoever recorded it. White asserts that the value historians place on narrative is a constructed communication of morality. As digital methodologies continue to develop and are more widely recognized, historians will have to recognize alternatives to narrative as scholarship.

So what does all this mean for my work and my eventual dissertation? Stephen asked each of us a few weeks ago, "Why a digital dissertation?" It occurred to me that I don't have a fantastic answer for that yet. I know that I want to approach physical culture on a larger scale and from a different perspective than other historians have. Rather than looking at one person, one city, or one small time period, I want to approach it at the national level with a larger timescale (probably fifty-ish years). However, I also have to be careful not to bite off too large a chunk (I'd like to graduate relatively soon) and will eventually have to draw the line somewhere. That means I'll probably have to neglect physical education for college students or take a narrower approach to that piece. Because not all of my sources are digitized, I'll have to be strategic about what I include. As I was doing the reading for this week I was also in the midst of planning a research trip to New York and have been thinking a lot about the scale of my dissertation. Putnam's piece prompted me to wonder about the international connections (I know there are some) and what it would look like to connect the dots and show the transnational exchange of ideas about physical culture. What would it look like to visualize this exchange of ideas, and could it be communicated visually?

As seems to be the theme with this readings course, I don't have any answers yet, but these are things that are definitely at the back of my mind as I continue to do research this summer. These readings have given me lots to ponder as I continue to craft my approach to women's physical culture, and I think they'll be very useful as I defend and frame my methods as well as answer the inevitable "Why digital?" question.

  1. David Armitage and Jo Guldi, "The Return of the Longue Durée: An Anglo-American Perspective"
  2. Armitage and Guldi, 16
  3. Armitage and Guldi, 38
  4. Lara Putnam, “The Transnational and the Text-Searchable: Digitized Sources and the Shadows They Cast”
  5. Staley, ix
  6. Staley, 36
  7. White, The Value of Narrativity, 10

Minor Field Readings, Part 2: Perspectives on New Media

This week's readings on "Perspectives on New Media" discussed, from varying viewpoints, how new media has changed and is changing the ways scholarship is produced and conceptualized. The readings contemplate what new media tools and technologies change about how we think about, produce, and evaluate information. Out of these readings came a consistent concern with scale, linearity, and how digital technologies and tools will affect narrative, or, for my purposes, the historical argument. The assumption has long been that reputable scholarship, at least for historians, is written, linear, and composed as a narrative. That's what our profession values. However, this is (hopefully) beginning to change as new media fundamentally alters "how we think."

As I worked through the readings for this week I was continually thinking about what a digital dissertation on physical culture (my research interest) would look like. How do you structure and develop an argument on the internet, and can you make a historical argument in a format that isn't traditional written (and linear) narrative? What does a digital dissertation need to do and take into account, especially if we expect it to be recognized as scholarship by our peers? I don't have an answer for this, and I'm still working on conceptualizing the scope and direction of my dissertation, but I think several things emerge from these readings that are relevant and useful as I continue to think it through. I just want to briefly outline two points that I've been thinking about in relation to narrative and new digital technologies.

Landow's Hyper/Text/Theory provided useful insight into how scholars at the time, mostly scholars of literature, were concerned with the ways hypertext changed authorship, narrative structure, participation, linearity, and interaction with a text. Rather than having to start at "point a" and end at "point b" as linear narrative requires, hypertext allows users to navigate through the text along no defined path. Hypertext is composed almost spatially, with each page linked to others in a web-like fashion. However, more interesting to me than the ways hypertext changes interaction is the scale that it potentially allows.

Since the publication of Hyper/Text/Theory, databases have become the standard technology behind most of the web. The massive size and non-linear format of databases present new challenges to how we interpret and evaluate material on the web. Landow discusses how hypertext allows for scale and how it has threatened to change the ways scholars (or critics) relate to texts. In a cybertext, Landow explains, "critics can never read all the text and then represent themselves as masters of the text as do critics in print text…Large hypertexts and cybertexts simply offer too many lexias for critics to ever read. Quantity removes mastery and authority, for one can only sample, not master, a text."1 A similar fear exists, I think, in relation to databases.

In How We Think, Katherine Hayles also discusses scale and argues that it "changes not only the quantities of texts that can be interrogated but also the contexts and contents of the questions." The presence of databases on the web, and the increasing amount of data stored in them, allows a database to constantly expand and to tell multiple stories with no one answer. While databases allow for an exponential scale that often makes scholars uncomfortable, they also threaten the form that underpins our scholarship: narrative.

Manovich has called narrative and databases "natural enemies," claiming that the database's unstructured list of items is at odds with the narrative's focus on a "cause-and-effect" trajectory. Narrative and databases are each "competing for the same territory of human culture, each claims an exclusive right to make meaning out of the world."2 However, as Hayles discusses, the flexibility of the database allows it to grow and expand without limitation. Databases, she argues, "tend toward inclusivity, narratives toward selectivity." In contrast to Manovich's statement, Hayles argues that while narrative might not be able to tell the story in the age of big data, databases facilitate the proliferation of narratives, or different perspectives, based on the data.

I think Liu's article "When Was Linearity" also makes an important contribution to the discussion of how new media transforms scholarship and the ways scholars compose their arguments. Liu asks, "what is the relationship of linear (written) to graphical (digital) knowledge today? Which is freer?" Liu skillfully shows that linearity is a historical phenomenon that never really existed. Instead, he argues, linearity "was always only a critical way or ideology of thinking about what was." He suggests that perhaps in the world of Web 2.0 we need to consider that linearity is simply being reconfigured in the modern age of information technology and is reappearing in the form of graphical hypermedia. He argues that

neither in the past nor now is graphical knowledge the opposite of linear, discursive knowledge. …The graphical is a methodological, critical, and ideological reflection upon the linear, and vice versa. Graphical and linear are each other’s self-consciousness.

Liu, I think, makes an important point about the role of narrative for historians. His discussion of narrative makes us question why it is necessary to turn history into a narrative and why it can’t take another form.

What does this all mean for a digital dissertation? I think it provides some groundwork and a theoretical basis for thinking about how to structure a narrative and make a historical argument on the internet. I don't have very many answers yet as to what this means for my digital dissertation. However, I do think the digital allows for the representation of different perspectives and of simultaneity in ways that narrative can't. The digital allows the construction of an argument that may not be formatted in a way that is familiar and safe for historians, but I think a multi-linear construction of a historical argument allows us to present data-driven, interactive, and complicated arguments. Rather than focusing on just one area, I can use scale to more effectively communicate the range of experiences and the varying shades of those experiences. I don't know exactly what that looks like yet, but I think it's an interesting possibility and one that will develop further as the born-digital generation of historians, who already think through and alongside digital technology, develop their careers.

  1. Landow, 35.
  2. Manovich

Minor Field Readings, Part 1: Digital Humanities

This summer I am completing a readings course for my minor field in digital history. This week's readings discussed the digital humanities, the history of the field, and critiques as well as predictions about where the field is going. Most volumes about the digital humanities discuss the history of the field and place its origins in the history of humanities computing. While there is certainly truth in these accounts, they often overlook the histories of disciplines such as history and how these fields merged to form the "Digital Humanities" around 2004. This week's readings coincided with the recent conversations about the digital humanities by Tom Scheinfeldt and Stephen Robertson. The current discussion about the fragmentation of the digital humanities into disciplines makes reading about the debates, critiques, and trends in the field even more interesting. While history is most certainly part of the digital humanities conversation (and I'm not arguing that it shouldn't be), I think it is useful to look at the issues in the field from the perspective of an aspiring digital historian and to discuss how they might be relevant to those of us looking to create digital dissertations and preparing to enter the job market in the next six years.


The digital humanities community spends a lot of time working to define just what exactly the digital humanities are. Many of the edited works, such as Matthew Gold's Debates in the Digital Humanities, Berry's Understanding Digital Humanities, and Burdick's co-authored volume Digital_Humanities, tell a similar story about the history of the digital humanities. The origins of humanities computing go as far back as the 1940s (to Father Busa), but humanities computing accelerated during the 1980s and 1990s, when scholars began to utilize the power of computing for humanistic inquiry. Scholars of this "first wave" used quantitative measures to study traditional humanities topics. Their projects tended to harness the power of the database to analyze text quantitatively and to focus on large-scale textual encoding and digitization projects. This first wave, however, originated in, and was arguably a trend coming out of, English and literature departments.

Meanwhile, in the field of history a very different trend emerged alongside the World Wide Web. Originating with the American Social History Project and Roy Rosenzweig's vision, digital history began with efforts to digitize and make available primary sources and teaching materials, primarily for K-12. Roy's creation, the Center for History and New Media, was founded in 1994 and in the 1990s was largely focused on the educational and public uses of the internet. By 2001, CHNM had moved into collecting history online (through the September 11 Digital Archive), and by 2003 it had begun developing tools for researchers. Developing alongside the internet and the increasingly social nature of the web, tools such as ECHO, Zotero, and later Omeka emerged. As Tom Scheinfeldt points out, digital history developed "as a natural outgrowth of longstanding public and cultural historical activities rather than a belated inheritance of the quantitative history experiments of the 1960s and 1970s." Digital history, in comparison to the dominant narrative outlined in Berry, Burdick, and Gold, was rooted in "oral history, folklore studies, radical history and public history."1

While these two strains of digital humanities may have developed and originated in different contexts, the mid-2000s brought circumstances that would unite them under the “big tent” of what became known as “digital humanities”.  As Kirschenbaum has described, the term “digital humanities” appeared in 2004 with the publication of A Companion to Digital Humanities, which argued for replacing the term “humanities computing” with “digital humanities”, allowing a broader group of disciplines to participate.  In the next few years the term was used in the formation of the Alliance of Digital Humanities Organizations and, perhaps most significantly, in the establishment of the Office of Digital Humanities at the NEH.  The appearance of the term “digital humanities” represented several different branches of disciplinary digital work merging together underneath the “big tent”.


It has been argued over and over again that the Digital Humanities are the “next big thing”.2   And while I agree with that, I think adoption has taken longer than expected, especially in history departments, and I think that has a lot to do with the inward-facing discussion within the Digital Humanities.  Andrew Prescott discusses the range of reasons that the digital humanities have not been widely adopted in academic departments and comes to the “less comforting conclusion” that dh has not been adopted because of its “collective failure to produce scholarship of outstanding importance and significance”.3  Often, it seems, Digital Humanists are so busy defining, debating, and justifying the digital humanities that they struggle to answer the big “So what?” question and compel their more traditional colleagues to care.  However, some dh’ers would disagree, arguing that the field is currently focused on tool building and simply hasn’t developed enough yet to make sophisticated arguments.  The debate over arguments, tools, and the results of dh work continues to define the major issues within the field.

Among the debates prominent in the field are discussions of what the digital humanities are and what counts as DH work.  Stephen Ramsay’s 2011 talk at the MLA, entitled “Who’s In and Who’s Out?”, argued that “if you are not making anything, you are not — in my less-than-three-minute opinion — a digital humanist.”   He continued, “but if you aren’t building, you are not engaged in the ‘methodologization’ of the humanities, which, to me, is the hallmark of the discipline…”   Whether or not you have to be able to code in order to participate in dh is a hotly contested issue.  Ramsay clarified his position in a later blog post, “On Building”, claiming that knowing how to program and code at that level applied to a very small subset of the dh community.  Instead, he argued, building doesn’t necessarily require programming knowledge but rather a move from reading and critiquing to building and making, which in his opinion defines the digital humanities.  He explained:

All the technai of Digital Humanities — data mining, xml encoding, text analysis, gis, Web design, visualization, programming, tool design, database design, etc — involve building; only a few of them require programming, per se. Only a radical subset of the dh community knows how to code; nearly all are engaged in building something.

This conversation about the basic skills required of digital humanists touches on the idea of tool building and on whether or not the digital humanities have to make arguments.

Tom Scheinfeldt has argued that the digital humanities do have to ask and answer questions; these values are at the core of what the humanities are all about.  However, not many questions have been answered as of yet, and there have not been any significant interventions in the field as a result of digital methodologies.  Scheinfeldt argues that the better question is “When does digital humanities have to produce new arguments?”  First, the tools must be developed and properly honed before scholars can make new arguments and interpretations.  Scheinfeldt writes, “At the very least, we need to make room for both kinds of digital humanities, the kind that seeks to make arguments and answer questions now and the kind that builds tools and resources with questions in mind, but only in the back of its mind and only for later”.  The scarcity of questions answered with digital humanities tools has led to a great deal of skepticism in the academic world about the usefulness of DH, and this remains one of the core debates surrounding any DH work.

I would argue that the Digital Humanities do have to make arguments.  I would also argue that we are at a point where a sufficient number of tools have been developed to aid digital humanistic inquiry and that we should be beginning to see results.  I think one reason we have not yet seen the kinds of results and arguments that we all believe digital tools can help us accomplish is the lack of training (for both grad students and mid-career academics).  The number of institutions that train their graduate students to use digital methodologies in their research can be counted on two hands.  Graduate students in the humanities are just beginning to utilize digital techniques and tools in their research, and they are significantly limited depending on the amount of support at their institution.  Lincoln Mullen and Cameron Blevins are arguably the first to use digital methodologies in their dissertation research, and I think in the next 3-7 years we’ll see the emergence of digital dissertations that rely on digital tools and methodologies and make significant contributions by answering questions.  But like everything in academia, it takes time, and change is a long, slow process.


Among all the debates about what DH is and what should be considered DH, most still believe that the application of digital technology and methodologies will allow scholars to answer questions and come to conclusions that wouldn’t be possible through traditional research methods.  The promise of DH, and the reason it’s been touted as “revolutionary”, lies in its ability to take “traditional problems in the humanities and propose not just how computational methods and analytics can be used to investigate the area, but also how the findings derived from the method offer new insights and results that traditional methods cannot”.  At a fundamental level the goals of all digital humanists are the same, even if they differ slightly depending on disciplinary allegiances.   But if dh holds as much promise as most digital humanists claim, why hasn’t it been more widely adopted?  As I pointed out earlier, I think the answer lies largely in the lack of training for both graduate students and academics.  This is where the disciplinary fragmentation of dh is not necessarily a bad thing.  Stephen Robertson recently pointed out on Twitter that “Both D & H worry many, thus it can be easier to promote exploration of D if begin on their own disciplinary ground”.  It can be easier to explain the uses and benefits of dh to those who aren’t necessarily on board, or who don’t understand its promise, by demonstrating how it can be useful to their own work.  I think we won’t see widespread acceptance of DH in traditional academic departments for quite some time, and surely not until we can demonstrate, through published and influential scholarship, the conclusions that can be drawn and the results that can be reached only through the use of digital tools and methodologies.


  1. Tom Scheinfeldt, “The Dividends of Difference: Recognizing Digital Humanities’ Diverse Family Tree/s”
  2. Andrew Prescott, “Consumers, Creators, or Commentators? Problems of Audience and Mission in the Digital Humanities”
  3. Ibid.