THATCamp Topic Modeling Results

Categories: Digital History Fellowship, Hacking THAT Camp's Yack
This post is the last in a set of 5 written by the Digital History Fellows at the Roy Rosenzweig Center for History and New Media. The original post, which was written collectively by the DH Fellows, can be found on our Digital History Fellowship Blog. We have spent the last few weeks building a Python script to download and prep all of the THATCamp blog posts for topic modeling in MALLET (for those catching up, we detailed this process in a series of previous posts).


Unexpected Challenges Result in Important and Informative Discussions: a transparent discussion about stripping content and stopwords

Categories: Digital History Fellowship, Hacking THAT Camp's Yack
This post is the fourth in a set of 5 written by the Digital History Fellows at the Roy Rosenzweig Center for History and New Media. The original post by Jannelle Legg can be found on the Digital History Fellowship Blog. As described in previous posts, the first-year Digital Fellows at CHNM have been working on a project under the Research division that involves collecting, cleaning, and analyzing data from a corpus of THATCamp content.


Pre-Processing Text for MALLET

Categories: Digital History Fellowship, Hacking THAT Camp's Yack
This post is the third in a set of 5 written by the Digital History Fellows at the Roy Rosenzweig Center for History and New Media. The original post can be found on the Digital History Fellowship Blog. In our previous post, we described the process of writing a Python script that pulled data from the THATCamp MySQL database. In this post, we continue with this project, cleaning up the data we’ve collected and preparing it for analysis.
