Pre-Processing Text for MALLET

Categories: Digital History Fellowship, Hacking THAT Camp's Yack
This post is the third in a set of 5 written by the Digital History Fellows at the Roy Rosenzweig Center for History and New Media. The original post can be found on the Digital History Fellowship Blog. In our previous post, we described the process of writing a Python script that pulled posts from the THATCamp MySQL database. In this post, we continue that project by cleaning up the data we’ve collected and preparing it for analysis.

Read More →

Extracting Data from the THATCamp Database Using Python and MySQL

Categories: Coursework, Digital History Fellowship, Hacking THAT Camp's Yack
This post is the second in a set of 4 written by the Digital History Fellows at the Roy Rosenzweig Center for History and New Media. The original post can be found on the Digital History Fellowship Blog. This week we’ve continued to work on building a Python script that will extract all of the blog posts from the various THATCamp websites. As Jannelle described last week, our goal was to write a script that downloads the blog posts in plain-text form and strips all of the HTML tags, stopwords, and punctuation so that we can feed the text into MALLET for topic modeling and text analysis.

Read More →
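
For readers curious what the cleaning step in the excerpt above might look like, here is a minimal sketch in Python. It is not the fellows' actual script: the function name `clean_post`, the helper class, and the very short stopword list are all assumptions made for illustration, and it uses only the standard library. It strips HTML tags, replaces ASCII punctuation with spaces, lowercases the text, and drops stopwords, leaving plain text of the kind MALLET can import.

```python
import string
from html.parser import HTMLParser

# Tiny illustrative stopword list (an assumption; a real run would use a fuller list).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "we", "for"}


class _TextExtractor(HTMLParser):
    """Collects the text between tags and discards the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def clean_post(html_text):
    """Return lowercase plain text with tags, ASCII punctuation, and stopwords removed."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # Map every ASCII punctuation character to a space so words stay separated.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    text = text.translate(table)
    return " ".join(word for word in text.split() if word not in STOPWORDS)


if __name__ == "__main__":
    sample = "<p>We love the <b>THATCamp</b> unconference, and it is fun!</p>"
    print(clean_post(sample))
```

Note that this sketch only handles the ASCII punctuation in `string.punctuation`; typographic quotes and dashes, which are common in blog posts, would need to be added to the translation table separately.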

Spring Semester in Research and a THATCamp Challenge

Categories: Coursework, Digital History Fellowship, Hacking THAT Camp's Yack
This post is the first in a set of 4 written by the Digital History Fellows at the Roy Rosenzweig Center for History and New Media. The original post by Jannelle Legg can be found on the Digital History Fellowship Blog. The spring semester is here and the first-year DH fellows have begun our rotation into the Research division of CHNM. To get the ball rolling, we spent a week working through the helpful tutorials at the Programming Historian.

Read More →