This project began as a “hail mary” to archive social media content from a transmedia storytelling object that I am looking at for my dissertation on collaborative imaginary worlds as queer utopia. The object in question is a Canadian “web series,” Carmilla, that uses in-character social media feeds to interact and collaborate with audiences in the story world of Silas University. At the start of the semester, the Carmilla production team announced that this would be the final “season” of the series, making the loss of these feeds a distinct and immediate possibility.
With this concern in mind, I’ve been tackling the technical and procedural issues involved in “scraping” social media content to offline repositories, integrating disparate digital collections into a single database, and designing an interface for interacting with the collected data. The scope of the social media archiving project for this site, however, is much smaller: here I attempt to provide insight into the various scraping tools available for other researchers who may be encountering similar issues.
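To make the first step (moving feed content into an offline repository) concrete, here is a minimal sketch of appending post records to a local JSON Lines archive while skipping duplicates. The records below are hypothetical stand-ins for whatever a Twitter or Tumblr client actually returns; the function and file names are my own illustration, not part of any particular tool.

```python
import json
from pathlib import Path

def archive_posts(posts, path):
    """Append post records to an offline JSON Lines archive,
    skipping any IDs that were already saved on an earlier run."""
    path = Path(path)
    seen = set()
    if path.exists():
        with path.open() as f:
            seen = {json.loads(line)["id"] for line in f}
    new = [p for p in posts if p["id"] not in seen]
    with path.open("a") as f:
        for p in new:
            f.write(json.dumps(p) + "\n")
    return len(new)

# Hypothetical records standing in for API output:
sample = [
    {"id": "1", "user": "LaFontaine", "text": "Science never sleeps."},
    {"id": "2", "user": "LauraHollis", "text": "Another update from Silas."},
]
archive_posts(sample, "carmilla_archive.jsonl")  # writes 2 records
archive_posts(sample, "carmilla_archive.jsonl")  # writes 0 (already archived)
```

Because each run re-reads the archive before appending, the same scrape can be repeated safely as new posts appear, which matters for feeds that are still active.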
I am focusing on six different tools that I investigated during this process. Since my object of study uses Twitter and Tumblr for its social media interactions with audiences, I focused on scraping tools that could mine information from these two services. In evaluating these applications, I hope to address questions in four major areas:
(1) Ease of Use
How easy is it to use the tool/service? How steep is the learning curve? Does the scraping tool require specialized knowledge or skills? Can a layperson use this tool to get useful data?
(2) Content Quality
What information does the scraping tool provide in its output? How detailed is the metadata provided? Does it include images? Does it provide interface-specific information? Does it include comments or interactions between the account and other users?
(3) Cost
Is there a fee? If so, is it one-time, monthly, or yearly? Are there “light” or promotional versions of the application available? Does the software require a dedicated computer, a large amount of storage, or other specialized equipment?
(4) Data Portability
Is the data easily portable to systems outside of the scraping application? What kinds of export files does the scraper support? Are those files easily converted or accessible to database management systems?
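The portability question can be made concrete with a small sketch. Assuming a scraper exports CSV with id, date, and text columns (a common lowest-common-denominator format, not any specific tool’s output), loading that export into SQLite looks roughly like this:

```python
import csv
import io
import sqlite3

def load_export(csv_text, conn):
    """Load a CSV export (id, date, text columns assumed)
    into a SQLite 'posts' table, ignoring duplicate IDs."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts (id TEXT PRIMARY KEY, date TEXT, text TEXT)"
    )
    rows = csv.DictReader(io.StringIO(csv_text))
    # Named placeholders (:id, :date, :text) map directly onto the CSV headers.
    conn.executemany(
        "INSERT OR IGNORE INTO posts VALUES (:id, :date, :text)", rows
    )
    conn.commit()

# Hypothetical export snippet:
export = "id,date,text\n1,2017-09-14,First post\n2,2017-09-15,Second post\n"
conn = sqlite3.connect(":memory:")
load_export(export, conn)
```

An export that round-trips cleanly through something like this, rather than being locked into a proprietary format, is the difference between an archive that can support later analysis and one that cannot.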