TumblrScraper – Scraping Social Media

Resource Location: https://github.com/kristopolous/tumblr-scraper

Created by Chris McKenzie, James Wilkinson, and James Scott-Brown, TumblrScraper is another open-source tool. It was originally built to mine Tumblr for metadata that could feed a recommendation engine. The major downside is that it requires Ruby on Rails (a free, open-source web application framework written in the Ruby language) to run.

Usability
If you are already familiar with Ruby on Rails, I think this software might be perfect, as it has all the features a researcher or archivist could want. However, I don’t know Ruby on Rails.

After spending two or three days trying to install Ruby onto my computer through various tutorials, I tried an online Ruby on Rails environment, just to see if I could get TumblrScraper to load. After almost a week attempting to install the tool, I gave up. If your team has a coder, or you have the time to learn Ruby, this could be one of the best products out there.

Content Quality
Since I never got TumblrScraper to work, I can’t say with any certainty whether the tool’s output is as reported, but in theory this tool does it all (almost). It scrapes all visuals (graphs, photos, and videos), all the text for each post, and all notes (likes and reblogs). It also creates a list of top fans and an inverse map of users to blogs. The one major missing piece of metadata is tagging. Tags are often used to add narrative and commentary on Tumblr, and that is data I would want.
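To make "top fans" and "an inverse map of users to blogs" concrete, here is a minimal Python sketch of how both could be derived from scraped notes. It is not TumblrScraper's own code (which is Ruby and which I never ran); the input layout, the blog and user names, and the function names `top_fans` and `invert` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical input: blog name -> list of (user, note_type) pairs.
# The real tool's output layout may differ.
notes_by_blog = {
    "blog-a": [("alice", "like"), ("bob", "reblog"), ("alice", "reblog")],
    "blog-b": [("alice", "like"), ("carol", "like")],
}


def top_fans(notes, limit=3):
    """Return the users who left the most notes, most active first."""
    counts = Counter(user for user, _note_type in notes)
    return counts.most_common(limit)


def invert(blogs):
    """Build the inverse map: user -> sorted list of blogs they interacted with."""
    users_to_blogs = defaultdict(set)
    for blog, notes in blogs.items():
        for user, _note_type in notes:
            users_to_blogs[user].add(blog)
    return {user: sorted(found) for user, found in users_to_blogs.items()}
```

Calling `top_fans(notes_by_blog["blog-a"])` ranks users by how many likes and reblogs they left on one blog, while `invert(notes_by_blog)` answers the opposite question: which blogs a given user has interacted with.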

Affordability
Theoretically, this is free. There is no cost for either Ruby on Rails (the base software) or TumblrScraper (the mining tool itself). But if you don’t have the coding skills, you will most likely need to hire someone, which could cost quite a bit. Sometimes free isn’t really free.

Portability
It looks like the output files are JSON (JavaScript Object Notation), a plain-text data format that is relatively easy to import into a MySQL database.
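As a rough illustration of what that import could look like, here is a minimal Python sketch that parses a JSON list of posts and inserts it into a SQL table. Several things here are assumptions: the field names (`id`, `type`, `text`, `note_count`), the sample data, and the `load_posts` helper are hypothetical, since I never saw the tool's real output. To keep the example self-contained it uses Python's built-in SQLite module as a stand-in for MySQL.

```python
import json
import sqlite3

# Hypothetical sample of scraped output; real field names may differ.
SAMPLE = """
[
  {"id": 1, "type": "photo", "text": "hello", "note_count": 12},
  {"id": 2, "type": "text", "text": "second post", "note_count": 3}
]
"""


def load_posts(conn, raw_json):
    """Parse a JSON array of posts and insert one table row per post."""
    posts = json.loads(raw_json)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "id INTEGER PRIMARY KEY, type TEXT, body TEXT, note_count INTEGER)"
    )
    rows = [
        (p["id"], p.get("type"), p.get("text"), p.get("note_count", 0))
        for p in posts
    ]
    # Parameterized insert: values are passed separately from the SQL text.
    conn.executemany(
        "INSERT INTO posts (id, type, body, note_count) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)
```

With a MySQL driver the overall shape would be similar, though the connection setup differs and the common Python MySQL drivers use `%s` placeholders where SQLite uses `?`.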