Resource Location: http://www.jzab.de/content/tumblthree
TumblThree, as its name indicates, is the third iteration of a free, open-source scraping tool designed specifically to mine Tumblr for information. The current version is written by a programmer with the username “Zab” from Munich, Germany. It is only available for Windows, but Mac users can run it through an emulator. Zab is pretty responsive to questions and bug reports, and appears to be actively working on improvements, as of September 2016.
The software was easy to install, and the programmer has provided plain-English directions for using it at the download site. Interface-wise, TumblThree is fairly intuitive: you add a blog by copying and pasting its URL into the bottom menu bar and clicking the “add blog” button. You can run “crawls” on multiple blogs simultaneously and control the number of simultaneous downloads through the settings, so as not to bog down your computer.
On a purely functional level, make sure you never delete the index files that are created, as they are the record of what has been previously downloaded. Without them, each crawl will download every image, instead of only the new images posted since the previous crawl.
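To see why the index files matter, here is a minimal sketch of how an index-based incremental download can work in general. It is not TumblThree's actual code or file format: the JSON index and the function names `load_index`, `select_new`, and `save_index` are assumptions made up for illustration.

```python
import json
from pathlib import Path


def load_index(index_path):
    """Return the set of filenames recorded in a (hypothetical) JSON index file.

    A missing index yields an empty set, which is exactly why deleting it
    makes every image look new again.
    """
    path = Path(index_path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def select_new(posted_files, index):
    """Keep only the files that are not in the index yet, preserving order."""
    return [name for name in posted_files if name not in index]


def save_index(index_path, index):
    """Write the set of downloaded filenames back to disk as a sorted list."""
    Path(index_path).write_text(json.dumps(sorted(index)), encoding="utf-8")
```

With an index containing `a.jpg`, a crawl that finds `a.jpg`, `b.gif`, and `c.png` would only need to fetch the last two. With the index deleted, all three are fetched again.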
Perhaps the most useful and interesting function of this tool is the ability to scrape images with specific “tags.” This makes it possible to set up a “crawl” that gathers images associated with the same tag across a variety of blogs, so that they can be compared.
Upon working with the software, I sadly discovered that its primary scraping capability is for images, and that it does not bring with it any additional metadata such as post text, date, reblogs, or replies. At the end of the day, you will only know which blog the image came from, and the tag associated with the image (if set in the crawl search parameters). Fortunately, the system does NOT rename the images, so you can link the images back to their metadata at a later date.
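Because the filenames are preserved, they can serve as the key for joining images to metadata collected some other way. The sketch below is only one way this might look. It assumes the images sit in one folder per blog, which I have not confirmed as TumblThree's layout, and that you have separately built a dictionary of metadata keyed by filename. The names `catalog_images` and `attach_metadata` are hypothetical.

```python
from pathlib import Path

# The image formats mentioned in this review.
IMAGE_SUFFIXES = {".jpg", ".gif", ".png"}


def catalog_images(root):
    """List images under root, assuming one sub-folder per blog (an assumption)."""
    rows = []
    for path in sorted(Path(root).glob("*/*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            rows.append({"blog": path.parent.name, "filename": path.name})
    return rows


def attach_metadata(rows, metadata_by_filename):
    """Merge each row with the metadata stored under its filename, if any."""
    return [
        {**row, **metadata_by_filename.get(row["filename"], {})}
        for row in rows
    ]
```

Rows without a matching metadata entry are kept unchanged, so nothing is lost if the metadata is incomplete.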
Can’t beat free. TumblThree is open-source and can be downloaded for free, though if you are so inclined there is an option to donate to the programmer. (He’d love for you to buy him a beer!) The program itself doesn’t require a lot of space to run, but if you are scraping a lot of blogs, you could find that the images begin to take up a lot of space on your computer, and an external hard drive may become necessary.
The output of TumblThree is essentially raw data in universal file formats (.jpg, .gif, .png) and can be transferred pretty much anywhere. In effect, the tool automates the process of manually saving each individual image on a blog.