This website, which analyzes various “scraping” tools for the academic archival of social media feeds, is at its heart an exercise in informal usability testing. The goal of any usability testing is to “determine to what extent the product or service as designed is usable–whether it allows users to perform the given tasks to a predetermined standard–and hopefully to uncover any serious, resolvable issues along the way” (Hall 47). In this case, I have outlined four criteria for examination: interface usability, quality of content output, affordability, and data portability. I selected these criteria based on my needs as an academic researcher, though I imagine that many academics share similar needs.
This research is dominantly descriptive and exploratory (Hall 13-14), as much of my time was invested in figuring out what options were currently available for scraping, and even what the correct or useful terminology was for what I intended to accomplish (an offline archive of specific social media feeds). The final output of the project is not just an archive of content, but a description and explanation of the options available: their highlights and pitfalls.
Because I worked as a lone tester, this is a limited and hardly comprehensive analysis. A variety of biases are implicit in this informal analysis, including the obvious sampling and social desirability biases (Hall 28-29). Despite these biases, the analysis does provide some insight into the development pitfalls of these tools, and it may be useful to other academics looking to gather digital artifacts for archival purposes, especially as so much data is ephemeral and trapped in proprietary software, even when the information is public.
Insights and Overview
Most of the scraping tools use an activity-centered design; that is, the tools are “designed only through an understanding of the activities they were meant to support” (Hoekman 40). As a result, most of the tools, including TumblThree, Tumblr2Wordpress, and MyTwitterScraper, are rather clunky to use at best, and some, like Tumblr Scraper and Social Media Combine, are completely unapproachable without an extensive knowledge of programming languages. However, one tool used a more user-friendly, goal-directed design; it is clear that its designers had a better understanding of user goals, such that the interactive systems were designed to help users meet those goals (Hoekman 37). In particular, Import.io was very user-friendly, with an interactive design and built-in tutorial information. However, it was also the only commercial product of the group, and cost is clearly a consideration for the average academic.
I chose a website as my platform over a paper or brochure for the affordances of multimodality: in particular, the ability to incorporate hyperlinks, images, and graphic design, and to allow for audience accessibility. Multimodal or new media texts are those “that have been made by composers who are aware of the range of materialities of texts and who then highlight the materiality: such composers design texts that help readers/consumers/viewers stay alert to how any text – like its composers and readers—doesn’t function independently of how it is made and in what contexts” (Wysocki 15). Perhaps it is fitting (even metatextual) that this multimodal website is devoted to scraping new media’s Twitter and Tumblr social media feeds.
The written word provides the majority of the site’s content. However, some significant elements are image- and design-based. For example, the icon set created by Cristian Lungo provides a visual identifier for each criterion in the analysis that is consistent across the site. Two tool comparison charts (one for Tumblr, the other for Twitter) give users a quick-read overview of the tools. Images of each tool’s interface and design have been embedded where available, to provide a visual overview alongside the textual one. Finally, hyperlinks supply supplementary information, specifically definitions, examples, and additional materials like tutorials.
Through its use of elements like text, image, and hyperlink, the site takes issues of accessibility seriously. Because my experience with accessibility in teaching has always been “just in time” or “retrofitting,” I wanted to make sure that the accessibility of the site was at the forefront of the design (Price; Kerschbaum). So, when selecting the template theme for the WordPress site, I narrowed the choices to those that highlighted parity between PC and mobile platforms, to accommodate users who rely on a tablet as their primary means of access. I also wanted a theme with image accessibility fields built into the interface design (alternative text, caption text, and description text), as these can sometimes be difficult to code, and I preferred a design that had them in mind from the start. Finally, I learned about the WordPress accessibility settings and enabled features like skip links, the long description UI, and the accessibility toolbar. Unfortunately, I found that the GSU installation of WordPress does not include the Image Capture Plugin, so sighted users cannot view the captions and descriptions, but I am assured that those using screen readers will have access to this content.
Archiving Laura Hollis’s Social Media
The final component of this project was to provide a sample of the scraped archive. To create the example archive of Laura Hollis (the main protagonist of the Carmilla series) included on the site, I used a combination of tools: Tumblr2Wordpress, TumblThree, and Import.io.
I had to create a second WordPress site and a “dummy” user for Laura so that the blog could correctly attribute her posts. I then used Tumblr2Wordpress to export the structure and textual content of Laura2theLetter’s Tumblr blog as an XML file, which I imported into the WordPress site. Once the backed-up content was imported correctly, I needed to re-link all of the images to the copies scraped with TumblThree. Fortunately, TumblThree does not rename the files, so a universal folder redirect is enough to link the images back up. Finally, I used Import.io to scrape the Twitter feed for Laura Hollis. Visually, however, this output is an unremarkable Excel spreadsheet, so instead of simply reposting that file, I included a window onto Laura’s Twitter feed in the sidebar. (Below that feed is also a link to the Excel spreadsheet.) Now, all of Laura’s social media content is available apart from the official and ephemeral feeds and located in one place.
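The “universal folder redirect” step above can also be scripted. The following is a minimal sketch, not part of the project itself: it assumes (hypothetically) that the exported XML references images at Tumblr media URLs and that TumblThree has saved those same filenames into a local uploads folder, so only the path prefix needs rewriting. The URL pattern and folder path are illustrative assumptions, not taken from the actual export.

```python
import re

# Hypothetical pattern for Tumblr-hosted image URLs inside the exported XML.
# Because TumblThree preserves the original filenames, only the prefix
# (everything before the filename) needs to be swapped for a local path.
TUMBLR_IMAGE = re.compile(
    r"https?://\d+\.media\.tumblr\.com/\S*?/(\S+?\.(?:jpg|jpeg|png|gif))"
)

def relink_images(xml_text, local_prefix):
    """Rewrite each Tumblr image URL as local_prefix + original filename."""
    return TUMBLR_IMAGE.sub(lambda m: local_prefix + m.group(1), xml_text)

# Example with an illustrative local uploads folder:
sample = '<img src="https://66.media.tumblr.com/abc123/tumblr_xyz.png">'
relinked = relink_images(sample, "/wp-content/uploads/laura/")
```

Running the whole export through a rewrite like this before importing it into WordPress avoids editing each post’s image links by hand.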
Eventually, I would like to continue recreating the character Tumblr blogs on my WordPress archive site, creating a repository that is as visually similar to the originals as I can muster. Additionally, I would like to set up a database system, perhaps in MySQL, to be better able to work with the available data, particularly the “notes” (Tumblr) and the “retweets” and “likes” (Twitter), since this data is scraped but not accessible in the WordPress duplicate.
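A database along these lines might look like the following sketch. The project envisions MySQL; sqlite3 is used here only so the example is self-contained, and the table and column names (platform, note_count, and so on) are illustrative assumptions rather than a schema from the project.

```python
import sqlite3

# Self-contained sketch: an in-memory database standing in for the
# proposed MySQL system. One table holds posts from both platforms,
# keeping the engagement counts that the WordPress duplicate drops.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        platform TEXT NOT NULL,          -- 'tumblr' or 'twitter'
        posted_at TEXT,                  -- ISO-8601 timestamp
        body TEXT,
        note_count INTEGER DEFAULT 0,    -- Tumblr "notes"
        retweet_count INTEGER DEFAULT 0, -- Twitter "retweets"
        like_count INTEGER DEFAULT 0     -- Twitter "likes"
    )
""")

# Illustrative row, as scraped data might be loaded:
conn.execute(
    "INSERT INTO posts (platform, body, note_count) VALUES (?, ?, ?)",
    ("tumblr", "Hello from the archive", 42),
)
row = conn.execute("SELECT platform, note_count FROM posts").fetchone()
```

With the data in tables like this, the engagement counts become queryable (for example, ranking posts by notes) instead of sitting unused in the scraped files.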
As for this site, there are a few tools that I want to keep my eye on, as I think they could be useful if their development continues and they become more streamlined and user-friendly. Notably, Tumblr Scraper and Social Media Combine look like they could be useful open-source tools, but they require coding skills I just don’t currently have.
Works Cited
Kerschbaum, Stephanie. “Modality.” Kairos, vol. 18, no. 1, http://kairos.technorhetoric.net/18.1/coverweb/yergeau-et-al/pages/mod/index.html.
Price, Margaret. “Space/Presence.” Kairos, vol. 18, no. 1, http://kairos.technorhetoric.net/18.1/coverweb/yergeau-et-al/pages/space/index.html.