Big Data for Big Problems

Hello all! This week, of course, is none other than the crunch time before finals week, so I have been really busy finishing school projects and studying for finals since Thanksgiving. Speaking of which, I encourage you to leave a comment about how your Thanksgiving went! Yours truly roasted a bird this year and it didn’t turn out that bad!

Le Roast

Quite good. I more or less hit the nail on the head this year.

But turkeys aside, this week has been off to a wild start. Between school projects and work, I had never had to deal with database systems before. The DALN does run on a server backend, which you can view here on GitHub, but the frontend gets its information through a REST API, which I showed how to use in my last blog post.

This week I went to a big data workshop named Introduction to Data Intensive Computing Environment (DICE). A somewhat scary name, not going to lie, but when I got to the workshop I was pleasantly surprised by the way they taught these complex tools. And by “they” I mean Semir Sarajlic, a Research Computing Specialist from Research Solutions, and Suranga Edirisinghe, the High Performance Computing Facilitator here at GSU. They made it easy to understand what big data actually means on the technical side, a term that has been in the background for me and many other beginning Computer Science majors for years.

So DICE is essentially a collection of several different technologies that all students can use to work on big data projects on their own time, some of which you can see in the graphic below.

Creative Commons. From: “Reference Architecture and Classification of Technologies, Products and Services for Big Data” by Pekka Pääkkönen and Daniel Pakkala

We’re talking huge computing power; the specs on the individual data node computers are top of the line. You can view how it works on the user interface side, as well as the specs, here. At the workshop, we went through a couple of tutorials dealing with Twitter and Eventbrite data, which were kind of like the simple REST API calls I made with the DALN. However, we were also shown how a tool like Apache Hadoop can spread multiple data sources, such as REST APIs, across data nodes and run separate processes on them. For example, let’s say you wanted to get Twitter data from two different geolocations and map them in relation to each other. With Hadoop, you can manage this workload across data nodes and applications (like ESRI ArcGIS) in parallel. Hadoop includes a lot of technologies, but the one we used was HDFS (Hadoop Distributed File System), where you can run commands to put data somewhere and run operations on it, and even execute OS-style file commands such as moving directories or giving other DICE users permission to use them.
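To make the HDFS part a bit more concrete, here is a minimal Python sketch of the kind of commands described above: making a directory, putting a file into it, opening up permissions, and moving the directory. It is not the workshop's own material. The `hdfs dfs` subcommands (`-mkdir`, `-put`, `-chmod`, `-mv`) are standard Hadoop file system shell commands, but the file name, the HDFS paths, the permission mode, and the helper names `build_hdfs_commands` and `run_all` are all made up for illustration. By default it only prints the commands (a dry run), so it can be tried without a cluster.

```python
import shlex
import subprocess


def build_hdfs_commands(local_file, hdfs_dir, shared_dir, mode="775"):
    """Build a list of `hdfs dfs` command lines for a small upload workflow.

    All paths and the permission mode are hypothetical examples.
    """
    return [
        # Create the target directory (and any missing parents) in HDFS.
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        # Copy a local file into that HDFS directory.
        ["hdfs", "dfs", "-put", local_file, hdfs_dir],
        # Recursively change permissions so other users can access the data.
        ["hdfs", "dfs", "-chmod", "-R", mode, hdfs_dir],
        # Move (rename) the directory to a shared location.
        ["hdfs", "dfs", "-mv", hdfs_dir, shared_dir],
    ]


def run_all(commands, dry_run=True):
    """Print each command, or actually run it when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # Requires the `hdfs` client on PATH and access to a cluster.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmds = build_hdfs_commands(
        "tweets_atlanta.json",          # hypothetical local file
        "/user/student/tweets",         # hypothetical HDFS directory
        "/user/student/shared_tweets",  # hypothetical shared location
    )
    run_all(cmds)  # dry run: only prints the commands
```

Calling `run_all(cmds, dry_run=False)` would hand each command to the real `hdfs` client, which only makes sense on a machine that is set up to talk to a Hadoop cluster.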

So I might’ve gone a little overboard in explaining how DICE works, but you don’t need to learn it from me. DICE is already filled with tutorials on how to use the computing cluster for projects, and even lessons on big data projects from a beginner programmer’s perspective, in a series of Python workbooks that look like this:


See each one of the blocks? All you have to do is press Shift-Enter to run the commands in real time. It’s very easy to learn with once you get the hang of it. And there are dozens of them, so if you want to do something else, there are plenty of tutorials to learn from.

I hope I brought some worthwhile information. DICE is a huge resource and is paid for by the Student Tech Fee, so I hope that all students can learn from this free resource. Stay tuned for updates on DICE, as I’m hoping to use it this weekend at HoloHack as well as in other projects. If you have any questions, feel free to post a comment below or

Join the DICE slack:

Request a DICE account:
