URL redirection in Archival Projects

A lot of the websites that we work with here at SIF are updates to archival websites that need new features the original technology couldn’t provide. While most features can be rebuilt in any framework like Omeka, WordPress, or Angular, a website relies on a couple of things that are hard to make backwards compatible. One of these is the website’s URL.

Today, most of the emphasis on URLs is on SEO: basically, how bots rank a website and decide how high it appears in search results. A nicely formatted, human-readable URL matters more than any fancy text styling on a webpage, because the URL changes as the user clicks buttons and links to move around the site. Redirects should therefore do two things: safely send a person to the webpage they actually want, and do it in a way that a bot watching the transition will approve of.

The examples I want to look at will hopefully be less technical and more about the high-level process, so anyone can understand the gist. If you want a more technical view, here’s Moz’s guide to redirection: https://moz.com/learn/seo/redirection

So let’s take a look at an old URL that used to be a part of daln.osu.edu.

daln.osu.edu/handle/2374.DALN/7099

Seems alright for a URL. It doesn’t have any weird symbols like percent signs or hash marks, and it isn’t very long, which is very good. Can a human interpret it? Probably not, since “handle” and “2374.DALN” don’t mean anything to a casual reader. 7099 could be a couple of things: maybe the 7099th post on the DALN, or maybe just an arbitrary ID. A bot would be harsh on this URL.

Redirection can be done in several ways, but every one of them needs a server that still answers on the old site’s domain (daln.osu.edu) and has instructions for sending each link from the old site to the new one.

The DALN handles this by serving a simple webpage for any link to daln.osu.edu that says the website has moved. If the person clicked on a URL like the one above, the page looks up post 7099’s new post ID and sends them on to the matching detail page. That way the URL gets transformed like so:

daln.osu.edu/handle/2374.DALN/7099 -> thedaln.org/#/detail/0e231f40-6149-4af1-b922-16c8818e25cc
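For readers who want to peek under the hood, here is a minimal Python sketch of that lookup step. It is not the DALN’s actual code. The OLD_TO_NEW table (which holds only the one example mapping above), the https scheme for thedaln.org, and the redirect_target function are all assumptions for illustration. The DALN serves a redirect page, but this sketch instead returns an HTTP 301 (“moved permanently”) status, which is the kind of redirect the Moz guide recommends when a page has moved for good.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse

# Hypothetical lookup table: old handle ID -> new DALN post ID.
# Only the example from this post is filled in.
OLD_TO_NEW = {
    "7099": "0e231f40-6149-4af1-b922-16c8818e25cc",
}

NEW_SITE = "https://thedaln.org"  # scheme is an assumption
OLD_PREFIX = "/handle/2374.DALN/"

def redirect_target(old_url: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, new URL) for a request that hit the old site."""
    # urlparse only separates the domain from the path when a scheme is present
    if "://" not in old_url:
        old_url = "http://" + old_url
    path = urlparse(old_url).path
    if path.startswith(OLD_PREFIX):
        old_id = path[len(OLD_PREFIX):].strip("/")
        new_id = OLD_TO_NEW.get(old_id)
        if new_id is not None:
            # 301 tells browsers and search engines the move is permanent
            return 301, f"{NEW_SITE}/#/detail/{new_id}"
    # Unknown or malformed link: report "not found" instead of guessing
    return 404, None

print(redirect_target("daln.osu.edu/handle/2374.DALN/7099"))
# (301, 'https://thedaln.org/#/detail/0e231f40-6149-4af1-b922-16c8818e25cc')
```

A real server would send the returned status as the response code and the URL in the Location header. An ID that isn’t in the table gets a 404 here, so a bot never gets sent to the wrong page.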

And now the user is on the new webpage! Nice. However, thedaln.org/#/detail/0e231f40-6149-4af1-b922-16c8818e25cc isn’t exactly human-readable either. In fact, a bot might rate it even worse than the original URL. In the near future, I’d like to switch to a more readable format that uses the title instead, like:

thedaln.org/detail/my-word

Waaaay better, for bots and humans alike. Every post on the DALN has both a title and an ID, so using the title seems like the best idea. However, post IDs on the DALN are unique and titles are not. If two posts have the same title, the URL above could point to either one, which is not good. A GitHub issue on the DALN frontend (https://github.com/gastate/dalnfrontend/issues/9) tracks SEO tasks, and I’ve added the topic of this blog post as one of them.
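One common way around duplicate titles is to turn each title into a URL-friendly “slug” and add a number to any repeats. The Python sketch below shows that idea under assumptions: the names slugify and assign_slugs are hypothetical, and this is not what the DALN does today.

```python
import re
import unicodedata
from typing import Dict, Iterable, Tuple

def slugify(title: str) -> str:
    """Lowercase ASCII slug, e.g. 'My Word' -> 'my-word'."""
    # Split accented letters into base letter + accent, then drop the non-ASCII accent
    text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of characters that aren't letters or digits into one hyphen
    text = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return text.strip("-") or "post"

def assign_slugs(posts: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each (post_id, title) to a unique slug; repeated titles get -2, -3, ..."""
    taken = set()
    slugs: Dict[str, str] = {}
    for post_id, title in posts:
        base = slugify(title)
        slug, n = base, 2
        while slug in taken:  # keep counting until the slug is free
            slug = f"{base}-{n}"
            n += 1
        taken.add(slug)
        slugs[post_id] = slug
    return slugs

print(assign_slugs([("a1", "My Word"), ("b2", "My Word"), ("c3", "Café Stories!")]))
# {'a1': 'my-word', 'b2': 'my-word-2', 'c3': 'cafe-stories'}
```

The first post to claim a slug keeps it, so slugs like these should be generated once and stored with each post rather than recomputed. Otherwise a post’s URL could change if the order of posts changes. Another option is to keep the ID in the URL next to the title, so the title is only there for readability.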

SEO, or search engine optimization, is a very important concept for anyone making even the simplest website. It affects how much traffic a website gets, where search engines rank it in search results, and a lot more. SEO involves many factors, but that’ll be a blog post for another day.

For archival sites, it is also important that records stay reliably addressable. That way, even if the records themselves are deleted (maybe the whole website is deleted), a URL still shows that the record existed.

The DALN, like a lot of publications, manages this using the handle.net (http://handle.net/) service. Typically, the links look a lot like the original URL:

hdl.handle.net/2374.DALN/7099

These are redirected in the same manner. The handle.net service works by putting a proxy server in front of the original URL. All this server does is keep a database that maps each handle URL to its original URL, so following a handle link is the same as navigating from the original URL.
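A handle is written as a prefix (the naming authority, here 2374.DALN) and a local name (7099), separated by the first slash. Below is a toy Python version of the lookup a handle proxy performs. HANDLE_DB and its single entry are hypothetical. Here the example handle points back at the old daln.osu.edu URL, which then gets redirected as described above. The real handle.net service resolves handles through its own global registry, not a Python dictionary.

```python
from typing import Dict, Optional, Tuple

# Hypothetical handle database: handle -> URL it currently points to.
HANDLE_DB: Dict[str, str] = {
    "2374.DALN/7099": "http://daln.osu.edu/handle/2374.DALN/7099",
}

def split_handle(handle: str) -> Tuple[str, str]:
    """Split '<prefix>/<local name>' at the first slash only,
    since the local name is allowed to contain more slashes."""
    prefix, sep, local_name = handle.partition("/")
    if not sep or not prefix or not local_name:
        raise ValueError(f"not a valid handle: {handle!r}")
    return prefix, local_name

def resolve(proxy_path: str) -> Optional[str]:
    """Resolve the path part of a proxy URL such as hdl.handle.net/2374.DALN/7099."""
    handle = proxy_path.lstrip("/")
    split_handle(handle)  # raises ValueError if the syntax is wrong
    return HANDLE_DB.get(handle)  # None if the handle was never registered

print(split_handle("2374.DALN/7099"))  # ('2374.DALN', '7099')
print(resolve("/2374.DALN/7099"))      # http://daln.osu.edu/handle/2374.DALN/7099
```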

You might’ve heard about URL shortening as well, and maybe even used services like bit.ly or goo.gl. They work in pretty much the same way. However, archival websites favor reliability over usability, and handle.net is more reliable because it is run by an organization dedicated to keeping identifiers for archival and research material persistent.

I hope you’ve learned a little more about how this tangle of servers we call the Internet works with URLs. URLs are the roads we travel on, and making sure people reach their destination safely should be a primary focus for everyone who makes a website. I hope to cover more web development technologies from a non-technical standpoint, so be sure to stick around for my next blog post.

Privacy and Information with CyberSpace

TL;DR

Always look up information about yourself on the Internet. Do so frequently.
Secure your information by reducing how much of it is present on the web; a good start is this link.
Public release of personal information is sometimes required, but it should be done in a way that protects people first.
Current policies to protect individuals are too lax and are protecting organizations more than individuals.

Background Info and Spokeo Research

As a computer science student interested in social engineering, infosec, phreaking, and other such fields, I found this assignment to be just like a regular security check, a way to “hack myself,” I suppose. People should regularly check their footprint on the internet, since that information can affect their lives, their jobs, and their loved ones. Events such as the Equifax hack have made it apparent that even a major credit bureau cannot keep your information safe. And it wasn’t even a complex hack: the attackers got in through a publicly known Apache Struts vulnerability that Equifax had left unpatched.

I searched Spokeo and found myself on the site, as well as the rest of my family in the same household. I think some of that information should have at least been limited to the paid tier, since then Spokeo’s own systems would track which user asked for it, just like credit history checks are tracked. In my opinion, the same standard as credit history should apply to personal information, even phone numbers, since most phones are now cellphones with a GPS module inside. Locating a cell phone is much easier since each one has a unique ID that maps to it.

Opting out was easy enough, and Spokeo actually handled it automatically: after I submitted the form, the listing was taken off the website. However, I doubt the data was deleted. In fact, a couple of weeks ago I was testing a site I developed that had a Facebook plugin (a Facebook plugin lets Facebook track anyone who visits the site). I then received an email from Facebook about reactivating my account from 7 years ago. Information is definitely kept after you request that it be taken down, unless stated otherwise.

Protection in the United States and Networked Information

At least in the United States, only a few laws loosely relate to the type of information kept by websites like Spokeo or organizations like Facebook. A collection of them can be found in the Wikipedia article, but legal protection for anything cyber-related in the US is never a priority. Sometimes there is good reason for that, such as preserving the freedoms that let sites like Google or Facebook operate. Other times, it hurts the individual more than it benefits the organization.

In Georgia, public information is disclosed at http://open.georgia.gov/index.html. Since my parents are both GSU faculty and I am part of GSU as a student assistant, our salaries, travel expenses, and full names are all available to the public. This isn’t bad, since the public needs to know these facts. It’d be weird if I, a student assistant, were paid something like $60,000 for travel expenses. Someone would notice and contact those in charge (see: the Tom Price travel expenses scandal). However, any organization outside of government that misuses this information, such as by mapping it to an account to track personal expenditures and locations, should lose the right to use the information and face a substantial fine. Given the current laws protecting personal information, though, arguing such a case in court would be a Hail Mary.

Conclusion

Overall, the disclosure of personal information on the internet is dictated and authorized by non-government organizations for their own uses. Finding my own personal information was easy, and on Spokeo, removing it was easy too. But I expect more protection for my personal information, since it is too easy for someone to use it for their own benefit.

Unit 4: Assignment 2

Tool Inventory

Big Data for Big Problems

Hello all! This week, of course, is none other than the crunch time before finals week, so I have been really busy finishing school projects and studying for finals since Thanksgiving. By the way, I encourage you to leave a comment about how your Thanksgiving went! Yours truly roasted a bird this year and it didn’t turn out that bad!

Quite good, I somewhat hit the nail on the head this year.

But turkeys aside, this week has been a wild start. Between school projects and work, I had never dealt with database systems before. The DALN does run on a server backend, which you can view here on GitHub, but the frontend gets its information through a REST API, which I showed how to use in my last blog post.

This week I went to a big data workshop called Introduction to Data Intensive Computing Environment (DICE). A somewhat scary name, not going to lie, but when I got to the workshop I was pleasantly surprised by the way they taught these complex tools. And by they I mean Semir Sarajlic, a Research Computing Specialist from Research Solutions, and Suranga Edirisinghe, the High Performance Computing Facilitator here at GSU. They made it easy to understand what big data means on the technical side. The term has been in the background for me and many other beginning Computer Science majors for years.

So DICE is essentially a collection of several different technologies that any student can use to work on big data projects on their own time, some of which you can see in the graphic below.

Creative Commons. From: “Reference Architecture and Classification of Technologies, Products and Services for Big Data” by Pekka Pääkkönen and Daniel Pakkala

We’re talking serious computing power: the specs on the individual data node computers are top of the line. You can view how it works on the user interface side, as well as the specs, here. At the workshop, we browsed through a couple of tutorials dealing with Twitter and Eventbrite data, which were kind of like the simple REST API calls I made with the DALN. However, we were also shown how a tool like Apache Hadoop can split data from multiple sources, like REST APIs, across different data nodes and process each part separately. For example, let’s say you wanted to get Twitter data from two different geolocations and map them in relation to each other. With Hadoop, you can easily manage this workload across data nodes and applications (like ESRI ArcGIS) in parallel. Hadoop includes a lot of technologies, but the one we used was HDFS (Hadoop Distributed File System). With HDFS, you can run commands to put data into the cluster and work with it. You can even perform OS-level operations such as moving directories or giving other DICE users permission to use your files.
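Hadoop jobs are usually written in Java (or run as scripts through Hadoop Streaming), so the Python below is not Hadoop’s API. It is a single-machine sketch of the map, shuffle, and reduce pattern that Hadoop MapReduce runs across data nodes. It uses made-up records for two sources and worker threads as stand-ins for separate nodes, and it counts posts per location.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Made-up records standing in for two data sources (e.g. two Twitter searches).
SOURCE_A: List[Record] = [
    {"place": "Atlanta", "text": "big data"},
    {"place": "Atlanta", "text": "hello"},
]
SOURCE_B: List[Record] = [
    {"place": "Athens", "text": "big data"},
    {"place": "Atlanta", "text": "data!"},
]

def map_phase(records: List[Record]) -> List[Tuple[str, int]]:
    """Emit a (key, 1) pair per record, like a mapper on one data node."""
    return [(r["place"], 1) for r in records]

def shuffle(pairs: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key so each key can be reduced in one place."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Combine each key's values; here, a simple count."""
    return {key: sum(values) for key, values in grouped.items()}

def run_job(sources: List[List[Record]]) -> Dict[str, int]:
    # Map each source in its own worker thread, standing in for separate data nodes
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, sources))
    all_pairs = [pair for chunk in mapped for pair in chunk]
    return reduce_phase(shuffle(all_pairs))

print(run_job([SOURCE_A, SOURCE_B]))  # {'Atlanta': 3, 'Athens': 1}
```

On a real cluster, the input would first be loaded into HDFS with a command like hdfs dfs -put, and other users can be given access with hdfs dfs -chmod.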

So I might’ve gone a little overboard explaining how DICE works, but you don’t need to learn it from me. DICE already comes filled with tutorials on how to use the computing cluster for projects. It even has lessons on big data projects from a beginner programmer’s perspective, in a series of Python notebooks that look like this:

notebook

See each one of the blocks? All you have to do is press Shift-Enter to run the commands in a block right away. It’s very easy to learn with once you get the hang of it. And there are dozens of them, so if you want to do something else, there are plenty of tutorials to learn from.

I hope I brought some worthwhile information. DICE is a huge resource paid for by the Student Tech Fee, so I hope that all students can take advantage of it for free. Stay tuned for updates on DICE, as I’m hoping to use it this weekend at HoloHack as well as on other projects. If you have any questions, feel free to post a comment below or

Join the DICE slack: http://tinyurl.com/j734wce

Request a DICE account: https://researchsolutions.atlassian.net/wiki/display/PD/Requesting+Access+to+DICE

Loving JSON

For the last few months, I’ve been working on the DALN Project, a collection of literacy narratives by just regular people like you and me. Actually, I’m on there. Points to someone who can find my embarrassing freshman self on there.

Anyways, coding for the DALN was an ongoing project even before I was assigned to it. One of our other SIFs, Shakib Ahmed, has been coding the backend since last summer. He’s done a pretty good job creating an API (Application Programming Interface) that I can use in my own frontend. Basically, every time I need something from the API, I just browse to it like any other URL and it returns something like this:

[{"description":"Awesome DALN narrative","identifierUri":"http://hdl.handle.net/2374.DALN/123","dateAccessioned":"2009-05-28T15:44:25Z","dateAvailable":"2009-05-28T15:44:25Z","dateCreated":"2009-05-03","dateIssued":"2009-05-28T15:44:25Z","rightsConsent":"adult","rightsRelease":"adult","contributorAuthor":["Dude, Guy"],"creatorGender":["Female"],"creatorYearOfBirth":["1976"],"coveragePeriod":["2000-2009"],"coverageNationality":["U.S. citizen"],"subject":["dude","that's","just","like","your","opinion","man"],"assetList":[{"Asset Type":"Audio/Video","Asset ID":"1212312-12321-31-3-2-21-3","Asset Description":"Literacy Narrative","Asset Location":"https://mwharker.vids.io/videos/duuude","Asset Name":"dude.mov"}],"postId":"223423423423","title":"DUUUUDEEE"}]

This is a JSON response. Basically, it’s an array of JSON objects made up of name-value pairs that you can use in your code. For example, storing the parsed response in a variable like var data lets us access each name and get its value. Since the response is an array (note the square brackets), console.log(data[0].description) would print the string Awesome DALN narrative, at least in JavaScript. It is a pretty cool representation of data, even though it just looks like a text file with some commas. And one of the great things about it is that JSON is what’s called language agnostic, meaning that you can use JavaScript, Scala, Groovy, Perl, PHP, or whatever language you want. This makes it easy to build additional applications on the same data. Very useful in today’s multiplatform world.
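Because JSON is language agnostic, the same response can be read in Python with the standard json module. This sketch uses a trimmed copy of the response above with only a few of its fields, not a live call to the DALN API.

```python
import json

# Trimmed copy of the API response above (only a few fields kept).
raw = '''[{"description": "Awesome DALN narrative",
  "identifierUri": "http://hdl.handle.net/2374.DALN/123",
  "contributorAuthor": ["Dude, Guy"],
  "subject": ["dude", "that's", "just", "like", "your", "opinion", "man"],
  "postId": "223423423423", "title": "DUUUUDEEE"}]'''

data = json.loads(raw)  # a JSON array becomes a Python list of dicts
post = data[0]          # the response is an array, so take the first post
print(post["description"])         # Awesome DALN narrative
print(", ".join(post["subject"]))  # dude, that's, just, like, your, opinion, man
```

The JavaScript equivalent is JSON.parse(text)[0].description. In both languages, the square brackets mean you index into the array before reading a field.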

JSON is so flexible that you can even use it in game design. Whenever a game saves, it needs to store a lot of data compactly, so that the next time the game loads, everything is in the right place, or at least the player is back at their last checkpoint. For our name-value pairs, we could put things like player_location, enemy_locations, and power_up_enabled into a JSON object, store it for later, and use a library like fullserializer to read it back into Unity.
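In Unity, this would be done in C# with a library like fullserializer, which I won’t try to reproduce here. As a language-agnostic illustration, here is a small Python sketch of the same save/load round trip. The SaveState class and its field types are assumptions based on the example names above, and a real game would write the string to a save file.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

@dataclass
class SaveState:
    # Field names come from the example in the text; the types are assumptions.
    player_location: List[float]
    enemy_locations: List[List[float]] = field(default_factory=list)
    power_up_enabled: bool = False

def save_game(state: SaveState) -> str:
    """Serialize the state to a JSON string (a game would write this to disk)."""
    return json.dumps(asdict(state))

def load_game(text: str) -> SaveState:
    """Rebuild the state from a JSON string produced by save_game."""
    return SaveState(**json.loads(text))

state = SaveState([1.0, 0.0, 2.5], [[4.0, 0.0, 1.0]], True)
saved = save_game(state)
print(saved)
# {"player_location": [1.0, 0.0, 2.5], "enemy_locations": [[4.0, 0.0, 1.0]], "power_up_enabled": true}
print(load_game(saved) == state)  # True
```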

I hope that we will be able to use JSON in the upcoming 3D Atlanta project. The dream is that a person could go to a website, pick their historic area of interest, and configure their own game environment from the options they selected. JSON would hold the values Unity uses to build that environment. Thanks to the DALN, I know a ton more about JSON, and I hope to use it to its full extent in future projects.

It all comes back around…

Hello all! It’s been a long time since my last blog post, but that’s because I’ve been writing documentation instead. Since then, SIF has moved to CETL, the Center for Excellence in Teaching and Learning, so you should check us out sometime! We’re there Wednesdays and Fridays from 2pm to 5pm.

CETL is pretty awesome too. We have some cool tech to play around with, including VR and AR headsets like the Oculus Rift and Microsoft HoloLens. This past weekend at hackGT, I got to code a HoloLens application that gets information from devices via a web app, and I hope to keep working on it so that it becomes more useful. My team and I haven’t built anything on the Unity side yet (HoloLens uses Unity to manage its content, or you can build from the ground up as a UWP app), but stay tuned for more on that. In case you’re wondering what the HoloLens is, check it out here.

However, today I’d like to talk about a very cool interaction that’s happened to me this year at SIF. In my first year at GSU, I took ENG 1103, a Critical Writing course here at GSU. I had already earned credit for ENG 1101 and 1102 when I came here. ENG 1103 focuses on something called Digital Literacy, a movement that tries to define the forms literacy takes and to track the active social process of literacy in different cultures, countries, or even within an individual. It sounds like a broad topic, but it’s actually kind of the “man behind the curtain” of all the reasons we have classes about literacy, no matter what discipline you’re in. There is always something you have to do to communicate with others, and that translates into specific practices repeated over and over again.

One of the biggest projects dedicated to the study of Digital Literacy is the DALN, or the Digital Archive of Literacy Narratives. It’s a project created at Ohio State University, and several people outside OSU help manage it, one of them being my former professor, Dr. Michael Harker, here at GSU. Very cool dude (not guy, he’s earned the title of dude). And now, SIF has taken the chance to build a new website for the DALN. We want to make it as socially shareable as possible, so that sharing a story is as easy as clicking a button, since the stories in the DALN are the stories that reflect our lives. Understanding someone’s literacy narrative means understanding the way they think about certain issues, the way they hold social relationships with the people around them, and ultimately who they are.

You should check out the DALN, it’s awesome. Simply search a term you are interested in, like video games, and pull up a huge number of stories from regular people.

Projects, Projects, Projects

Many projects are going on in SIF right now, and we have quite a few “instructor-client” projects, where teachers give us ideas for projects and help out with the development process.

Some examples of these projects are the DALN-app, which aims to make an app for the DALN, and the VRProject, which is researching how to make virtual worlds accessible to the visually impaired. You can find some of these projects on the SIF GitHub page.

We’re really trying to finish some of the larger projects that can be done with the number of Computer Science majors and minors we have now. However, remember that most of our experience comes only from personal interests, projects, or prior work, so most of us are only good at one specific task.

It’s one of the reasons SIF is so great: we all get to come together in meetings, help with each other’s weaknesses, and bolster our strengths. Some projects, like 3DAtlanta, Hoccleve Archive, and DALN-app, can only be done when we have a lot of people with specific skill sets like web development or design, all in one place having a conversation.

And now we wish to bring the conversation about all of our projects to you. We’re beginning the process of documenting all of our work so that others can reach the same results on every project. We’ve already put up some preliminary posts on our GitHub wikis that lay out which technologies we are using in our projects. They give others a base to jump off from, whether to replicate our projects or to go even further.

Next time, I’ll talk about some of the resources I’ve found while working at SIF, and how you can get a whole bunch of free stuff as a student, faculty, or staff member! …And I know we all like free things, right?

Teaching Others whilst Learning Yourself

Hello everyone! Welcome to a new semester, a new year, and new projects for all of us! These past weeks since school started have been incredibly busy since we’ve hit the ground running.

One of the major updates to SIF is that we now have a GitHub organization! You can watch our projects being made in real time. Hopefully it’ll be here to stay, since this is new ground for open-sourcing projects, even projects for educational use. I’ll soon be writing up more details and sending out an email to all the SIFs on how they can use GitHub for their projects.

Also, this week I went to the Open Conference on Teaching and Technology here at GSU, where I learned about an exciting new technology called … wrote a post in the wiki on what shapefiles are. While I focused on the very basics, I hoped to help both my teammates and anyone interested in GIS understand how we proceeded through each step of development and research. I added some links that would appeal to different learners.

For example, I linked to a post from Doug McCune about 3D renderings of TopoJSON files (TopoJSON is a JSON-based geographic data format that shapefile data is often converted into). I also referenced a great Penn State course on the fundamentals of GIS. It was a great resource when I was looking things up for 3D Atlanta.

Overall, it is a great experience to document research findings and general knowledge about certain topics as we go along with our projects. Not only will it be beneficial for us as a team, but also for any future learners. In the future, this idea will be solidified with the coming of new technologies and the beginning of a new age of learning.

I look forward to any feedback you guys have to share on my post. Until next time.

Seeing is Believing

So for a while now, Virtual Reality (VR) has been a thing. It’s been slowly gathering steam for the last 5 years or so. And now it’s coming to GSU! 😀

Currently, there are two Oculus Rifts (Dev Kit 2) at the CII for SIF-only use, meaning regular students won’t be able to reserve them, though faculty probably will. (To see what other equipment you can check out as a student or faculty member, visit the CII website.)

However, I believe that at least one person in the CII is looking at the potential uses of 3D technologies in general, especially VR. So there’s a good chance both students and faculty will get to mess around with VR.

If you are a faculty member who wants VR or any other technology in your department, go to the Student Tech Fee website to learn more. Students can take the initiative and share information about the Tech Fee with their teachers. Help make your classroom the classroom of tomorrow™!

Anyways, SIF has been busy with VR, and it’s a major focus next semester. On the projects page, you can see a lot of the material we’ve been gathering ever since SIF got started. We’ve been using the Digital Collections section of the library a lot. Unpacking Manuel’s, the 3D Atlanta Project, and an upcoming VR for the Blind project all work in 3D space, using tools like Unity, Memento, or Agisoft.

So how can VR benefit the classroom? What are the hard facts when it comes to VR? These are questions we must answer as we continue to build on these projects.

Hopefully, I’ll present the facts without bias in a later blog post (as well as explain what my bias for VR really is). But until then…