Monthly Archives: October 2017

URL redirection in Archival Projects

A lot of the websites we work with here at SIF are updates of archival websites that need new features the original technology couldn’t support. While most features can be rebuilt in a framework or platform like Omeka, WordPress, or Angular, a couple of things a website relies on don’t carry over easily. One of these is the website’s URLs.

Today, most of the emphasis on URLs is about SEO, basically how bots rank a website and how high they place it in search results. A nicely formatted, human-readable URL matters more than any fancy text styling on a webpage, since users change the URL as they click buttons and links to move around the website. Redirection should therefore both send a person safely to the page they wanted and do so in a way that a bot watching the transition will accept.

The examples I want to look at will hopefully be less technical and more about the high-level process, so anyone can get the gist. If you want a more technical view, here’s a Moz article on the topic: https://moz.com/learn/seo/redirection

So let’s take a look at an old URL that used to be a part of daln.osu.edu.

daln.osu.edu/handle/2374.DALN/7099

Seems alright for a URL. It doesn’t have any weird symbols like percent signs or hash marks. It’s not even that long, which is very good. Can a human interpret it? Probably not, since “handle” and “2374.DALN” don’t mean anything to a visitor. 7099 could be a couple of things: the 7099th post to the DALN, or maybe just an arbitrary id. A bot would be harsh on this URL.

Redirection can be done in several ways, but every approach needs a server that still answers on the old site’s domain (here, www.daln.osu.edu) and carries instructions to send any old link on to the new site.

The way the DALN does this is that any link to daln.osu.edu lands on a simple webpage saying the website has moved. If the person clicked on a URL like the one above, we look up post 7099’s new post id and then send them on to the relevant detail page. That way the URL gets transformed like so:

daln.osu.edu/handle/2374.DALN/7099 -> thedaln.org/#/detail/0e231f40-6149-4af1-b922-16c8818e25cc
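
The lookup step described above can be sketched in a few lines of Python. This is only an illustration, not the DALN's actual code: the `OLD_TO_NEW_ID` table, the `redirect_target` function name, and the `https` scheme are assumptions, and a real site would keep the id mapping in a database rather than in the source file.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical lookup table: old numeric post id -> new post id.
# The single entry is the example pair from this post.
OLD_TO_NEW_ID = {
    "7099": "0e231f40-6149-4af1-b922-16c8818e25cc",
}

NEW_SITE = "https://thedaln.org"  # the scheme is an assumption


def redirect_target(old_url: str) -> Optional[str]:
    """Return the new detail-page URL for an old handle URL, or None."""
    # Tolerate URLs written without a scheme, like the examples in this post.
    if "://" not in old_url:
        old_url = "http://" + old_url
    segments = [s for s in urlsplit(old_url).path.split("/") if s]
    # Expected shape of the old path: /handle/2374.DALN/<old id>
    if len(segments) != 3 or segments[:2] != ["handle", "2374.DALN"]:
        return None
    new_id = OLD_TO_NEW_ID.get(segments[2])
    if new_id is None:
        return None  # unknown post: the caller can show the generic "moved" page
    return f"{NEW_SITE}/#/detail/{new_id}"


print(redirect_target("daln.osu.edu/handle/2374.DALN/7099"))
```

Returning None for anything the table doesn't know about leaves room for the plain "this website has moved" page instead of sending a visitor to a broken link.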

And now the user is on the new webpage! Nice. However, thedaln.org/#/detail/0e231f40-6149-4af1-b922-16c8818e25cc isn’t exactly human readable either. In fact, it might be even worse than the original URL to a bot. In the near future, I’d like to change the URL into a more readable format that uses the title instead, like:

thedaln.org/detail/my-word

Waaaay better, for bots and humans. All posts on the DALN need a title and an id, so using the title is the natural choice. However, post ids on the DALN are unique, while titles are not. If two posts share a title, the above URL could point to either of them, which is not good, so the URL needs something extra to tell them apart. A GitHub issue on the DALN (https://github.com/gastate/dalnfrontend/issues/9) tracks SEO tasks, and I’ve made the topic of this blog post into one.
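
One possible way around the duplicate-title problem, sketched in Python: turn the title into a URL-friendly "slug", and only when that slug already belongs to a different post, tack on the first few characters of the post id. The function names, the `taken` dictionary, and the eight-character suffix are all assumptions made for illustration; this is not what the DALN currently does.

```python
import re
import unicodedata
from typing import Dict


def slugify(title: str) -> str:
    """Lowercase the title and replace anything non-alphanumeric with hyphens."""
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    return slug or "untitled"


def unique_slug(title: str, post_id: str, taken: Dict[str, str]) -> str:
    """Return a slug for the post, adding part of the id if the title is taken.

    `taken` maps slugs that are already in use to the post id that owns them.
    """
    base = slugify(title)
    slug = base
    owner = taken.get(base)
    if owner is not None and owner != post_id:
        # Another post already uses this title: disambiguate with the id.
        slug = f"{base}-{post_id[:8]}"
    taken[slug] = post_id
    return slug


taken: Dict[str, str] = {}
first = unique_slug("My Word", "0e231f40-6149-4af1-b922-16c8818e25cc", taken)
# The second id below is made up to show a title collision.
second = unique_slug("My Word", "aaaaaaaa-0000-0000-0000-000000000000", taken)
print(first, second)
```

The first post keeps the clean, readable URL, and only later posts with the same title pay the price of a slightly longer one.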

SEO, or search engine optimization, is a very important concept for anyone making even the simplest website. It affects how much traffic a website gets, how search engines rank it in search results, and a lot more. There are a lot of factors in SEO, but that’ll be a blog post for another day.

For archival sites, it is also important that references to records stay reliable, so that even if the records themselves are deleted (maybe the whole website is deleted), there is still a URL showing that the record existed.

The DALN manages this with a lot of publications using the handle.net (http://handle.net/) services. Typically the links look much like the original URL:

hdl.handle.net/2374.DALN/7099

These are also redirected in the same manner. handle.net works by putting a proxy server in front of the original URL. All this server does is maintain, in a database, the relationship between the handle URLs and the original URLs. Following a handle link ends up in the same place as navigating to the original URL.
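
To make the "database of relationships" idea concrete, here is a toy stand-in for such a proxy in Python. It is not how handle.net is actually implemented; the `HandleResolver` class and its methods are invented for this sketch, and the second `register` call only shows what updating a record could look like (this post doesn't say whether the DALN's handle records were ever changed).

```python
from typing import Dict, Optional


class HandleResolver:
    """Toy handle proxy: a table from a handle to the URL it currently points at."""

    def __init__(self) -> None:
        self._targets: Dict[str, str] = {}

    def register(self, handle: str, url: str) -> None:
        """Create a handle, or point an existing handle at a new URL."""
        self._targets[handle] = url

    def resolve(self, handle: str) -> Optional[str]:
        """Return the current URL for a handle, or None if it is unknown."""
        return self._targets.get(handle)


resolver = HandleResolver()
resolver.register("2374.DALN/7099", "http://daln.osu.edu/handle/2374.DALN/7099")

# Hypothetical update after a site move: the handle that people cited stays
# the same, and only the row in the table changes.
resolver.register(
    "2374.DALN/7099",
    "https://thedaln.org/#/detail/0e231f40-6149-4af1-b922-16c8818e25cc",
)
print(resolver.resolve("2374.DALN/7099"))
```

The point of the design is that the handle printed in a citation never has to change, even when the website behind it does.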

You might’ve heard about URL shortening as well, and maybe even used services like bit.ly or goo.gl. They work in pretty much the same way. However, archival websites favor reliability over usability, and the handle.net service is the more reliable choice since it is run by a dedicated organization for archival research.

I hope you’ve learned a little more about how this tangle of servers we call the Internet works with URLs. They are the roadways we travel on, and making sure visitors get to their destination safely should be a primary focus of everyone making a website. I hope to go further into web development technologies from a non-technical standpoint, so be sure to stick around for my next blog post.

Privacy and Information with CyberSpace

TL;DR

  • Always look up information about yourself on the Internet. Do so frequently.
  • Secure your information by reducing what is present on the web. A good start is this link.
  • Public release of personal information is sometimes required, but it should protect the people first.
  • Current policies to protect individuals are too lax and are protecting organizations more than individuals.

Background Info and Spokeo Research

As a computer science student interested in social engineering, infosec, phreaking, and other such fields, I found this assignment to be like a regular security check, a way to “hack myself” I suppose. People should check their footprint on the internet all the time for new information, since it can affect their lives, jobs, and loved ones. Events such as the Equifax hack have made it apparent that even authorities at the bureaucratic level cannot keep your information safe (and it wasn’t even a complex hack: the breach was traced to an unpatched, publicly known software vulnerability, and a separate Equifax portal was reportedly protected by an admin/admin login).

I conducted a search on Spokeo and found myself on the site, as well as the rest of my family in the same household. I think some of the information there should at least have been placed in the paid tier, since then Spokeo’s own systems would have a tracked user asking for it, just like how credit history requests are tracked. In my opinion the same standard as credit history should apply to personal information, even phone numbers, since most phones are cellphones with a GPS module in them. Locating a cell phone is much easier since each one has a unique id that maps to it.

Opting out was easy enough, and Spokeo actually handled it automatically: after I submitted the form, the listing appears to have been taken off the website. However, I doubt the data was deleted. In fact, a couple of weeks ago I was testing a site I developed that had a Facebook plugin (any website with a Facebook plugin will automatically track anyone visiting the site), and I received an email from Facebook about reactivating an account from 7 years ago. Information is definitely kept after you request that it be taken down, unless stated otherwise.

Protection in the United States and Networked Information

At least in the United States, there are only a few laws in place that loosely relate to the type of information kept by websites like Spokeo or organizations like Facebook. A collection of them can be found in the Wikipedia article, but legal protections on anything cyber-related in the US are never prioritized. Sometimes this is for good reason, such as giving a site like Google or Facebook the freedom to operate, but other times it hurts the individual more than it benefits the organization.

In Georgia, public information is disclosed at http://open.georgia.gov/index.html. Since my parents are both GSU faculty and I am part of GSU as a student assistant, our salaries, travel expenses, and full names are all available to the public. This isn’t bad, since the public needs to know these facts. It’d be weird if I, a student assistant, was being paid something like 60,000 dollars for travel expenses. Someone would notice that and contact those in charge (see: Tom Price Travel Expenses Scandal). However, any organization outside of government that uses this information in ill ways, such as mapping it to an account to track personal expenditures and locations, should lose the right to use the information and pay a substantial fine. Given the current laws protecting personal information, though, it’d be a Hail Mary trying to argue such a case in court.

Conclusion

Overall, the state of personal information disclosure on the internet is dictated and authorized by non-government organizations for their own uses. Finding my own personal information was easy, and in Spokeo’s case removing it was easy too. But I expect more protections for my personal information, since it is too easy for someone else to use it for their own benefit.