URL redirection in Archival Projects

A lot of the websites we work with here at SIF are updates of archival websites that need new features the original technology couldn't support. While most features can be rebuilt in a framework like Omeka, WordPress, or Angular, a couple of things a website relies on don't carry over so easily. One of these is the website's URLs.

Today, most of the emphasis on URLs is about SEO: basically, how bots rank a website and give it higher priority in search results. A nicely formatted, human-readable URL matters more than any fancy text styling on a webpage, since the URL changes as the user clicks buttons and links to move around the site. Redirects should therefore both get a person safely to the page they wanted and do so in a way that a bot watching the transition approves of.

The examples I want to look at will hopefully be less technical and more about the high-level process, so anyone can get the gist. If you want a more technical view, here's a Moz article on redirection: https://moz.com/learn/seo/redirection

So let’s take a look at an old URL that used to be a part of daln.osu.edu.

daln.osu.edu/handle/2374.DALN/7099

Seems alright for a URL. It doesn't have any weird symbols like percent signs or hash marks, and it isn't even that long, which is very good. Can a human interpret it? Probably not, since "handle" and "2374.DALN" don't translate to anything. 7099 could be a couple of things: the 7099th post to the DALN, or maybe just an arbitrary id. A bot would be harsh on this URL.

The redirection process can be done in several ways, but any redirection needs a server that answers at the old site's domain (here, www.daln.osu.edu) and has instructions to send any link on the old site to the new site.

The way the DALN does this is that any link to daln.osu.edu lands on a simple webpage saying the website has moved. If the person clicked on a URL like the one above, we look up post 7099's new post id and then send them on to the relevant detail page. That way the URL gets transformed like so:

daln.osu.edu/handle/2374.DALN/7099 -> thedaln.org/#/detail/0e231f40-6149-4af1-b922-16c8818e25cc
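To make the lookup step a bit more concrete, here is a minimal Python sketch. It is not the DALN's actual code: the `OLD_TO_NEW` table, the `redirect_target` name, and the fallback to the home page are all assumptions made for illustration. The only real data in it is the pair of ids from the example above.

```python
import re

# Hypothetical lookup table: old DALN handle number -> new post id.
# The single entry is the example pair from this post; a real table would
# be loaded from wherever the migration recorded the mapping.
OLD_TO_NEW = {
    "7099": "0e231f40-6149-4af1-b922-16c8818e25cc",
}

NEW_SITE = "https://thedaln.org"

# Matches paths such as /handle/2374.DALN/7099 and captures the final number.
OLD_PATH = re.compile(r"^/handle/2374\.DALN/(\d+)/?$")


def redirect_target(old_path: str) -> str:
    """Return the new URL for a path on the old daln.osu.edu site.

    Unknown or malformed paths fall back to the new site's home page
    (an assumption; a real site might show a "not found" page instead).
    """
    match = OLD_PATH.match(old_path)
    if match:
        new_id = OLD_TO_NEW.get(match.group(1))
        if new_id:
            return f"{NEW_SITE}/#/detail/{new_id}"
    return NEW_SITE + "/"


print(redirect_target("/handle/2374.DALN/7099"))
```

The server (or the "we've moved" page) would then point the browser at whatever this function returns.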

And now the user is on the new webpage! Nice. However, thedaln.org/#/detail/0e231f40-6149-4af1-b922-16c8818e25cc isn't exactly human readable either. In fact, it might be even worse than the original URL to a bot. In the near future, I'd like to change the URL into a more readable format that uses the title instead, like:

thedaln.org/detail/my-word

Waaaay better, for bots and humans. All posts on the DALN need a title and an id, so using the title would be the best idea. However, while post ids on the DALN are unique, titles are not. Two posts with the same title would mean the above URL could point to two posts, which is not good. The DALN's GitHub repository has an issue tracking SEO tasks (https://github.com/gastate/dalnfrontend/issues/9), and I've turned the topic of this blog post into one of them.
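One possible way around the duplicate-title problem, sketched in Python: build the readable part from the title, and only when that's already taken, tack on a piece of the unique id. This is just one idea, not what the DALN does or has decided to do. The function names, the `taken` set, and the choice of eight id characters are all assumptions, and the second post id below is made up.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and join its runs of letters/digits with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def detail_path(title: str, post_id: str, taken: set) -> str:
    """Build a /detail/... path, adding part of the id only when needed."""
    slug = slugify(title) or post_id  # fall back to the id if the slug is empty
    if slug in taken:
        # Title already used by another post: append the first 8 id characters.
        slug = f"{slug}-{post_id[:8]}"
    taken.add(slug)
    return f"/detail/{slug}"


taken = set()
print(detail_path("My Word", "0e231f40-6149-4af1-b922-16c8818e25cc", taken))
# The id below is invented purely to show a second post with the same title.
print(detail_path("My Word", "1a2b3c4d-0000-0000-0000-000000000000", taken))
```

The first post gets the clean URL and later ones with the same title get a slightly longer one. Eight characters of an id could in principle still collide, so a real version would want to check again before saving.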

SEO, or search engine optimization, is a very important concept for anyone making even the simplest website. It affects how much traffic a website gets, how search engines order it in search results, and a lot more. There are a lot of factors in SEO, but that'll be a blog post for another day.

For archival sites, it is also important that records stay reliably citable, so that even if the records themselves are deleted (maybe the whole website is deleted), there is still a URL showing that the record existed.

The DALN manages this for a lot of publications using the handle.net (http://handle.net/) service. Typically the links look a lot like the original URL:

hdl.handle.net/2374.DALN/7099

These are also redirected in the same manner. handle.net works by putting a proxy server in front of the original URL. All this server does is maintain the relationship between handle URLs and original URLs in a database, so following a handle link is the same as navigating to the original URL.
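As a rough picture of that proxy idea, here is a tiny Python sketch. It is not how handle.net is actually implemented: the `HANDLES` dictionary stands in for its database, `resolve` is a made-up name, and the step where the record gets updated is purely illustrative.

```python
# Hypothetical stand-in for the proxy's database: handle -> current URL.
HANDLES = {
    "2374.DALN/7099": "http://daln.osu.edu/handle/2374.DALN/7099",
}

PROXY_HOST = "hdl.handle.net/"


def resolve(handle_url: str):
    """Return the URL a hdl.handle.net link currently points to, or None."""
    # Drop "http://" or "https://" if the link includes it.
    without_scheme = handle_url.split("://", 1)[-1]
    if not without_scheme.startswith(PROXY_HOST):
        return None
    return HANDLES.get(without_scheme[len(PROXY_HOST):])


print(resolve("hdl.handle.net/2374.DALN/7099"))

# If the record is later pointed at the new site, the handle link itself
# never changes; only the row in the table does.
HANDLES["2374.DALN/7099"] = (
    "https://thedaln.org/#/detail/0e231f40-6149-4af1-b922-16c8818e25cc"
)
print(resolve("hdl.handle.net/2374.DALN/7099"))
```

That is the appeal for archives: anyone who cited the handle link keeps a working link, no matter where the record ends up living.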

You might've heard of URL shortening as well, and maybe even used services like bit.ly or goo.gl. They work in pretty much the same way. However, archival websites favor reliability over usability, and the handle.net service is the more reliable option since it is run by a dedicated organization for archival research.

I hope you've learned a little more about how URLs work in this tangle of servers we call the Internet. They are the roadways we travel on, and making sure visitors reach their destination safely should be a primary focus for everyone making a website. I hope to cover more web development technologies from a non-technical standpoint, so be sure to stick around for my next blog post.
