First of all, Happy New Year! I hope you'll all have a 2011 full of inspiration and interesting work on the web. Here is a quick post to kick off the New Year: why are clean URLs important, and how can you set them up painlessly with Apache's mod_rewrite module?

There are many reasons to use clean URLs. First of all, cleanness. A long query string looks garbled and you certainly won't remember it when somebody asks for it. Facebook introduced clean profile URLs in 2009 (quite late, as the technique has been around much longer). Compare http://www.facebook.com/bbelderbos with http://www.facebook.com/profile.php?id=628517118: the latter is hard to remember, and .php and id are not relevant for the average user, right?

Second, SEO. Google's Webmaster guidelines state: "If you decide to use dynamic pages (i.e., the URL contains a "?" character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them few." So it is best to keep your links short but informative.

Moreover, it's the standard. Just about any CMS has this feature built in. Take WordPress: check the URL of this post and you see domain -> year -> month -> slug of the post. It is informative: it tells you the title and the year/month of the post. And instead of page.php?id=1234&foo=5678, it contains keywords, which are of relatively high value for search engine indexing. So clean URLs can help your search engine ranking!

So now to the techie part: how can we make them? It is rather easy! Take sharemovi.es for example. I have an index page (index.php) that receives a GET variable called "t" holding the user name, here "bbelderbos". The page then gets the data for that user from the database.

Now I want the same page to be reachable through a clean URL that contains only the user name.

Enter Apache's mod_rewrite. You'll need to create or edit a file called .htaccess in the same directory as the script file (index.php). All it takes are the following two lines:

RewriteEngine On
RewriteRule ^(\w+)$ index.php?t=$1

- The first line lets Apache (the web server) know that you want to rewrite URLs.

- The second line defines the actual rewrite. It starts with a regular expression: '^' marks the start of the string, '(\w+)' matches one or more word characters (letters, digits and underscores), and '$' marks the end of the string. The parentheses capture whatever was matched and store it in the variable '$1'.
Then comes the substitution: index.php?t=$1. Filling in what was captured in $1, it translates 'bbelderbos' to 'index.php?t=bbelderbos'. The beauty is that this happens under the hood: while the user is presented with the tidy URL, the server receives the full information it needs to generate the page: index.php?t=bbelderbos
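
To get a feel for what the pattern accepts and rejects, here is a small sketch in Python rather than in Apache itself. It assumes Python's re module behaves closely enough to Apache's regex engine for this simple pattern, and that the rule lives in .htaccess, where Apache matches against the path without the leading slash. The rewrite() helper is a made-up name for illustration only; it is not part of mod_rewrite.

```python
import re

# Same pattern as in the RewriteRule: one or more word characters, nothing else.
# re.ASCII limits \w to letters, digits and underscore (an assumption made here
# to stay close to the default behaviour of Apache's regex engine).
PATTERN = re.compile(r"^(\w+)$", re.ASCII)


def rewrite(path):
    """Hypothetical helper: return the internal target for a request path,
    or None when the rule would not apply."""
    match = PATTERN.match(path)
    if match is None:
        return None
    # group(1) plays the role of $1 in the RewriteRule substitution.
    return "index.php?t=" + match.group(1)


print(rewrite("bbelderbos"))  # index.php?t=bbelderbos
print(rewrite("index.php"))   # None: the dot is not a word character
print(rewrite("user/extra"))  # None: the slash is not a word character
```

Note the second case: because '\w' does not match a dot, a request for index.php itself is left alone, so the rule does not rewrite its own result.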

See more examples of rewrite rules in this article or simply google 'mod_rewrite'. Writing (complicated) rewrite rules requires some knowledge of regular expressions. O'Reilly has some interesting titles on the subject, but you can easily get by with almost any PHP or programming tutorial. I like the introduction to regex given in this PHP book.


Bob Belderbos

Software Developer, Pythonista, Data Geek, Student of Life.