October 25, 2006

Creating readable URLs with mod_rewrite

Richard Lee @ 12:56 am

Ugly URLs…
Query-string style URLs like the one below are synonymous with dynamic systems:

http://www.mysite.com.au/catalog.php?category=cartridges&prod_id=33

While URLs like this are completely functional, there are a few issues you should consider before going live:

  1. The underlying technology and parameter names are exposed, which invites script kiddies to inject their own data, for example in an SQL injection attempt.
  2. The URL contains ampersands which, when left unescaped, break the XHTML compliance of websites linking to it.
  3. Search engines may be reluctant to index dynamic pages like this.

So without refactoring the application, how can you make friendlier URLs? A simple solution for Apache users is to use the mod_rewrite module.

mod_rewrite to the rescue!

Note: in many installations mod_rewrite is not loaded into Apache by default. To load it, open your httpd.conf file and uncomment the module's LoadModule line.
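
As a rough sketch, the line to look for in httpd.conf usually resembles the one below. The modules/mod_rewrite.so path is the conventional default and is an assumption here; it can differ between distributions and builds.

```apache
# httpd.conf: remove the leading "#" from this line, then restart Apache.
# The path to mod_rewrite.so is the common default and may differ on your system.
LoadModule rewrite_module modules/mod_rewrite.so
```

Rules placed in a .htaccess file also only take effect if the server configuration permits overrides for that directory (the AllowOverride directive).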

Apache’s mod_rewrite module can rewrite requested URLs on the fly, meaning you can substitute ugly query strings with meaningful URLs, which helps with the security, compliance and SEO issues above. So how does mod_rewrite work?

Basic rewriting

Typically mod_rewrite directives are added to a .htaccess file in your web root. Here’s a simple example to start with:

RewriteEngine on
RewriteRule ^old_page\.html$ new_page.html

This rewrite transparently serves new_page.html in response to a request for old_page.html, while the visitor's browser still shows the old URL. The first line enables the engine (only required once per .htaccess file):

RewriteEngine on

This next line tells the server that if there is a request matching old_page.html, it should substitute new_page.html. The caret ^ and dollar sign $ mark the start and end of the string used for the match. In a pattern, a dot matches any single character, so it is written as \. when a literal dot is intended.

RewriteRule ^old_page\.html$ new_page.html

If you want to do a more traditional redirect, so that the browser's address bar shows the new location of the page, just add [R] to the end of the RewriteRule.
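
A minimal sketch of that redirecting variant, assuming both pages sit in the web root. A bare [R] sends a temporary (302) redirect; the R=301 and L flags shown in the commented line are optional variations, not something the original example requires.

```apache
RewriteEngine on

# Temporary (302) redirect: the browser is told to request /new_page.html itself.
# The leading slash makes the target an absolute URL path, so the redirect
# does not depend on RewriteBase when the rule lives in a .htaccess file.
RewriteRule ^old_page\.html$ /new_page.html [R]

# Variation for a permanent move, stopping further rule processing:
# RewriteRule ^old_page\.html$ /new_page.html [R=301,L]
```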

Easy enough? That was a pretty simple rewrite. The real flexibility of mod_rewrite requires some knowledge of regular expressions, which can get quite complex. However, the functionality and flexibility regular expressions offer, in mod_rewrite and in PHP alike, make them well worth learning. So let's take a look at a common rewrite and the expressions used to make it happen.

Unleashing mod_rewrite with Regular Expressions

With the help of regular expressions we can create RewriteRules which match a set of URLs and map them to their actual pages. Consider the product pages in our fictional shopping cart app, which only vary in category name and product id. We can identify requests for a product page by matching the catalog prefix, a forward slash, something representing a category name, another forward slash, something representing a product id, and a trailing slash. Here’s how our rule looks:

RewriteRule ^catalog/([a-zA-Z0-9-]+)/([0-9]+)/$ /catalog.php?category=$1&prod_id=$2

The parts in square brackets are known as ‘ranges’. In this case we’re allowing letters (upper or lower case), digits and hyphens for our category name, then digits only for our product id. We’ve also enclosed these expressions in parentheses so we can ‘back reference’ our matches and pass the values on to our PHP page. Back references are numbered by the order of the parentheses: $1 for the first pair, being our category name, and $2 for the second pair, being our product id.
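
To make the back references concrete, here is a sketch of the rule in a complete .htaccess file, written with a + after the category range so that category names longer than one character match. The optional trailing slash and the [L] flag are additions for illustration, not part of the original rule.

```apache
RewriteEngine on

# Request:   /catalog/cartridges/33/
#   $1 = cartridges   (letters, digits and hyphens)
#   $2 = 33           (digits only)
# Served by: /catalog.php?category=cartridges&prod_id=33
#
# "/?" makes the trailing slash optional; [L] stops processing further rules
# for this pass once the rule has matched.
RewriteRule ^catalog/([a-zA-Z0-9-]+)/([0-9]+)/?$ /catalog.php?category=$1&prod_id=$2 [L]
```

A request such as /catalog/cartridges/abc/ does not match, because the product id range only accepts digits, so it falls through to Apache's normal handling.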

Well, this is the end of my tutorial on how to use mod_rewrite to create friendly URLs. If you have any questions please feel free to post a comment. Cheers!

1 Comment »

  1. Hi,
    I wonder if you have a solution for my problem.
    I have the following setup:
    NameVirtualHost 192.168.11.246

    ServerName www.mysite.com
    ServerAlias mysite.com
    DocumentRoot /www/mysite/htdocs
    Options +FollowSymLinks
    User apache
    RewriteEngine on
    RewriteLog "/www/mysite/logs/rewrite_log"
    RewriteLogLevel 9
    RewriteRule ^/article-([0-9]*)-([A-Z|a-z|0-9|-|_|/s]*)\.dx$ /display.do?title=$2&pg=$1 [L]
    RewriteRule ^/(.*\.do);jsessionid* /$1 [L]
    RewriteRule ^/(.*[\.html|\.jpg|\.JPG|\.gif|\.GIF|\.css]);jsessionid* /$1 [L]
    RewriteRule ^(.*)/$ http://www.mysite.com/open.do [R]

    (note I’m filtering out the “jsessionid”s to make search engine friendly)
    (Using [PT] at end of rule)
    192.168.1.26 - - [29/Apr/2007:20:53:30 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (2) init rewrite engine with requested uri /article-9-Hes_My_Hero_Hero.dx
    192.168.1.26 - - [29/Apr/2007:20:53:30 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (3) applying pattern ‘^(.*)/article-([0-9]*)-([A-Z|a-z|0-9|-|_|/s]*)\.dx$’ to uri ‘/article-9-Hes_My_Hero_Hero.dx’
    192.168.1.26 - - [29/Apr/2007:20:53:30 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (2) rewrite /article-9-Hes_My_Hero.dx -> /display.do?title=Hes_My_Hero&pg=9
    192.168.1.26 - - [29/Apr/2007:20:53:30 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (3) split uri=/show.do?title=Hes_My_Hero&pg=9 -> uri=/display.do, args=title=Hes_My_Hero&pg=9
    192.168.1.26 - - [29/Apr/2007:20:53:30 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (2) forcing ‘/display.do’ to get passed through to next API URI-to-filename handler
    And we get a 404 as the document does not exist at
    /www/mysite/htdocs/display.do

    (Using [L] at end of rule)
    192.168.1.26 - - [29/Apr/2007:20:56:40 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (2) init rewrite engine with requested uri /article-9-Hes_My_Hero.dx
    192.168.1.26 - - [29/Apr/2007:20:56:40 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (3) applying pattern ‘^(.*)/article-([0-9]*)-([A-Z|a-z|0-9|-|_|/s]*)\.dx$’ to uri ‘/article-9-Hes_My_Hero.dx’
    192.168.1.26 - - [29/Apr/2007:20:56:40 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (2) rewrite /article-9-Hes_My_Hero.dx -> /display.do?title=Hes_My_Hero&pg=9
    192.168.1.26 - - [29/Apr/2007:20:56:40 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (3) split uri=/show.do?title=Hes_My_Hero&pg=9 -> uri=/display.do, args=title=Hes_My_Hero&pg=9
    192.168.1.26 - - [29/Apr/2007:20:56:40 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (2) local path result: /display.do
    192.168.1.26 - - [29/Apr/2007:20:56:40 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (2) prefixed with document_root to /www/mysite/htdocs/display.do
    192.168.1.26 - - [29/Apr/2007:20:56:40 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (1) go-ahead with /www/mysite/htdocs/display.do [OK]
    (of course /www/mysite/htdocs/display.do does not exist, and I get a 404)
    Why do I get those last 4 rewrites happening and how do I stop them? - that seems to be the best “solution”.

    (Using [R] at the end of the rule gets a clear redirect to the original URL, display.do?page=9&title=Hes_My_Hero, which is what I’m trying to hide from the search engines.)
    Any ideas? I’m at my wits’ end about this.

    Thanks a lot.
    Mel.

    Comment by Mel Drego — April 29, 2007 @ 11:54 pm
