Creating readable URLs with mod_rewrite
Ugly URLs…
Querystring-style URLs like the one below are synonymous with dynamic systems:
http://www.mysite.com.au/catalog.php?category=cartridges&prod_id=33
While URLs like this are completely functional, they raise a few issues you should consider before going live:
- The underlying technology and parameter names are exposed, which gives script kiddies a starting point for injecting their own data, such as an SQL injection attempt.
- The URL contains ampersands which, when left unescaped, will break the XHTML validity of websites that link to it.
- Search engines may be reluctant to index dynamic pages like this.
So, without refactoring the application, how can you make more friendly URLs? A simple solution for Apache users is to use the mod_rewrite module.
mod_rewrite to the rescue!
Note: mod_rewrite is not loaded into Apache by default. To load it, open your httpd.conf file and uncomment the LoadModule line for rewrite_module.
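As a rough sketch of what that looks like, the lines below show the kind of entries to look for in httpd.conf. The module path and the directory path are assumptions (they vary between installations), and the Directory block is included on the assumption that you will keep your rules in a .htaccess file, which Apache ignores unless overrides are permitted for that directory.

```apache
# Remove the leading '#' from this line to load the module.
# The path to mod_rewrite.so is an assumption and differs between installs.
LoadModule rewrite_module modules/mod_rewrite.so

# Hypothetical web root: let .htaccess files use rewrite directives.
<Directory "/var/www/html">
    # FileInfo is the override class that covers mod_rewrite directives.
    AllowOverride FileInfo
    # Per-directory rewriting needs FollowSymLinks or SymLinksIfOwnerMatch.
    Options +FollowSymLinks
</Directory>
```

Restart Apache after editing httpd.conf so the change takes effect.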
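Apache’s mod_rewrite module can rewrite requested URLs on the fly, meaning you can substitute ugly querystrings with meaningful URLs, which helps with the security, compliance and SEO issues above. Bear in mind that rewriting only hides the parameters; the script behind the URL still has to validate its input. So how does mod_rewrite work?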
Basic rewriting
Typically, mod_rewrite directives are added to a .htaccess file in your web root. Here’s a simple example to start with:
RewriteEngine on
RewriteRule ^old_page\.html$ new_page.html
This rule transparently maps a request for old_page.html to new_page.html, while the visitor’s address bar still shows old_page.html. The first line enables the engine (only required once per .htaccess file):
RewriteEngine on
The next line tells the server that if a request matches old_page.html, it should substitute new_page.html. The caret ^ and dollar sign $ mark the start and end of the string being matched, and the backslash escapes the dot so that it matches a literal period rather than any character.
RewriteRule ^old_page\.html$ new_page.html
If you want a more traditional redirect, where the new location of the page shows in the browser’s address bar, just add [R] to the end of the RewriteRule.
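For illustration, here is a minimal sketch of the redirect variant. It assumes the .htaccess file sits in the web root, so the leading slash on the target is an assumption about your layout. A bare [R] sends a temporary (302) redirect; the status code can be set explicitly, and [L] is commonly added so that no further rules run once this one matches.

```apache
RewriteEngine on

# External redirect: the browser is told to request /new_page.html,
# so the address bar changes. R=301 marks the move as permanent;
# a bare [R] would default to a 302 (temporary) redirect.
RewriteRule ^old_page\.html$ /new_page.html [R=301,L]
```

A permanent redirect suits a page that has moved for good, while the default temporary redirect is the safer choice while you are still testing your rules, because browsers tend to cache 301 responses.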
Easy enough? That was a pretty simple rewrite. The real flexibility of mod_rewrite requires some knowledge of Regular Expressions, which can get quite complex. However, the functionality and flexibility that Regular Expressions offer, in PHP as well as here, make them well worth learning. So let’s take a look at a common rewrite and the expressions used to make it happen.
Unleashing mod_rewrite with Regular Expressions
With the help of Regular Expressions we can create RewriteRules which match a whole set of URLs and map them to their actual pages. Consider the product pages in our fictional Shopping Cart app, which only vary in category name and product id. We can identify requests for a product page by matching the literal prefix catalog, a forward slash, something representing a category name, another forward slash, then something representing a product id. Here’s how our rule looks:
RewriteRule ^catalog/([a-zA-Z0-9-]+)/([0-9]+)/$ /catalog.php?category=$1&prod_id=$2
The parts in square brackets are known as ‘ranges’. In this case we’re allowing any letter (upper or lower case), digit or hyphen for our category name, then any digit for our product id. The plus sign after each range means ‘one or more’ of those characters; without it, the range would match only a single character. We’ve also wrapped these expressions in parentheses so we can ‘back reference’ our matches and pass the values on to our PHP page. Back references are numbered by the order of the parentheses: $1 for the first pair, being our category name, and $2 for the second pair, being our product id.
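To make the mapping concrete, here is a small annotated sketch of a complete .htaccess file for the fictional catalog. The optional trailing slash (/?) and the [L] flag are additions for illustration rather than part of the rule above, and the sketch assumes catalog.php lives in the web root.

```apache
RewriteEngine on

# Request:   /catalog/cartridges/33/
# Served by: /catalog.php?category=cartridges&prod_id=33
#
#   $1 = cartridges  (one or more letters, digits or hyphens)
#   $2 = 33          (one or more digits)
#
# /? lets the URL work with or without the trailing slash.
# [L] stops rule processing here once this rule has matched.
RewriteRule ^catalog/([a-zA-Z0-9-]+)/([0-9]+)/?$ /catalog.php?category=$1&prod_id=$2 [L]
```

Because the pattern only accepts digits for the product id, a request such as /catalog/cartridges/abc/ does not match and never reaches catalog.php through this rule, which trims some malformed input early. It is not a replacement for validating the values inside the script.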
Well, this is the end of my tutorial on how to use mod_rewrite to create friendly URLs. If you have any questions please feel free to post a comment. Cheers!






Hi,
I wonder if you have a solution for my problem.
I have the following setup:
NameVirtualHost 192.168.11.246
ServerName www.mysite.com
ServerAlias mysite.com
DocumentRoot /www/mysite/htdocs
Options +FollowSymLinks
User apache
RewriteEngine on
RewriteLog "/www/mysite/logs/rewrite_log"
RewriteLogLevel 9
RewriteRule ^/article-([0-9]*)-([A-Z|a-z|0-9|-|_|/s]*)\.dx$ /display.do?title=$2&pg=$1 [L]
RewriteRule ^/(.*\.do);jsessionid* /$1 [L]
RewriteRule ^/(.*[\.html|\.jpg|\.JPG|\.gif|\.GIF|\.css]);jsessionid* /$1 [L]
RewriteRule ^(.*)/$ http://www.mysite.com/open.do [R]
(Note: I’m filtering out the “jsessionid”s to make the URLs search engine friendly.)
(Using [PT] at end of rule)
192.168.1.26 - - [29/Apr/2007:20:53:30 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (2) init rewrite engine with requested uri /article-9-Hes_My_Hero_Hero.dx
192.168.1.26 - - [29/Apr/2007:20:53:30 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (3) applying pattern '^(.*)/article-([0-9]*)-([A-Z|a-z|0-9|-|_|/s]*)\.dx$' to uri '/article-9-Hes_My_Hero_Hero.dx'
192.168.1.26 - - [29/Apr/2007:20:53:30 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (2) rewrite /article-9-Hes_My_Hero.dx -> /display.do?title=Hes_My_Hero&pg=9
192.168.1.26 - - [29/Apr/2007:20:53:30 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (3) split uri=/show.do?title=Hes_My_Hero&pg=9 -> uri=/display.do, args=title=Hes_My_Hero&pg=9
192.168.1.26 - - [29/Apr/2007:20:53:30 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (2) forcing '/display.do' to get passed through to next API URI-to-filename handler
And we get a 404 as the document does not exist at
/www/mysite/htdocs/display.do
(Using [L] at end of rule)
192.168.1.26 - - [29/Apr/2007:20:56:40 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (2) init rewrite engine with requested uri /article-9-Hes_My_Hero.dx
192.168.1.26 - - [29/Apr/2007:20:56:40 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (3) applying pattern '^(.*)/article-([0-9]*)-([A-Z|a-z|0-9|-|_|/s]*)\.dx$' to uri '/article-9-Hes_My_Hero.dx'
192.168.1.26 - - [29/Apr/2007:20:56:40 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (2) rewrite /article-9-Hes_My_Hero.dx -> /display.do?title=Hes_My_Hero&pg=9
192.168.1.26 - - [29/Apr/2007:20:56:40 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (3) split uri=/show.do?title=Hes_My_Hero&pg=9 -> uri=/display.do, args=title=Hes_My_Hero&pg=9
192.168.1.26 - - [29/Apr/2007:20:56:40 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (2) local path result: /display.do
192.168.1.26 - - [29/Apr/2007:20:56:40 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (2) prefixed with document_root to /www/mysite/htdocs/display.do
192.168.1.26 - - [29/Apr/2007:20:56:40 +1000] [www.mysite.com/sid#80c9038][rid#80d1ea8/initial] (1) go-ahead with /www/mysite/htdocs/display.do [OK]
(of course /www/mysite/htdocs/display.do does not exist, and I get a 404)
Why do those last 4 rewrite steps happen, and how do I stop them? Stopping them seems to be the best “solution”.
(Using [R] at the end of the rule gives a visible redirect to the original URL, display.do?page=9&title=Hes_My_Hero, which is what I’m trying to hide from the search engines.)
Any ideas? I’m at my wits’ end about this.
Thanks a lot.
Mel.
Comment by Mel Drego — April 29, 2007 @ 11:54 pm