January 19, 2007

Intro to Mod Rewrite for SEO URLs

You will find other articles relevant to this document in these sections:
Richard Lee @ 10:37 am

Mod Rewrite is an Apache module commonly used with PHP to create Search Engine friendly URLs. Essentially this module lets us mask ugly querystrings with much more meaningful URLs. For example take a products page in a database driven catalog:

http://www.mysite.com.au/catalog/product.php?pid=123

At the moment this URL tells us nothing about the destination page, and search bots are deterred from indexing such URLs. Wouldn’t it be better if we included the product name?

http://www.mysite.com.au/catalog/products/t-shirt/123

Much better :) . Using the power of regular expressions we can extract the information from our new URL and pass this information onto the correct PHP page behind the scenes.

So how do we do it? In the case of our new URL all we need to do is extract the last part, which carries the product id, and pass this on to the product.php page. We can do this easily in an .htaccess file:

First we enable the Mod Rewrite engine using RewriteEngine on:

# HTACCESS

# enable the module (Apache does not allow a comment on the same line as a directive)
RewriteEngine On

Then we set the base URL we will be rewriting from using RewriteBase:

# base URL, which in our case is 'catalog' since this is where our app is sitting
RewriteBase /catalog/

Now we do our rewrite using the RewriteRule directive, which takes the form RewriteRule pattern substitution. The pattern is the Regular Expression we use to evaluate the incoming URL; the substitution is the string that replaces the original URL when the pattern matches.

RewriteRule ^products/[a-zA-Z0-9_-]+/([0-9]+)$ product.php?pid=$1
# End HTACCESS

The Regular Expression:

- Caret ^ and dollar $ sign characters signify the start and end of our pattern string

- Square brackets specify ranges of allowed characters, such as A to Z, 0 to 9

- Round brackets are used to “capture” parts matched in our pattern - in this case the product id - which we later reference in our substitute URL using the back reference $1. Back references are numbered according to each set of round brackets, i.e. if we had also enclosed the product name match in round brackets, that would be $1 and the product id would be $2.
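As a quick way to sanity-check the pattern outside Apache, the same match and back reference can be sketched in Python. This is only an illustration: Python's re module stands in for mod_rewrite's regex engine, the rewrite() helper is a hypothetical name, and the path is written relative to the catalog directory, which is how a rule in /catalog/.htaccess sees it. The hyphen is placed last in the character class so that it is unambiguously a literal.

```python
import re

# Same shape as the RewriteRule pattern; the path is relative to /catalog/
PATTERN = re.compile(r"^products/[a-zA-Z0-9_-]+/([0-9]+)$")


def rewrite(path):
    """Return the substituted target, or None when the rule does not match."""
    match = PATTERN.match(path)
    if match is None:
        return None
    # group(1) plays the role of $1 in the substitution string
    return "product.php?pid=" + match.group(1)


print(rewrite("products/t-shirt/123"))  # product.php?pid=123
print(rewrite("products/t-shirt/abc"))  # None, the id must be numeric
```

Because the product name is not wrapped in round brackets, it is matched but never passed on; only the id reaches product.php.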
Easy enough? This is a relatively simple rewrite and I have explained it in fairly layman’s terms; more complicated rewrites require some knowledge of Regular Expressions. If you haven’t played with Regular Expressions I highly recommend you check out Wikipedia, the DevShed articles and the cheat sheets supplied by ILoveJackDaniels.com.

October 25, 2006

Creating readable URLs with mod rewrite

Richard Lee @ 12:56 am

Ugly URLs…
Querystring-style URLs like the one below are synonymous with dynamic systems:

http://www.mysite.com.au/catalog.php?category=cartridges&prod_id=33

While URLs like this are completely functional, there are a few issues you should consider before going live:

  1. The underlying technology is exposed, so script kiddies can see exactly which parameters to tamper with - for example to attempt an SQL injection.
  2. The URL contains ampersands which, when left unescaped, will affect the XHTML compliance of linking websites
  3. Search engines will avoid indexing dynamic pages like this

So without refactoring the application, how can you make more friendly URLs? A simple solution for Apache users is to utilize the mod rewrite module.

mod rewrite to the rescue!

Note: mod_rewrite is not loaded into Apache by default. To enable it, open your httpd.conf file and uncomment the LoadModule line for rewrite_module.

Apache’s mod rewrite module can rewrite requested URLs on the fly, meaning you can substitute ugly querystrings with meaningful URLs, which helps with security, compliance and SEO. So how does mod rewrite work?

Basic rewriting

Typically mod rewrite directives are added to an .htaccess file in your web root. Here’s a simple example to start with:

RewriteEngine on
RewriteRule ^old_page.html$ new_page.html

This rewrite transparently maps a request for old_page.html to new_page.html. The first line enables the engine (only required once per .htaccess file):

RewriteEngine on

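This next line basically tells the server that if there is a request matching old_page.html, it should substitute it with new_page.html. The caret ^ and dollar $ sign signify the start and end of the string used for the match. Strictly speaking, the unescaped dot matches any single character; write \. if you want to match only a literal dot.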

RewriteRule ^old_page.html$ new_page.html

If you want to do a more traditional redirect, so that the browser shows the new location in its address bar, just add [R] to the end of the RewriteRule.

Easy enough? That was a pretty simple rewrite. The real flexibility of mod rewrite requires some knowledge of Regular Expressions, which can get quite complex. However, the functionality and flexibility Regular Expressions offer in PHP make them well worth learning. So let’s take a look at some common rewrites and the expressions used to make them happen.

Unleashing mod rewrite with Regular Expressions

With the help of Regular Expressions we can create RewriteRules which match a set of URLs and map them to their actual pages. Consider the product pages in our fictional Shopping Cart app, which only vary in category name and product id. We can identify requests for a product page by matching the literal catalog prefix, something representing a category name, a forward slash, then something representing a product id. And here’s how our rule looks:

RewriteRule ^catalog/([a-zA-Z0-9-]+)/([0-9]+)/$ /catalog.php?category=$1&prod_id=$2

The parts in square brackets are known as ‘ranges’. In this case we’re allowing one or more characters in the alphanumeric range (upper or lower case, plus the hyphen) for our category name, then one or more characters in the numeric range for our product id. We’ve also encased these regular expressions in parentheses so we can ‘back reference’ our matches to pass the values on to our PHP page. Back referencing is done by indexing each set of parentheses: $1 for our first set, being our category name, and $2 for our second set, being our product id.
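To see the two back references at work, here is a minimal sketch in Python rather than Apache itself. It assumes the category group carries a + quantifier (one or more characters), uses Python's re module in place of mod_rewrite's regex engine, and the rewrite() helper is a hypothetical name used only for illustration.

```python
import re

# Assumes the category group is meant to match one or more characters
PATTERN = re.compile(r"^catalog/([a-zA-Z0-9-]+)/([0-9]+)/$")


def rewrite(path):
    """Return the substituted URL, or None when the rule does not match."""
    match = PATTERN.match(path)
    if match is None:
        return None
    category, prod_id = match.groups()  # the equivalents of $1 and $2
    return "/catalog.php?category={}&prod_id={}".format(category, prod_id)


print(rewrite("catalog/cartridges/33/"))  # /catalog.php?category=cartridges&prod_id=33
print(rewrite("catalog/cartridges/33"))   # None, the rule requires the trailing slash
```

Without the + quantifier the first group would only accept a single character, so a category such as cartridges would never match.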

Well, this is the end of my tutorial on how to use mod rewrite to create friendly URLs. If you have any questions please feel free to post a comment. Cheers!

July 25, 2006

Setting up a PHP Development Environment - Part 2: Installing PHP4

Richard Lee @ 1:14 pm

Part 2 assumes you have a working installation of Apache 2.0. Please refer to Part 1: Installing Apache if you haven’t already done so.

Requirements

  • Apache 2.0.x

1. Download the PHP 4.3.x Windows Binary zip package from http://www.php.net/downloads.php .
2. Run a virus scan and an MD5 checksum to verify the integrity of the download, then unzip to the same root directory as your Apache installation. I have Apache under C:\Apache2, so I have unzipped PHP to C:\php4 .

3. Now edit your php.ini file:

Rename C:\php4\php.ini-dist to php.ini

Open the php.ini file, locate doc_root and set it to whatever your Apache DocumentRoot is set to. Mine is doc_root = "C:\public_html"

Note: Remember that when adding path values in the Apache configuration files on Windows, all backslashes such as c:\directory\file.ext must be converted to forward slashes, as c:/directory/file.ext. A trailing slash may also be necessary for directories.

Scroll down to extension_dir = "" and set it to the location of the extensions directory of your PHP installation: extension_dir = "C:\php4\extensions"

Since we’re using Windows we also have to set the session save path to an existing directory so we can use PHP’s session functions. I recommend using the Windows temporary directory: session.save_path = "c:/windows/temp" OR for Win2k session.save_path = "c:/WINNT/Temp"

4. Next edit your Apache conf file (httpd.conf)

For PHP 4 copy your php4apache2.dll file from the sapi directory into the php4 root directory and add the following lines (the path assumes PHP was unzipped to C:\php4 as above):

LoadModule php4_module "c:/php4/php4apache2.dll"
AddType application/x-httpd-php .php

For PHP 5 do something like this:

LoadModule php5_module "c:/php/php5apache2.dll"
AddType application/x-httpd-php .php

And for both, point PHPIniDir at the directory that contains your php.ini file (C:/php4 for the installation above):

PHPIniDir "C:/php4"

5. Now test your installation - after first restarting the Apache server - by creating a simple phpinfo file:

<?php

phpinfo();

?>

Save this file as phpinfo.php in your webroot C:\Apache2\htdocs and open it in your web browser at http://localhost/phpinfo.php. You should now be faced with a printout of your server’s settings.

If you do not see this page check to make sure you have restarted Apache by right clicking the Apache monitor in your task bar.

You have now completed your installation of PHP 4.

For more information please refer to php.net’s documentation Apache 2.0.x on Microsoft Windows
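Step 2 above asks for an MD5 checksum of the download. If you do not have a checksum tool handy, a minimal sketch in Python could look like the following. The function names are hypothetical and the published checksum has to come from the download page itself; nothing here is specific to PHP or Apache.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, published_md5):
    """Compare the file's digest with the published one, ignoring case and whitespace."""
    return md5_of_file(path) == published_md5.strip().lower()
```

Calling checksum_matches() with the path to the downloaded zip and the checksum string copied from the download page returns True when the file arrived intact.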

July 10, 2006

Setting up a PHP Development Environment - Part 1: Installing Apache

Richard Lee @ 5:23 pm

Welcome to Part 1 of our 3 part series, “Setting up a PHP Development Environment”. Please note we will be using a Windows XP environment for our installation. If you have come here in search of an article on how to install PHP please refer to Part 2: Installing PHP4.
Requirements:

  • Windows NT family, e.g. XP (NT 4.0 has some issues with SP 4; please update to SP 6)
  • TCP/IP networking must be installed and working
  • Intel/AMD Processor (for Win Installer)

Installation & Setup
1. Download the Win32 Binary (MSI Installer) for the Apache HTTP Server package from apache.org .
Note: I am using the 2.0.58 release however 2.2.2 is now available.

2. Run a virus check and MD5 checksum to verify the integrity of the download, then run the installer. Within the Installation Wizard enter the following Server Information, making sure you select the option “For All Users, on Port 80, as a Service” at the bottom.
Apache dialog - Enter Server Info

Network Domain: localhost
Server Name: localhost
Admin Email: (your email)

Note: If you get a Windows Firewall prompt, make sure you select UNBLOCK

3. Install to your local drive. Mine is C:/Apache2 and we will assume this directory in the following installment, Part 2: Installing PHP4.
4. After installation has completed the Apache Monitor will appear in your Windows task bar. Right-click this and select Start from the menu. The server should start loading.

Apache Monitor

5. After the server has started, open your web browser and visit http://localhost. An Apache test page should come up - You have now successfully installed the Apache Server on your machine.

Apache Test Page

Note: If there is an error, check the Apache Monitor to make sure the server is running. If it is not, check your firewall to make sure the service isn’t blocked.

6. (Optional) Apache defaults your document root to [drive]:/[Apache]/htdocs. If you would like to specify a different root directory, open up the httpd.conf file ([drive]:/[Apache]/conf/httpd.conf) and replace the DocumentRoot value with the path to your desired directory.

Continue to Part 2: Installing PHP4

For more information on installing Apache please see apache.org’s documentation Using Apache with Microsoft Windows

April 21, 2006

Quick Apache PHP Mysql FTP install

Cameron Manderson @ 10:41 am

It is important that your development environment matches the sort of environment you wish to deploy on. The basic infrastructure of a web development company often has this sort of simplified set of server environments:

- Development
The development server is usually either the developer’s local workstation or a development office server that is set up with the required Apache environment. It is used so that developers can independently develop and test without affecting other developers or destroying a client review version of the project.
- Staging
Staging matches the Live infrastructure/environment as close as possible. It may be used for formal testing (often there is a testing server, but sometimes staging is used) and review by the client. It will represent the version of the project before the project is moved to the live servers.
- Live/production
These host the live project in use for its intended purpose.

Installing Apache, PHP, MySQL, phpMyAdmin etc. to achieve this can be quite a hassle - especially on several different machines and environments.
That may make you wonder how you are going to get your development and staging servers up quickly. Surely that will take weeks of configuration? You would be right to feel that way initially, but you shouldn’t. There are many ‘quick install’ programs out there that allow us to install and configure an Apache web environment almost instantly.
I have used variations of xampp, which is available in many different flavours for different operating systems and is free to use. I have xampp running under both Win32 and Linux environments, and the process is extremely quick and easy. There are also versions for Solaris and Mac OS X. It is provided as a ready-to-go package with everything you need to get your testing/development environment up.

It is also very beneficial to PHP developers as it provides an option to switch between PHP 4 and PHP 5 by running a simple script. Great for testing forward/backward compatibility.

You will need to first visit the Xampp Sourceforge File Listing and choose the packages required for your Operating System.

Windows

For Windows this comes in two flavours, and two installation methods. At the time of writing, the xampp Windows package was up to version 1.5.1.

The distribution for Windows 98, NT, 2000 and XP. This version contains: Apache, MySQL, PHP + PEAR, Perl, mod_php, mod_perl, mod_ssl, OpenSSL, phpMyAdmin, Webalizer, Mercury Mail Transport System for Win32 and NetWare Systems v3.32, JpGraph, FileZilla FTP Server, mcrypt, eAccelerator, SQLite, and WEB-DAV + mod_auth_mysql.

Xampp Windows is provided in a Lite (basic) version with minimal package configuration and a complete version with all the packages. You can either download the package in a self extracting .exe, a ZIP archive or a .exe style installer.

Installing the packages could not be easier: either extract the archive or use the installer to install the Xampp package onto your computer. I like to use “\server\xampp” as an install location and I try to keep it the same on every workstation.

Once Xampp is installed you will want to go to the installation directory and run the Xampp-control. This control panel allows you to easily start and stop the various installed packages, such as FTP, Apache or MySQL. You can also tick the “svc” tickbox, which will install xampp to the Windows service list (Control Panel -> Administrative Tools -> Services) and set the services to start automatically when you boot Windows.

Then you need to point your web browser to http://localhost/. At this point you will be able to choose your language and perform various install tests to see if everything is running smoothly.

You will need to run the security recommendations immediately and configure your webserver with a password. This is a very important step.

Linux

Xampp is very easy to install on Linux. At the time of writing, xampp Linux was also up to version 1.5.1.

The distribution for Linux systems (tested for SuSE, RedHat, Mandrake and Debian) contains: Apache, MySQL, PHP & PEAR, Perl, ProFTPD, phpMyAdmin, OpenSSL, GD, Freetype2, libjpeg, libpng, gdbm, zlib, expat, Sablotron, libxml, Ming, Webalizer, pdf class, ncurses, mod_perl, FreeTDS, gettext, mcrypt, mhash, eAccelerator, SQLite and IMAP C-Client.

You will simply need to download the package to your Linux /tmp directory. If you are only accessing your server with PuTTY, and need a way to download the file directly onto the server from the command line, connect with SSH and perform the following:

cd /tmp
wget http://nchc.dl.sourceforge.net/sourceforge/xampp/xampp-linux-1.5.1.tar.gz

The download URL for the package can be found by selecting a mirror for the file on Sourceforge.

Once you have downloaded the package, execute the following with root privileges:

tar xvfz xampp-linux-1.5.1.tar.gz -C /opt

This command extracts the gzipped tar file to /opt. Your install will now reside under /opt/lampp

Once the package has been extracted you can start the server. If you currently have any other services running (such as previously installed Apache/MySQL services), this may fail. Turn them off using the appropriate “apachectl stop” or “/etc/init.d/apache stop” commands.

/opt/lampp/lampp start

Before taking any more steps I would recommend immediately running the security option of Lampp to configure the security of the server. This is highly recommended.

/opt/lampp/lampp security

Now you can point your browser to the IP of your linux box (either localhost if you are running the apache under your local computer or the IP on the network).

If you are having problems with your Linux installation, check out the Linux FAQ on Apache Friends.

Mac OSX or Solaris
If you are installing xampp on Mac OS X or Solaris, follow the appropriate guide below:

- Mac OS X install read here.
- Solaris install read here.

Security Note: The Xampp developers do not recommend running Xampp for live/production. It is set up for convenience rather than security and takes extra configuration to make it secure enough for a live environment.

Once you have installed these packages successfully it is very quick and easy to replicate the install across different workstations/server environments.

April 7, 2006

PHP Encrypting using PKI/GnuPG

Cameron Manderson @ 2:33 pm

Public-key cryptography, the basis of PKI (Public Key Infrastructure), is well known to security buffs. It involves the use of a public key to encrypt something; that key cannot then be used to decrypt it. Instead, a private key kept secure by the intended recipient is used to decrypt. This allows the public key to be freely available online while being useless for decrypting messages created with it. One of the first widely available implementations was PGP (Pretty Good Privacy), written by Phil Zimmermann. History aside, PGP is a commercial application and GnuPG is an open source implementation. Both implement the OpenPGP standard, so keys and messages are interchangeable.
Because it is open source we often find GnuPG available on Linux hosting. This means that using GnuPG we can encrypt messages once they have been received by the server (not on their way to the server, where they can still be intercepted unless sent over HTTPS). Keys can have varying strengths (2048 bit for example) and different types (e.g. RSA), with cipher/hash combinations (e.g. AES-256/SHA-256). Perfect for making some pretty damn secure messaging.

This requires your public key to be added to a GnuPG keychain that the web user can access. A good example for getting GnuPG installed and having your key added is here. Typically you can just send your public key in an email message to your hosting company and have them add it to their keychain. They will usually be happy to do so.

You will need to know the path to the gpg binary on your hosting server, as well as the .gnupg keychain location to specify in your --homedir parameter. Your hosting company will again be able to help you with this one.

So, as a simple example on the usage of GnuPG I will demonstrate by discussing a quick way of encrypting details received by form input:

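$prefix = 'enc';
// Encrypt whatever arrives on stdin for the named recipient, ASCII-armoured
$command = '/usr/bin/gpg --always-trust --batch --no-secmem-warning --homedir /home/www/.gnupg -a -r "Cameron Manderson" -e';
$tmpFile = tempnam('/tmp', $prefix);
$output = false; // stays false if the encryption could not be run
// Only gpg's stdout (the ciphertext) is redirected into the temp file
$pipe = popen("$command 2>&1 >$tmpFile", 'w');
if (!$pipe) {
    unlink($tmpFile);
} else {
    // Feed the plain text to gpg on stdin, then wait for it to finish
    fwrite($pipe, $plainTxt, strlen($plainTxt));
    pclose($pipe);
    // Read the encrypted message back and remove the temp file
    $output = file_get_contents($tmpFile);
    unlink($tmpFile);
}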

The idea behind the code above is that we form a message assigned to the variable $plainTxt, have it encrypted by the popen call, then have the encrypted details placed into $output. If you are using this to accept input from a user and encrypt it (such as encrypting credit card details and the like) you will want to ensure the form is served over a sufficient level of HTTPS.
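For comparison, the same pipe-to-gpg idea can be sketched without a temporary file. The version below is written in Python rather than PHP and is an illustration only: the gpg options are the ones from the command string above, while the function names, the default paths and the choice of subprocess.run are assumptions made for this sketch.

```python
import subprocess


def build_gpg_command(recipient, homedir="/home/www/.gnupg", gpg_bin="/usr/bin/gpg"):
    """Assemble the gpg options from the example as an argument list."""
    return [gpg_bin, "--always-trust", "--batch", "--no-secmem-warning",
            "--homedir", homedir, "-a", "-r", recipient, "-e"]


def encrypt(plain_text, recipient):
    """Pipe plain_text to gpg on stdin and return the ASCII-armoured ciphertext."""
    result = subprocess.run(
        build_gpg_command(recipient),
        input=plain_text.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if gpg exits with an error
    )
    return result.stdout.decode("ascii")
```

Passing the arguments as a list means the recipient name never goes through a shell, so it needs no quoting, and the ciphertext is read straight from gpg's stdout instead of from a file in /tmp.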
I have purchased a copy of PGP Desktop, which allows me to decrypt and view the contents of a message under a Windows GUI environment. This is great for end users because it allows them to easily decrypt a message using Windows. It can also integrate into their mail application (such as Outlook or Thunderbird).

March 29, 2006

PHP Web Apps and Scalability

Cameron Manderson @ 9:32 am

PHP Web applications scale very well. When PHP executes it loads your previous session variables (if sessions are used and any exist), performs a task, writes your session to disk (if sessions are used) and exits. This is different to a Java Web container, which has a process handling requests at all times, with your application running from start to finish.

Due to this “start-process-finish” model, requests can be processed without worrying greatly about affecting data in other processes. It also means that if our storage is shared (for sessions etc.) and we use some intelligent network level routing we can have requests processed by several different servers. When more capacity is needed we can simply add more servers, without worrying too much about managing the state across the servers.
When we are using virtual servers for our hosting we must consider how load is handled. Cheaper hosting usually means that it is either hosted in America, using a poor tier of hosting (cheap bulk) or jamming a large number of domains onto single servers (having a high ratio of domains to servers). All of these usually result in slower performance of your website.

I have been using a group out of Western Australia for my shared hosting. After doing research into them and their setup it seems very foolproof. Damian Douglas-Meyer (a technician there) explains:

Our load balancing system works like this:

1. Today, there are 10 identical Linux servers, each running Apache and ProFTPD for HTTP and FTP respectively. 2 of these are dedicated for FTP although all can do FTP or HTTP.
2. All servers are configured for and can respond for all sites and share a common file system via a NetApp filer.
3. There is a central load balancing switch that listens for the common IP address 203.202.10.111 and initially receives the packets.
4. The load balancer monitors server health and also load, based loosely on the number of current connections to each server. It also remembers client IP addresses that have connected to each server within the immediate past.
5. When the load balancer receives a packet, if possible it passes the request onto the same webserver that processed the requests from that client. This is to keep PHP and other sessions alive. Otherwise it passes the request to the least loaded webserver, modulo some other settings for distributing load.
6. The webserver gets the request as if it came directly from the client due to some network level packet re-writing. It processes the request in the same way it would if it were the only webserver for that site, and returns the data to the client.

So in essence, if 100 people were accessing your site at any one time, 10 of them would be processed by each server. An individual client would stay with the same server for the life of a session.

If one server gets busy due to other clients consuming resources, the load balancer knows this based on response times of its heartbeats and reduces the level of new connections to that server.

If a server dies, connections are passed to other servers, although in this situation, PHP sessions can be lost (unless stored in your own tmp directory under your home directory, or in a central database.)

Regarding peaks and troughs in load, there are times when some servers get busy due to specific clients running demanding scripts. We do place limits on memory, CPU and execution time of scripts to mitigate issues with these situations. If we notice some clients abusing the servers with poorly written perl cgi’s, for example, we will work with the customer to improve their script, or quarantine them on a separate server for the good of all customers.
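The routing behaviour described in the explanation above (stick with the server a client used last, otherwise choose the least loaded healthy server) can be sketched in a few lines of Python. This is a simplified model and not the hosting company's actual implementation: the class name, the server names and the use of a plain connection count as the load measure are all assumptions made for illustration.

```python
class StickyLeastConnectionsBalancer:
    """Toy model: client IP affinity first, least connections as the fallback."""

    def __init__(self, servers):
        self.connections = {name: 0 for name in servers}
        self.healthy = set(servers)
        self.affinity = {}  # client IP -> server that handled it last

    def mark_down(self, server):
        """Stop sending traffic to a server that failed its health check."""
        self.healthy.discard(server)

    def route(self, client_ip):
        server = self.affinity.get(client_ip)
        if server not in self.healthy:
            # New client, or its previous server died: pick the least loaded one
            server = min(sorted(self.healthy), key=lambda name: self.connections[name])
            self.affinity[client_ip] = server
        self.connections[server] += 1
        return server

    def finish(self, server):
        """Record that a request on this server has completed."""
        self.connections[server] -= 1


balancer = StickyLeastConnectionsBalancer(["web1", "web2"])
print(balancer.route("10.0.0.1"))  # web1
print(balancer.route("10.0.0.2"))  # web2
print(balancer.route("10.0.0.1"))  # web1 again, thanks to the affinity table
```

When a server is marked down, its clients are re-routed on their next request, which is also the point at which a PHP session stored only on that server would be lost.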

This sort of scenario is very appealing to us, due to the way that our PHP applications can be handled by several servers without worrying about scalability issues of a single web container instance. This scenario is also good to have if you are hosting your applications internally. If your server comes under high load you can simply set up another server in the cluster [although a theoretical bottleneck would come from the storage medium first].
RE the company I use for hosting: Another attractive feature is that they are using high-grade Australian bandwidth, generally meaning that your website will load quickly for Australian viewers, and because they are close to the top tier, international traffic is quite good as well. They provide excellent prices (starting from around $180 per year - 500MB, 10GB traffic, unlimited email, and Urchin Webstats [can be automatically emailed to your client every day/week/month]). The accounts are customisable and have the ability to scale only traffic without having to pay for more space (so you don’t have to fork out money for several gig of space if all you want is lots of traffic). You can get a 5% discount on the price using my referral. Their support is outstanding and I have found it to be a very professional way to host our domains. I first came across them because PHP.net uses them as a mirror because of their capability to handle demand. Must be good :-)