“Freezing” WordPress

While I was a visiting student in Canada, I blogged occasionally (in German) to let my family and friends know what I was doing abroad. I used WordPress for that purpose; the blog was set up as a farewell gift by a friend (who also happens to be my web host). When I launched my personal website, I decided to use Drupal but kept the old blog online in a “read-only” state.

As time went by, the WordPress installation aged, and security concerns arose because bugs inevitably get discovered in popular software. Spammers also occasionally succeeded in bypassing the blog’s Akismet spam protection. Turning comments off was easy, but I didn’t want to go through the pain of upgrading WordPress, especially because I wasn’t even using it anymore. So I decided to “freeze” the WordPress blog, turning it into a static collection of HTML pages.

1. I disabled searching and commenting. Every dynamic feature that needs server-side processing or a database has to be turned off, because a collection of static HTML pages can’t provide it.

2. I used wget with the options --mirror and --convert-links to create a local mirror of the blog (wget -mk blog.uweschmidt.org for short). With --mirror, wget recursively follows all links on the site and downloads the files they point to; --convert-links then rewrites the links in the downloaded pages so that they work when browsing the copy locally. For more explanations and other uses of wget, I refer you to the articles “Create a mirror of a website with Wget” and “Website Mirroring With wget”.
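If you’d rather script this step, the same command can be wrapped in a few lines of Python. This is only a sketch: it assumes wget is installed and on the PATH, spells out the long option names instead of -mk, and the function names wget_mirror_command and mirror are hypothetical.

```python
import shutil
import subprocess
from typing import List


def wget_mirror_command(site: str) -> List[str]:
    """Build the long form of `wget -mk site`."""
    return [
        "wget",
        "--mirror",         # recursive download with unlimited depth and timestamping
        "--convert-links",  # rewrite links in the saved pages for local browsing
        site,
    ]


def mirror(site: str) -> int:
    """Run wget in the current directory and return its exit status."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget is not installed or not on the PATH")
    return subprocess.run(wget_mirror_command(site), check=False).returncode


if __name__ == "__main__":
    # Only print the command; call mirror(...) to actually download the site.
    print(" ".join(wget_mirror_command("blog.uweschmidt.org")))
```

Keeping the command in a list (rather than one shell string) avoids any quoting issues when subprocess runs it.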

I ended up with all the content pages of the blog, but wget missed the stylesheet (and hence the files referenced within it). Apparently, wget only parses HTML code, so files referenced from CSS code weren’t downloaded. I downloaded the missing files manually and put them in the appropriate folders. The theme’s stylesheet itself was among them, because it is pulled in with a CSS @import rule rather than an HTML link:

<style type="text/css" media="screen">
  @import url( http://blog.uweschmidt.org/wp-content/themes/benevolence/style.css );
</style>
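Finding the files that wget skipped can be partly automated by scanning the stylesheets for references. A rough Python sketch, assuming references use the usual url(...) and @import "..." forms (a regular expression, not a real CSS parser) and that the mirror uses wget’s default layout of one directory per host; css_references and missing_locally are hypothetical helper names:

```python
import os
import re
from typing import List
from urllib.parse import urljoin, urlsplit

# url(...) with optional quotes, or @import "..." written without url().
_REF_RE = re.compile(r"""url\(\s*['"]?([^'")\s]+)['"]?\s*\)|@import\s+['"]([^'"]+)['"]""")


def css_references(css_text: str, base_url: str) -> List[str]:
    """Absolute URLs referenced from CSS, in order of appearance, duplicates removed."""
    refs: List[str] = []
    for match in _REF_RE.finditer(css_text):
        ref = match.group(1) or match.group(2)
        if ref.startswith("data:"):
            continue  # inline data, nothing to download
        absolute = urljoin(base_url, ref)  # resolve relative to the stylesheet's URL
        if absolute not in refs:
            refs.append(absolute)
    return refs


def missing_locally(urls: List[str], mirror_root: str = ".") -> List[str]:
    """URLs with no matching file in the mirror, assuming wget's host/path layout."""
    missing = []
    for url in urls:
        parts = urlsplit(url)
        local = os.path.join(mirror_root, parts.netloc, parts.path.lstrip("/"))
        if not os.path.exists(local):
            missing.append(url)
    return missing
```

Applied to the page header above, css_references returns ['http://blog.uweschmidt.org/wp-content/themes/benevolence/style.css']; running it again on the downloaded style.css, with that URL as the base, lists the images the theme needs, and missing_locally narrows them down to the ones still to fetch.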

3. Then came the somewhat tricky part, because my friend hadn’t set up the blog with clean URLs. He had left the default setting, under which blog posts are accessed like blog.uweschmidt.org/?p=49 instead of something like blog.uweschmidt.org/title-name. wget saved the page at blog.uweschmidt.org/?p=49 as a file named index.html?p=49. That is a problem: in a URL, everything after the ? is the query string, so a request for index.html?p=49 asks the web server for the file index.html and merely passes p=49 to it as a parameter. A static HTML file can’t do anything with that parameter, so every post URL would simply return the front page.
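Python’s standard urllib.parse module illustrates how such a URL is split. The file name is the one from the example; the snippet only demonstrates URL syntax and makes no claim about how wget or Apache work internally.

```python
from urllib.parse import parse_qsl, quote, urlsplit

# A post URL as the blog's existing links spell it.
parts = urlsplit("http://blog.uweschmidt.org/index.html?p=49")
print(parts.path)              # /index.html  (the only part used to locate a file)
print(parse_qsl(parts.query))  # [('p', '49')]  (a parameter, not part of the file name)

# The file wget saved: to request it by name, the "?" would have to be
# percent-encoded (quote() also encodes "="), which none of the existing links do.
saved_file = "index.html?p=49"
print(quote(saved_file))       # index.html%3Fp%3D49
```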

The problem can be solved with a rewrite engine. I used mod_rewrite because my website runs on the Apache web server, like many other websites on the Internet. Continuing the example above, I renamed index.html?p=49 to p-49.html and added the following rule to an .htaccess file:

RewriteCond %{QUERY_STRING} ^([^&]+)=([^&]+)$
RewriteRule index.html %1-%2.html [L]
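With many posts, renaming the files by hand gets tedious. A minimal Python sketch, assuming the mirrored pages sit in a single directory and are named exactly like index.html?p=49; it applies the same key=value mapping as the RewriteCond above and leaves multi-parameter names such as the comment feeds alone. new_name and rename_mirrored_pages are hypothetical names.

```python
import os
import re
from typing import List, Optional, Tuple

# Same shape as the RewriteCond: exactly one key=value pair, no "&".
_SAVED_PAGE = re.compile(r"^index\.html\?([^&]+)=([^&]+)$")


def new_name(filename: str) -> Optional[str]:
    """Map 'index.html?p=49' to 'p-49.html'; None for files the rule does not cover."""
    match = _SAVED_PAGE.match(filename)
    if match is None:
        return None
    return f"{match.group(1)}-{match.group(2)}.html"


def rename_mirrored_pages(directory: str) -> List[Tuple[str, str]]:
    """Rename matching files in place and return the (old, new) pairs."""
    renamed = []
    for filename in sorted(os.listdir(directory)):
        target = new_name(filename)
        if target is None:
            continue
        target_path = os.path.join(directory, target)
        if os.path.exists(target_path):
            raise FileExistsError(target_path)  # never overwrite an existing page
        os.rename(os.path.join(directory, filename), target_path)
        renamed.append((filename, target))
    return renamed
```

Files with two parameters, such as index.html?feed=rss2&p=49, still need their own naming scheme to match the additional rules in the complete .htaccess file.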

A request for index.html?p=49 or ?p=49 on the mirrored blog is now internally rewritten to p-49.html: the server delivers that file while the address in the browser stays the same, so the user doesn’t notice anything. As a result, existing links don’t break, and all pages remain valid in search engine indexes. The “frozen” blog is available at blog.uweschmidt.org, and at first sight it looks like a regular WordPress blog.

Here’s the complete .htaccess file, which also covers the other URL patterns that needed rewriting (comment feeds, trackbacks and the xmlrpc.php?rsd discovery document):

DirectoryIndex index.html
RewriteEngine on
RewriteBase /

# index.html, one parameter
RewriteCond %{QUERY_STRING} ^([^&]+)=([^&]+)$
RewriteRule index.html %1-%2.html [L]

# comment feeds for posts
RewriteCond %{QUERY_STRING} ^feed=rss2&p=([^&]+)$
RewriteRule index.html feed-rss2_p-%1.html [L]

# trackbacks
RewriteCond %{QUERY_STRING} ^p=([^&]+)$
RewriteRule wp-trackback.php tb_p-%1.html [L]

# xmlrpc
RewriteCond %{QUERY_STRING} ^rsd$
RewriteRule xmlrpc.php xmlrpc-rsd.html [L]
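Before uploading, the rules can be sanity-checked offline. The Python sketch below is an approximation, not mod_rewrite itself: it assumes each RewriteRule pattern is searched as an unanchored regular expression in the request path with the leading slash removed (as in .htaccess context), treats an empty path as the DirectoryIndex index.html, stops at the first rule whose condition also matches (the [L] flag), and ignores the second pass mod_rewrite makes over a rewritten URL. static_target is a hypothetical name.

```python
import re
from typing import List, Tuple

# (RewriteRule pattern, RewriteCond on the query string, substitution), in .htaccess order.
# \1 and \2 play the role of mod_rewrite's %1 and %2 (groups captured by the RewriteCond).
RULES: List[Tuple[str, str, str]] = [
    (r"index.html", r"^([^&]+)=([^&]+)$", r"\1-\2.html"),
    (r"index.html", r"^feed=rss2&p=([^&]+)$", r"feed-rss2_p-\1.html"),
    (r"wp-trackback.php", r"^p=([^&]+)$", r"tb_p-\1.html"),
    (r"xmlrpc.php", r"^rsd$", "xmlrpc-rsd.html"),
]


def static_target(path: str, query: str = "") -> str:
    """Return the file the frozen blog would serve (relative to RewriteBase /)."""
    path = path.lstrip("/") or "index.html"  # an empty path falls back to DirectoryIndex
    for pattern, condition, substitution in RULES:
        if re.search(pattern, path) is None:
            continue
        cond = re.search(condition, query)
        if cond is None:
            continue
        return cond.expand(substitution)  # first matching rule wins, like [L]
    return path  # no rule applied: the file is served as requested
```

For example, static_target("/", "p=49") gives "p-49.html" and static_target("/wp-trackback.php", "p=49") gives "tb_p-49.html", which makes it easy to run every URL in an old sitemap or log through the mapping and check that the target file exists in the mirror.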
