May 262011
 

This was not an easy task! For one thing, my Community Server (CS) site was not functional, so using RSS / MetaWeblog endpoints were not available options for me. Secondly, I no longer have a Windows development machine. Since CS is built on all Microsoft technologies, I needed to fire up a virtual instance of Windows in order to extract any of the data. If my previous hosting service was able to keep my database online for longer than minutes at a time, I could have run things remotely, but… not the case.

The actual SQL code for extracting all of my blog posts looks surprisingly simple:

But if you look closely at it, you’ll see that there are to scalar functions in there: ‘dbo.old_url‘ & ‘dbo.make_slug‘. I was surprised to find not find any slugs in the CS DB tables. I assume that all of that logic is being handled from the compiled ASP.NET application itself because there was nothing in the tables, stored procedures, or even functions that did anything related to calculating/parsing URL slugs from post titles. To make matters worse, since my site was not in a running state (due to hosting shenanigans), I had basically just my memory along with the 404 logs on the new WordPress site to help me reverse engineer the rules for converting titles to slugs. This is best represented in my ‘dbo.make_slug’ snippet below:

CREATE FUNCTION [dbo].[make_slug]
(
	@post_title nvarchar(256)
)
RETURNS nvarchar(500)
AS
BEGIN
	-- Declare the return variable here
	DECLARE @slug nvarchar(500)
	DECLARE @clean_title nvarchar(500)
	
	
	SET @clean_title = LOWER(dbo.deDupeSpaces(dbo.removePunctuation(@post_title)))
	SET @slug = REPLACE(@clean_title, ' ', '-')
	
	RETURN @slug

END

And that is used by ‘dbo.old_url‘ here:

CREATE FUNCTION [dbo].[old_url]
(
	@post_date datetime,
	@post_title nvarchar(256)
)
RETURNS nvarchar(500)
AS
BEGIN
	-- Declare the return variable here
	DECLARE @url nvarchar(500)
	DECLARE @y_m_d nvarchar(10)
	DECLARE @clean_title nvarchar(500)
	
	SET @y_m_d = CONVERT(nvarchar, @post_date, 111)
	SET @url = '/archive/' + @y_m_d + '/' +  + dbo.make_slug(@post_title) + '.aspx'
	
	RETURN @url

END

There are still two more functions remaining (if you have been paying attention) that are used by ‘dbo.make_slug‘ and that is where the real fun comes in. First of these is the simpler ‘dbo.deDupeSpaces‘ which cuts all repeating space characters down to a single space:

CREATE FUNCTION [dbo].[deDupeSpaces] 
(
	@input nvarchar(500)
)
RETURNS nvarchar(500)
AS
BEGIN
    /**
    *  Based on Nigel Rivett's SQL script found: 
    *    http://www.nigelrivett.net/SQLTsql/RemoveNonNumericCharacters.html 
    */
	DECLARE @i int

	set @i = patindex('%[ ][ ]%', @input)
	while @i > 0
	begin
		set @input = replace(@input, '  ', ' ')
		set @i = patindex('%[ ][ ]%', @input)
	end

	RETURN @input

END

And the more impressive and pretty much identical to the script I found originally written by Nigel Rivett:

CREATE FUNCTION [dbo].[removePunctuation] 
(
	@input nvarchar(500)
)
RETURNS nvarchar(500)
AS
BEGIN
	/**
	 *  Based on Nigel Rivett's SQL script found: 
	 *    http://www.nigelrivett.net/SQLTsql/RemoveNonNumericCharacters.html 
	 */
	DECLARE @i int

	set @i = patindex('%[^a-zA-Z0-9 ]%', @input)
	while @i > 0
	begin
		set @input = replace(@input, substring(@input, @i, 1), '')
		set @i = patindex('%[^a-zA-Z0-9 ]%', @input)
	end

	-- Return the result of the function
	RETURN @input

END

So all of this so far is just to get my posts out of the CS DB in a format close enough to what I’ll need to stuff into my WordPress DB. In order to continue, I just ran the simple query (snippet at the top) and exported the results to an XML file. Now I could finally shutdown the virtual instance of Windows 7 that was eating up my MacBook’s resources and burning my lap from the CPU pegging. 😉

The rest is pretty straight forward. I was unable to find any WordPress Plugins so to assist me in this completely custom hackery, so I thought a brute force insert directly into my WordPress mySQL DB was a great idea. I first imported the XML file into a new table that I called cs_posts. This table’s structure is identical to the original query used to export it. Once this was done, I built a basic INSERT INTO …  SELECT query to import these CS posts directly into my WordPress posts table:

INSERT INTO wp_xxxxx_posts 
	(post_author, 
	post_date, 
	post_date_gmt, 
	post_content, 
	post_title, 
	post_status, 
	post_name, 
	post_modified, 
	post_modified_gmt, 
	guid) 
SELECT 2 AS post_author, 
	cs_posts.PostDate AS post_date, 
	cs_posts.PostDate AS post_date_gmt, 
	cs_posts.FormattedBody AS post_content, 
	cs_posts.Subject AS post_title, 
	'draft' AS post_status, 
	cs_posts.slug AS post_name, 
	cs_posts.PostDate AS post_modified, 
	cs_posts.PostDate AS post_modified_gmt, 
	cs_posts.old_url AS guid
FROM cs_posts

From this point, all that was required was for me to correct any permalinks that did not match up to the slug I had calculated. But I also wanted to get 301 redirects in place for all incoming requests looking for /archive/YYYY/MM/DD/some-post-title-slug.aspx to find their way to the new URL /YYYY/MM/some-post-title-slug. This was much easier than I anticipated due to the luxury of John Godley‘s Redirection plugin. This gem of a plugin makes my introduction to the WordPress ecosystem a dream come true. In fact, after I set it up on both this site and my root: robnrob.com site, I was able to populate the redirection item table his plugin uses to skip the need to enter in each post’s specific redirection. The plugin also has an option for regex-ish pattern matching, but a lot of the permalinks I ended up with on WordPress would not directly transpose from the basic:

url pattern: /archive/(d+)/(d+)/(d+)/([a-zA-Z0-9_-]+).aspx
redirect to: http://robbie.robnrob.com/$1/$2/$4

In the end, I lost out on previous comments, categories, and tags, but what I gained was a much more reliable hosting environment and a much more enjoyable platform to hack on. Also, to be honest, I had only a handful of comments anyway. 😉

Sorry, the comment form is closed at this time.