old->new site mapping
Two years ago, while working at Sea Grant, Dan and I were building the new mdsg site. Sea Grant had hundreds of pages of existing content in static HTML. The main concern was that many people in the community had bookmarked a lot of that content, and when we launched the new site those bookmarks would generate 404s (a very unhelpful dead end).
Since we were porting the exact same content into a database served by a PHP CMS, we devised a fairly elegant (by "elegant", I mean simple) solution.
At launch time, we moved the old site to a hidden URL as a reference. That way the script we were about to write could fetch the old page, parse its title, search the current database of page titles, and return a list of suggested pages the visitor might be looking for. When a person chose the correct page, the script would map that choice to the requested URL and so learn which page went where. The next time someone hit that link, the error page would forward them straight to the mapped page.
Of course, we knew that this relied on the first person choosing the correct link, but it was close enough for government work (being an educational institution), and the site had a search feature to help visitors should they be led astray.
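The flow described above can be sketched without a database. This is only an illustration, not the production code: the function names (resolveOldUrl, titleWords, learnMapping), the sample titles and the sample URL are all hypothetical, and a simple shared-word count stands in for the real title search.

```php
<?php
// Hypothetical in-memory stand-ins for the mapping table and the page table.

/**
 * Decide what the 404 handler should do for a requested old URL.
 * Returns ['redirect' => page_id] when the URL has already been mapped,
 * otherwise ['suggest' => [page_id => title, ...]] ranked by shared title words.
 */
function resolveOldUrl(string $oldUrl, string $oldTitle, array $mapping, array $pages): array
{
    // A learned mapping wins: a previous visitor already picked the target page.
    if (!empty($mapping[$oldUrl])) {
        return ['redirect' => $mapping[$oldUrl]];
    }

    // Otherwise score each current page by how many title words it shares
    // with the old page's title (a crude stand-in for a full-text search).
    $wanted = titleWords($oldTitle);
    $scores = [];
    foreach ($pages as $pageId => $title) {
        $shared = count(array_intersect($wanted, titleWords($title)));
        if ($shared > 0) {
            $scores[$pageId] = $shared;
        }
    }
    arsort($scores); // best match first, page ids preserved as keys

    $suggestions = [];
    foreach (array_keys($scores) as $pageId) {
        $suggestions[$pageId] = $pages[$pageId];
    }
    return ['suggest' => $suggestions];
}

/** Lower-case a title and split it into unique words. */
function titleWords(string $title): array
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($title), -1, PREG_SPLIT_NO_EMPTY);
    return array_unique($words);
}

/** Record a visitor's choice so the next request redirects immediately. */
function learnMapping(array $mapping, string $oldUrl, int $pageId): array
{
    $mapping[$oldUrl] = $pageId;
    return $mapping;
}

// Usage with made-up data
$pages   = [7 => 'Oyster Research', 9 => 'Blue Crab Research', 12 => 'Staff Directory'];
$mapping = [];
$oldUrl  = '/research/oysters.html';

// First visit: nothing is mapped yet, so the visitor gets suggestions.
$first = resolveOldUrl($oldUrl, 'Oyster Research Program', $mapping, $pages);

// The visitor picks page 7; the next request for the same URL redirects.
$mapping = learnMapping($mapping, $oldUrl, 7);
$second  = resolveOldUrl($oldUrl, 'Oyster Research Program', $mapping, $pages);
```

Keeping the decision in one function makes the weak spot easy to see: whatever the first visitor picks becomes the permanent answer for that URL.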
Tell Apache to go to a custom errorpage.php page which has the following code:
--errorpage.php--
<?php
include_once("includes/config.php");

// See if the site is set up for transitioning from an old version of the site.
// $_SESSION['old_site_url'] is set in the config file; this is where the old copy of the site can be accessed.
// $_SESSION['enable_old_site_matching'] is a boolean flag that says whether this site mapping script is running.
if ($_SESSION['enable_old_site_matching'] == 1 && $_SESSION['old_site_url']) {
    $oldsitepath = str_replace("http://" . $_SERVER['HTTP_HOST'], $_SESSION['old_site_url'], $_SERVER['SCRIPT_URI']);
    // escape the URL before it goes into SQL
    $oldlink = mysql_real_escape_string($oldsitepath, $_GET['dbh']);

    // see if this page has been mapped already
    $query = "SELECT * FROM site_old_new_mapping WHERE old_link = '$oldlink' ";
    $result = mysql_query($query, $_GET['dbh']) OR die(mysql_error());
    $row = mysql_fetch_assoc($result);
    if ($row && $row['sm_id']) {
        if ($row['page_id']) {
            // we found an entry, now send them to the previously mapped page
            header("HTTP/1.1 301 Moved Permanently");
            header("Location: /goto.php?page_id=$row[page_id]");
            exit;
        }
        // the link is known, but nobody has chosen a page for it yet
        $sitemap_id = $row['sm_id'];
    } else {
        // if this page hasn't been mapped yet, add it!
        $INquery = "INSERT INTO site_old_new_mapping(old_link) VALUES('$oldlink')";
        mysql_query($INquery, $_GET['dbh']) OR die(mysql_error());
        // save this ID for the goto.php, where we will map this person's selection
        $sitemap_id = mysql_insert_id($_GET['dbh']);
    }

    // read the old page, from the old site, into an array
    $phile = @file($oldsitepath);
    if (is_array($phile)) {
        $page = implode('', $phile);
        // take the text inside <title>, then keep whatever follows the first ":"
        // (the closing "</title" delimiter is an assumption)
        $title = pagesTextBetween(":", "", pagesTextBetween("title>", "</title", $page));
    } else {
        $nosuchfile = 1;
    }

    if (!empty($title)) {
        // perform search
        $searchwords = mysql_real_escape_string($title, $_GET['dbh']);
        $query = "SELECT page_title,page_url,page_id FROM page WHERE (MATCH(page_title) AGAINST('$searchwords')) GROUP BY page_id";
        $result = mysql_query($query, $_GET['dbh']) OR die(mysql_error());
        $num = mysql_num_rows($result);

        if ($num < 1) {
            $note = "There are no items that fit your search. Try using different search words.";
        } else {
            $cnote = 1;
            while ($row = mysql_fetch_array($result)) {
                $row['page_title'] = stripslashes($row['page_title']);
                // each suggestion links to goto.php, which records the visitor's choice
                // (the link markup is an assumption based on goto.php's parameters)
                $possibilities[] = "<p><a href=\"/goto.php?page_id=$row[page_id]&amp;updatemapping=$sitemap_id\">$row[page_title] ($row[page_url])</a></p>";
            }
        }
    }
}

// function pagesTextBetween(); can be located in an included file of your choice
// Returns the part of $content between $from and $to, matched case-insensitively.
function pagesTextBetween($from, $to, $content, $firstoccur = 1, $allownomatch = 1, $debug = 0) {
    // strip carriage returns before comparing
    $content = str_replace("\r", "", stripslashes($content));
    $from = strtolower(str_replace("\r", "", $from));
    $to = strtolower(str_replace("\r", "", $to));
    $contentcheck = strtolower($content);
    $L1 = strlen($from);

    if ($L1 > 0) {
        if ($firstoccur == 1) $pos1 = strpos($contentcheck, $from);
        else $pos1 = strrpos($contentcheck, $from);
    } else {
        $pos1 = 0;
    }

    // strpos() returns false when nothing is found, so compare strictly
    if ($pos1 !== false) {
        if ($to == '') return substr($content, $pos1 + $L1);
        if ($firstoccur) $pos2 = strpos(substr($contentcheck, $pos1 + $L1), $to);
        else $pos2 = strrpos(substr($contentcheck, $pos1 + $L1), $to);
        if ($pos2 !== false) return substr($content, $pos1 + $L1, $pos2);
    } else {
        if ($allownomatch == 1) return $content;
    }
}
?>
--body of page--
I'm sorry, the page you're looking for (<?php echo htmlspecialchars($_SERVER['SCRIPT_URI']); ?>) no longer exists in that location.
<?php if (isset($possibilities) && is_array($possibilities)) { ?>
Here are some possible matches:
<?php echo implode('', $possibilities); ?>
<?php } elseif (!empty($nosuchfile)) { ?>
The page also doesn't appear to be in our old site. Check that you have typed in the correct address.
<?php } ?>
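For reference, pointing Apache at the error page can be as small as one directive. This is a minimal sketch that assumes errorpage.php sits in the document root and that the directive goes in the site's config or an .htaccess file:

```apache
# Send every 404 to the custom handler (local path, relative to the document root)
ErrorDocument 404 /errorpage.php
```

A local path is the safer choice here: if ErrorDocument is given a full URL, Apache sends the browser a redirect instead, and the original 404 status is lost.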
Here is the goto.php page that inserts the selected page into the site mapping table:
--goto.php page--
<?php
// goto.php
// takes a page_id, looks up the url path and sends them on their way
include_once("includes/config.php");

if (!empty($_GET['page_id']) || !empty($_GET['page'])) {
    // force the id to an integer before it goes anywhere near SQL
    $page_id = isset($_GET['page_id']) ? (int) $_GET['page_id'] : 0;
    $extra = '';
    $queri = '';
    if (!empty($_GET['anchor'])) $extra = "#$_GET[anchor]";
    if (!empty($_GET['queri'])) $queri = "?" . base64_decode(urldecode($_GET['queri']));

    // get page location
    if (!empty($_GET['page_id'])) {
        $query = "SELECT page_url, external_url FROM page WHERE page_id = '$page_id' ";
    } else {
        $page = mysql_real_escape_string($_GET['page'], $_GET['dbh']);
        $query = "SELECT page_url, external_url FROM page WHERE page_url LIKE '%$page%' ";
    }
    $result = mysql_query($query, $_GET['dbh']) OR die(mysql_error());
    $row = mysql_fetch_assoc($result);

    if (is_array($row)) {
        if ($_SESSION['user_level'] < 2) $url = "$_GET[url]/$row[page_url]/index.php";
        else $url = "$_GET[url]/$row[page_url]/";

        // check if this is a sitemapping link. If so, record the selection
        if (!empty($_GET['updatemapping']) && $page_id) {
            // update only the row that errorpage.php created for this old link
            // (assumes updatemapping carries that row's sm_id)
            $sm_id = (int) $_GET['updatemapping'];
            $Uq = "UPDATE site_old_new_mapping SET page_id = $page_id WHERE sm_id = $sm_id ";
            mysql_query($Uq, $_GET['dbh']);
        }

        if ($row['external_url'] > '') header("Location: $row[external_url]" . $queri . $extra);
        else header("Location: $url" . $queri . $extra);
    } else {
        header("Location: /index.php?page_id=$page_id" . $queri . $extra);
    }
    exit;
}
?>
Here is the DB table for learning the site page mapping:
--MySQL table `site_old_new_mapping`--
CREATE TABLE `site_old_new_mapping` (
  `sm_id` bigint(20) NOT NULL auto_increment,
  `old_link` varchar(200) NOT NULL default '',
  `page_id` bigint(20) NOT NULL default '0',
  `manual_link` varchar(200) NOT NULL default '',
  `ignored` tinyint(1) NOT NULL default '0',
  `pages_with_old_link` varchar(255) NOT NULL default '',
  PRIMARY KEY (`sm_id`),
  KEY `old_link` (`old_link`),
  KEY `page_id` (`page_id`),
  KEY `ignore` (`ignored`),
  KEY `pages_with_old_link` (`pages_with_old_link`),
  KEY `manual_link` (`manual_link`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COMMENT='maps 404 pages to existing pages' AUTO_INCREMENT=31415 ;
This is all procedural and has no real error handling, but it illustrates a possible solution to a common site upgrade issue. It was robust enough to permanently map hundreds of pages.
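One small piece of that missing error handling is checking the numeric query parameters before they reach a query. A minimal sketch, assuming IDs are always positive integers; the helper name positiveIntOrNull is hypothetical and not part of the original scripts:

```php
<?php
/**
 * Return the value as a positive integer, or null if it is missing or malformed.
 */
function positiveIntOrNull($value): ?int
{
    // FILTER_VALIDATE_INT rejects anything that is not a clean integer string;
    // min_range additionally rejects zero and negative numbers.
    $int = filter_var($value, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    return $int === false ? null : $int;
}

// Usage: a clean ID passes through, anything else becomes null and can be rejected.
$pageId = positiveIntOrNull('42');
$mapId  = positiveIntOrNull('7 OR 1=1');
```

A script like goto.php could then skip the mapping update whenever either value comes back null, rather than passing raw input to the database.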