JumpX Web Development: Essential PHP & JavaScript learning materials with Robert Plank.

PHP Tutorials

Article 8: Google Suggest With PHP



Google Suggest is an experiment that adds "autocomplete" to search queries.  The idea isn't brand new; I've seen this sort of thing done on sites like PHP.net on and off for about a year.  Basically, you type in the first few letters of a function name and it suggests a bunch of matches to you.

Gavi Narra (http://www.objectgraph.com/dictionary) came up with a cool way of showing how to do this on your own: by making a dictionary suggestion tool.  I liked his demo so much I decided to show how it could be done in PHP.

The first step is getting a copy of the Webster 1913 dictionary.  It's a good one to use because it's out of copyright.  There are other public domain dictionaries you can use like WordNet, but that one might be too hard for you to import into a database because it's made up of thousands of little individual files.

Webster 1913 is available on Project Gutenberg but I don't like to use that one since they put in line breaks which are a pain to take out.  I prefer this version: http://www.jumpx.com/tutorials/googlesuggest/Webster-1913.gz (right click and save)

If you don't want to wait to download it to your computer, and then upload it to your server, and you have shell access you can just do:

wget http://www.jumpx.com/tutorials/googlesuggest/Webster-1913.gz

If you download this with a "regular" browser it will probably decompress the file for you.  The download is about 5 megs but the result should be around 30.  If the file you end up with is still only about 5 megs, you will have to gunzip it.

So, on your server do this: gunzip *.gz

Now the .gz file should be gone, and you will have a new file in there called Webster-1913.

I tried reading this into a variable using the file() function, which puts each line of a file into its own array element, but it didn't work.  That's because this file was saved on a Mac, so the end of each line is marked by a carriage return (\r) instead of a newline character (\n).

This is still easy to get around.  First read the whole thing into one huge string...

$file = "Webster-1913";
$fp = fopen($file, "r");
$contents = fread($fp, filesize($file));
fclose($fp);


And then explode by the "\r" character.

$contents = explode("\r", $contents);

We don't have to trim the whole array now, like we would've with file().
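If you aren't sure which line endings a file uses, one option is to split on all three common styles at once.  Here's a minimal sketch using preg_split(); the $sample string is made up for illustration and doesn't come from the dictionary file:

```php
<?php
// Made-up sample mixing classic Mac (\r), Unix (\n) and Windows (\r\n) endings.
$sample = "first\rsecond\nthird\r\nfourth";

// explode() on "\r" only splits at carriage returns, so the
// "\n" characters stay stuck inside the pieces.
$macOnly = explode("\r", $sample);

// preg_split() with an alternation handles all three styles.
// "\r\n" is listed first so a Windows ending counts as one break, not two.
$lines = preg_split('/\r\n|\r|\n/', $sample);

print_r($lines);
```

For the Webster file itself, the plain explode("\r", ...) shown above is all you need.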

Loop through each of those lines...

foreach ($contents as $line) {
}


And inside that loop, use a regular expression to get the parts of the text we want:

preg_match_all("/<hw>(.*?)<\/hw>.*?<pos>(.*?)<\/pos>.*?<def>(.*?)<\/def>/", $line, $results, PREG_SET_ORDER);

This looks complicated, but it isn't.  If you look at the dictionary file a sample line looks like this:

<p><hw>Ge*ne"va</hw> (?), <pos><i>n.</i></pos> <def>The chief city of Switzerland.</def></p>

The stuff between the <hw> and </hw> tags is the actual word.  The text inside the <pos> </pos> tags tells us if the word is a noun, preposition, etc.  Then, the text inside the <def> </def> tags contains the actual definition.

All that went into the regular expression, with the .*? in between them to show we need to skip over that text.  The parentheses around each wildcard mean we want to save that text and put it into our result array.

I put .*? in between each tag because there might be other stuff in between, like spaces or maybe alternate definitions which we don't care about.

Now, we're reading from the variable $line and putting the matches into the array called $results.  The PREG_SET_ORDER part at the end structures the array so the first set of matches goes into $results[0], the second set of matches into $results[1], etc.  The "SET" in "PREG_SET_ORDER" isn't the verb "set," it's the noun "set," meaning we want to group our matches by each set.
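To see the difference the flag makes, here's a small sketch that runs the same pattern over the sample Geneva line twice, once with PREG_SET_ORDER and once with PREG_PATTERN_ORDER (the default).  The variable names $bySet and $byPattern are just mine for this illustration:

```php
<?php
// The sample dictionary line quoted earlier in the article.
$line = '<p><hw>Ge*ne"va</hw> (?), <pos><i>n.</i></pos> <def>The chief city of Switzerland.</def></p>';
$pattern = "/<hw>(.*?)<\/hw>.*?<pos>(.*?)<\/pos>.*?<def>(.*?)<\/def>/";

// Grouped by match: $bySet[match number][group number]
preg_match_all($pattern, $line, $bySet, PREG_SET_ORDER);

// Grouped by capture group: $byPattern[group number][match number]
preg_match_all($pattern, $line, $byPattern, PREG_PATTERN_ORDER);

echo $bySet[0][1], "\n";      // headword of the first match
echo $byPattern[1][0], "\n";  // the same headword, indexes swapped
echo $bySet[0][3], "\n";      // definition of the first match
```

With PREG_SET_ORDER everything about one dictionary entry sits together in one sub-array, which is why it's the handier layout here.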

So, if the size of $results is greater than 0, there are matches.  This would be when:

count($results) > 0

So just to see what comes up, let's print_r() that $results array when we find that first match, and then die() so that only the first match is shown.  Here's your whole script now:

<?php

set_time_limit(0);

// Read file to a variable
$file = "Webster-1913";
$fp = fopen($file, "r");
$contents = fread($fp, filesize($file));
fclose($fp);

// Split on carriage returns, since that's how the lines end in this file
$contents = explode("\r", $contents);

foreach ($contents as $line) {
   preg_match_all("/<hw>(.*?)<\/hw>.*?<pos>(.*?)<\/pos>.*?<def>(.*?)<\/def>/", $line, $results, PREG_SET_ORDER);
   if (count($results) > 0) {
      // This is a dictionary line
      print_r($results); die();
   }
}

?>


Save that as read.php.

This is better if you can run it from the shell.  If that's the case, telnet or ssh into your host, browse to the folder and type "php read.php".  If you're loading this from your browser, view the page source after it loads instead of the rendered HTML output.

When I ran this my output was:

Array
(
   [0] => Array
       (
           [0] => <hw>A</hw> (&adot;), <pos><i>prep.</i></pos>
   [Abbreviated form of <i>an</i> (AS. <i>on</i>). See <u>On</u>.] <sn><b>1.</b></sn> <def>In; on; at; by.</def>
           [1] => A
           [2] => <i>prep.</i>
           [3] => In; on; at; by.
       )

)


Since we really only care about the first match, we can show only the contents of $results[0], or...

$info = $results[0];
print_r($info); die();

Array
(
   [0] => <hw>A</hw> (&adot;), <pos><i>prep.</i></pos>
   [Abbreviated form of <i>an</i> (AS. <i>on</i>). See <u>On</u>.] <sn><b>1.</b></sn> <def>In; on; at; by.</def>
   [1] => A
   [2] => <i>prep.</i>
   [3] => In; on; at; by.
)


Even better: we don't need that first element, which is just the whole match.

$info = $results[0];
array_shift($info);
print_r($info); die();


Array
(
   [0] => A
   [1] => <i>prep.</i>
   [2] => In; on; at; by.
)


Now go into phpMyAdmin and make a new mySQL table.  Don't forget to add an index for the "word" field.  Run this query and you'll have an exact copy of my table:

CREATE TABLE `dict_list` (
 `id` int(11) NOT NULL auto_increment,
 `word` varchar(255) NOT NULL default '',
 `type` varchar(50) NOT NULL default '',
 `definition` text NOT NULL,
 KEY `id` (`id`),
 KEY `word` (`word`)
) ENGINE=MyISAM AUTO_INCREMENT=1 ;


There are still a few tiny things to be done, but I'll just show you the changes I've made:

<?php

mysql_connect("localhost", "your_mysql_user", "your_mysql_password");
mysql_select_db("your_mysql_database");

set_time_limit(0);

// Read file to a variable
$file = "Webster-1913";
$fp = fopen($file, "r");
$contents = fread($fp, filesize($file));
fclose($fp);

$contents = explode("\r", $contents);

$i = 0;

mysql_query("TRUNCATE dict_list") or die(mysql_error());

foreach ($contents as $line) {
   preg_match_all("/<hw>(.*?)<\/hw>.*?<pos>(.*?)<\/pos>.*?<def>(.*?)<\/def>/", $line, $results, PREG_SET_ORDER);
   if (count($results) > 0) {
      // This is a dictionary line
      $info = $results[0];
      array_shift($info);

      list($word, $type, $definition) = $info;
      $word = preg_replace("/[\*\"\'\|\`-]/", "", $word);

      $word = addslashes($word);
      $type = addslashes($type);
      $definition = addslashes($definition);

      mysql_query("INSERT INTO dict_list SET word = '$word', type = '$type', definition = '$definition'") or die(mysql_error());

      echo $i++ . " ";
   }
}

?>


First of all, I truncated the table before the build was done, which clears out the whole table.  That way if you run the read.php script more than once, you won't get any duplicates.

Then, this line:

list($word, $type, $definition) = $info;

Puts the 0th element of $info into $word, the 1st element into $type, and the 2nd into $definition.  This just puts these items into their own variables so what we do is a bit more readable.
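Here's that step on its own as a quick sketch, using the values from the print_r() output earlier.  I've shortened the whole-match string in element 0 to keep it readable, so treat that one string as illustrative:

```php
<?php
// One set of matches, as PREG_SET_ORDER would hand it to us.
// Element 0 (the whole match) is shortened here for illustration.
$results = array(
    array(
        '<hw>A</hw> (&adot;), <pos><i>prep.</i></pos> <def>In; on; at; by.</def>',
        'A',
        '<i>prep.</i>',
        'In; on; at; by.',
    ),
);

$info = $results[0];
array_shift($info);   // drop the whole match, leaving the three captured groups

// Unpack the three remaining elements into named variables.
list($word, $type, $definition) = $info;

echo "$word | $type | $definition\n";
```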

$word = preg_replace("/[\*\"\'\|\`-]/", "", $word);

Next, we remove all the pronunciation characters you would have seen in the words if you had taken a peek at the file.  Stuff like an asterisk, double quote, single quote, pipe, backquote, and so on, all removed so we're left with the regular word.
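A quick way to convince yourself the character class does what you expect is to run it over a few headwords.  In this sketch the helper name cleanHeadword() is my own, and only the first headword comes from the sample line above; the other two are made up in the same style:

```php
<?php
// Hypothetical helper wrapping the same preg_replace() call used in read.php.
function cleanHeadword($word) {
    // Strips asterisk, double quote, single quote, pipe, backquote and hyphen.
    return preg_replace("/[\*\"\'\|\`-]/", "", $word);
}

// First entry is from the sample line; the rest are invented for illustration.
$headwords = array('Ge*ne"va', 'A*ban`don', "Ab\"a*cus");

foreach ($headwords as $hw) {
    echo cleanHeadword($hw), "\n";
}
```

Note that the hyphen has to stay at the very end of the character class, otherwise the regex engine reads it as a range.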

$word = addslashes($word);
$type = addslashes($type);
$definition = addslashes($definition);


Slashes are added to each of those variables to prevent SQL injection attacks.  If the word we were importing into the database was "it's", the first part of the query would look like:

INSERT INTO dict_list SET word = 'it's'

This is confusing, because mySQL doesn't know where we want to end the string.  Adding the slashes makes it look like this:

INSERT INTO dict_list SET word = 'it\'s'

And it says okay, I see \' instead of just ' so that means you want a single quote, you don't want to mark the end of that string.
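You can watch this happen without touching the database at all.  This sketch just builds the two query strings and prints them; the variable names $broken and $escaped are mine:

```php
<?php
$word = "it's";

// Without escaping, the apostrophe ends the SQL string early.
$broken = "INSERT INTO dict_list SET word = '$word'";

// With addslashes(), the apostrophe is preceded by a backslash.
$escaped = "INSERT INTO dict_list SET word = '" . addslashes($word) . "'";

echo $broken, "\n";
echo $escaped, "\n";
```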

Finally do the query that adds this line into the database table dict_list:

mysql_query("INSERT INTO dict_list SET word = '$word', type = '$type', definition = '$definition'") or die(mysql_error());

And finally that echo statement in there is just for me to tell when each entry is being added, as sort of a progress indicator so I don't get bored waiting.

So, run that script, preferably in the shell, but in the browser it should work okay.  It takes a couple of minutes but will import 110,000 dictionary entries into your database, with the "word" column indexed so entries can be retrieved quickly.  This is important.

Next you have to put together a very simple script that will search the table based on a query.  This is really easy, done in about 20 lines of code here:

<?php

mysql_connect("localhost", "your_mysql_user", "your_mysql_password");
mysql_select_db("your_mysql_database");

// find.php can be loaded with no "q" parameter at all, so check first
$q = isset($_GET["q"]) ? addslashes($_GET["q"]) : "";
$limit = 10;

if ($q) {
   $query = mysql_query("SELECT * FROM dict_list WHERE word LIKE '$q%' ORDER BY word LIMIT $limit") or die(mysql_error());

   $results = array();

   while ($row = mysql_fetch_assoc($query)) {
      $word = $row["word"];
      $definition = $row["definition"];
      $type = $row["type"];

      $results[] = "<b>$word</b>: $type $definition";
   }

   echo implode("<br>\n", $results);
}

?>


First, we connect to the database.  Then add those slashes to the query (the script is called in the form of "find.php?q=apples", for example).

In mySQL you can use LIKE as a simple search.  Use the percent sign as a wildcard.  Putting the percent sign at the end means we'll use the query as the START of the word we're looking for.  If "nap" is given as a query, a word like "napkin" could be suggested, because it begins with "nap".  But "snap" wouldn't be suggested, since even though "snap" contains "nap", it doesn't start with "nap"... get it?
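If it helps, here's the same prefix rule sketched in plain PHP, assuming a case-insensitive comparison like mySQL's usual collations give you.  Both function names are hypothetical.  The second one covers a side effect of wildcards: since % and _ mean something special to LIKE, a query containing them needs a backslash in front (backslash being LIKE's default escape character) if you want them matched literally.  In find.php you'd run it before addslashes().

```php
<?php
// Hypothetical helper: true when $word begins with $q, ignoring case.
// This mimics what word LIKE 'q%' selects under a case-insensitive collation.
function startsWithQuery($word, $q) {
    return $q !== "" && strncasecmp($word, $q, strlen($q)) === 0;
}

// Hypothetical helper: make \, % and _ literal inside a LIKE pattern.
// The backslash is replaced first so the ones we add aren't doubled again.
function escapeLike($q) {
    return str_replace(
        array('\\', '%', '_'),
        array('\\\\', '\\%', '\\_'),
        $q
    );
}

$words = array("nap", "napkin", "snap", "Naples");
$matches = array();
foreach ($words as $w) {
    if (startsWithQuery($w, "nap")) {
        $matches[] = $w;
    }
}

print_r($matches);
echo escapeLike("50%_off"), "\n";
```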

Oh yeah, and it's sorted alphabetically by the "word" field and limited to 10 rows.  Always put a limit on your queries when you can.

Good.  Then that while loop adds each row onto an array.  I like to put my queries into an array, that way I can do stuff with it later if I want to.  In this case it's also useful because I want to separate each result with a line break but I don't want to have a line break at the end.  It just comes out cleaner than adding things on to the end of a string.
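Here's the difference side by side, as a sketch.  The $rows array is made up to stand in for what mysql_fetch_assoc() would return, so no database is needed to try it:

```php
<?php
// Made-up rows standing in for the database results.
$rows = array(
    array("word" => "Good",  "type" => "<i>a.</i>", "definition" => "Possessing desirable qualities."),
    array("word" => "Goose", "type" => "<i>n.</i>", "definition" => "A web-footed bird."),
);

$results  = array();
$appended = "";

foreach ($rows as $row) {
    $entry = "<b>{$row['word']}</b>: {$row['type']} {$row['definition']}";

    $results[] = $entry;               // collect now, join later
    $appended .= $entry . "<br>\n";    // string appending leaves a trailing <br>
}

// implode() only puts the separator BETWEEN the entries.
$joined = implode("<br>\n", $results);

echo $joined, "\n";
```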

And it outputs that text.  Now try this out.  Upload find.php onto your server, edit the settings so it connects to your own mySQL database with your mySQL user, and try a URL like this:

http://www.example.com/yourfolder/find.php?q=goo

It will give you the first 10 words starting with "goo."  Now let's get to work on the HTML side of things.

I'm going to make a simple HTML file called "index.html" like this:

<div align="center">
<form action="find.php" method="GET" target="searchWindow">
<input autocomplete="off" type="text" name="q" size="20"><input type="submit" value="Search">
</form>

<iframe name="searchWindow" src="find.php" width="500" height="300"></iframe>
</div>


Very basic, a text box and search button with an inline frame below it.

(That autocomplete="off" *is* important, though.)  The form submits into the inline frame, it doesn't reload the current page.  Whatever you type into the search box is passed to the script as a parameter, but you have to hit the search button... it doesn't really autocomplete for you.

That's easy to fix: all you have to do is add an "onkeyup" JavaScript event to that search box to re-submit the form any time the text has changed.  That HTML file becomes:

<div align="center">
<form action="find.php" method="GET" target="searchWindow">
<input autocomplete="off" type="text" name="q" size="20" onkeyup="this.form.submit()">
</form>

<iframe name="searchWindow" src="find.php" width="500" height="300"></iframe>
</div>


If that's not cool I don't know what is.  We can make this even cooler by using XMLHttpRequest, making it look like Google Suggest.  (link: http://www.google.com/webhp?complete=1&hl=en)

Google Suggest uses XMLHttpRequest to load the contents of a URL into a variable, and then writes that to a DIV layer, instead of using an iframe.  It gives everything a more built-in feeling.  Here's that page made to look more like Google Suggest:

<html>
<head>

<style type="text/css">
body { font-family:Tahoma,Verdana; font-size:11px; }
</style>

<script language="JavaScript">
<!--
var req;

function loadXMLDoc(url) {

   // Internet Explorer
   try { req = new ActiveXObject("Msxml2.XMLHTTP"); }
   catch(e) {
      try { req = new ActiveXObject("Microsoft.XMLHTTP"); }
      catch(oc) { req = null; }
   }

   // Mozilla/Safari
   if (!req && typeof XMLHttpRequest != "undefined") { req = new XMLHttpRequest(); }

   // Call the processChange() function when the page has loaded
   if (req != null) {
      req.onreadystatechange = processChange;
      req.open("GET", url, true);
      req.send(null);
   }
}

function processChange() {
   // The page has loaded and the HTTP status code is 200 OK
   if (req.readyState == 4 && req.status == 200) {

      // Write the contents of this URL to the searchResult layer
      getObject("searchResult").innerHTML = req.responseText;
   }
}

function getObject(name) {
   var ns4 = (document.layers) ? true : false;
   var w3c = (document.getElementById) ? true : false;
   var ie4 = (document.all) ? true : false;

   if (ns4) return eval('document.' + name);
   if (w3c) return document.getElementById(name);
   if (ie4) return eval('document.all.' + name);
   return false;
}


window.onload = function() {
   getObject("q").focus();
}

// -->
</script>
</head>

<body>

<div align="center">

<h1 align="center">Dictionary</h1>

<div align="center">Type in part of a word to have it defined.</div>

<form action="find.php" method="GET" target="searchWindow">
<input autocomplete="off" type="text" name="q" id="q" size="20" onkeyup="loadXMLDoc('find.php?q='+encodeURIComponent(this.value))"
style="width:300px;">
<div align="left" id="searchResult" name="searchResult" style="font-family:Arial; font-size:12px; width:300px;
border:#000000 solid 1px; padding:3px; "></div>
</form>

</div>

</body>
</html>


The loadXMLDoc() and processChange() functions are based on code from Apple Developer Connection (http://developer.apple.com/internet/webcontent/xmlhttpreq.html).  I've changed them a bit (Apple's way was less compatible when ActiveX was turned off in IE, Google's way works better).

The code only looks weird because Internet Explorer and Mozilla/Safari handle this in different ways (go figure).  In my first Simple PHP book I made a little JavaScript-based form mailer that actually called a PHP script using an Image object.

The way we do this is pretty much the same... just give the XMLHttpRequest object info like what URL to connect to, and then say, once it's loaded, pass it to a function.

req.onreadystatechange = processChange;

This is a lot like the "onload" property of an Image object.  Anyway, all the processChange() function does is, if the page loaded correctly, populate the "searchResult" layer (the div tag we're putting the dictionary suggestions in).  Just like with the iframe, it's updated on each keystroke, unless it's cached of course, but that isn't the point.

Google Suggest obviously didn't take Kevin Gibbs a long time to implement, but this technique is an easy way to make a web UI look more like a desktop UI... it falls just outside the event horizon of the "Because I Can" category.

Demo here: http://www.jumpx.com/tutorials/googlesuggest/demo.html

Download: http://www.jumpx.com/tutorials/googlesuggest/googlesuggest.zip
Webster1913 Database: http://www.jumpx.com/tutorials/googlesuggest/Webster-1913.gz (right click and save)

Article by Robert Plank

Experienced PHP/JavaScript Tutor
Solves 19 Of Your Most Frustrating
Direct Response Sales Page Hang-Ups
http://www.salespagetactics.com/Your_Clickbank_ID

(The above article may be copied
as long as this resource box is included)
