Posts Tagged ‘PHP’

MySQL Full-Text Searching

Saturday, March 27th, 2010

Most people* immediately think of the ‘LIKE’ operator when they want to do a quick search in SQL. If you only want to search for a single keyword in a small table, then there is (arguably) nothing wrong with this.

However, when you start wanting to search for multiple terms, search for certain terms while excluding others, or searching large tables, ‘LIKE’ will get slow and very inaccurate.

One major cause of this inaccuracy is that the search terms are looked for in the order they are specified. For example, the following query would match “We use PHP and MySQL to power this website.”, but would not match “We use MySQL and PHP to power this website.”:

SELECT * FROM articles WHERE body LIKE '%php%mysql%' AND body NOT LIKE '%wordpress%';

After discovering the above limitation first-hand (while implementing a search box on my QuoteDB system), I came an alternative: Full-Text matching.

This article on MySQL Full-Text Searching on the Zend DevZone was particularly helpful on this topic, so I recommend you give it a read. For those too lazy to read the entire thing, Full-Text matching requires a single step before you can use it: you must first add a  ‘FULLTEXT’ index to the field(s) you wish search:

ALTER TABLE articles ADD FULLTEXT(body);

Once that’s done, you can perform very fast and accurate searches, including (since MySQL 4.0.1) boolean mode operators to dramatically increase the search power. (For an idea of the power of boolean mode, have a look at the operators table.)

SELECT *, MATCH(body) AGAINST ('+php +mysql -wordpress' IN BOOLEAN MODE) AS score FROM articles
WHERE MATCH(body) AGAINST ('+php +mysql -wordpress' IN BOOLEAN MODE);

Unlike the ‘LIKE’ search, thanks to Full-Text matching, this query would match both of the “We use…” text strings.

* This included myself, until I discovered Full-Text matching.

PHP extract() should not be used

Wednesday, March 3rd, 2010

In one of my recent lectures, the lecturer* told the class to use the PHP extract() function.

I hadn’t previously come accross extract(), so I looked it up… and was horrified (and slightly amused) that he was recommending it.
It was pretty obvious that  using extract() is almost as bad as having register_globals turned on. (See the notes section on the PHP manual for extract())

Unless you’ve specifically taken measures against it, what’s to stop the user adding something like “&isloggedin=true” onto the GET query string or POST data? Or even worse, “&isadmin=true”!

Instead of using the extract() function, what’s wrong with using an array, a quick foreach and some variable variables?
Here is a very quick example (not perfect by any means, but you should get the idea):

/*
* Example of safe variable assignment from the POST data for some login form
*/
$myVars = array("username","password","rememberMe");
foreach ($myVars as $someVar) {
	$formVar_$someVar = $_POST[$someVar];
	//Will result in: $formVar_username, $formVar_password and  $formVar_rememberMe.
 
	//$$someVar = $_POST[$someVar];
	//This could be used to give you $username, $password and $rememberMe
}

The above example stops a malicious user’s extra GET or POST data from actually doing anything, but the resulting variables come out pretty much the same. Adding all your form or URL variables to the array is almost certainly a lot less work (and less of a headache) than it would be to add security measures to your code.


* This is the same lecturer that still uses Netscape Navigator, so it’s hardly surprising that he’s still trying to use code practices that went out of date in the 90′s. He also recently posted this on the Uni LMS:

“I have just discovered that Global Variables has been disabled on the server, which means it will be difficult for you to use session variables to implement your [coursework]. I have requested it be re-enabled [...]“