February 19, 2009
According to the MySQL manual
Full-text indexes can be used only with MyISAM tables
I’m using an InnoDB table-type however, so had to look at other ways of implementing a full-text search that would give me rankings by relevance, to order the results by.
It’s possible to do the ranking with PHP, by doing a
LIKE '%term%' MySQL query and then doing a substring count for each search term, but I believe that doing it via MySQL would be faster, and way cooler.
The algorithm would have to work like this:
- We have a string that we’re going to search through, let’s call it the haystack, eg. ‘lookandlook‘.
- The length of this string is 11 characters.
- We have a string that we’re looking for, let’s call it the needle, eg. ‘look‘
- The length of this string is 4 characters.
- If we count the total number of characters that match when we look for needle in haystack, it will be a multiple of 4.
- eg. lookandlook = 4 + 4 = 8 characters.
- If we divide 8 by 4, we get the number of occurences: 8 / 4 = 2.
I don’t know of a method in SQL whereby one can ‘count the total number of characters that match’. So the way we can find this number, is to count the length of the haystack, and then subtract the length of the haystack after we’ve removed all occurences of needle from it.
eg. 11 – 3 = 8;
The SQL query for this search then looks as follows:
SELECT name, ( LENGTH(haystack) - LENGTH(REPLACE(haystack, needle, '')) ) / CHAR_LENGTH(needle) AS matches FROM `table` GROUP BY `id` ORDER BY `matches` DESC
The query unfortunately also returns everything that does not match, with ‘matches’ = 0, and also only compares one needle. To match multiple needles, one could add the result of separate matches.
Here is my query that implements multiple needles, and only displays positive matches:
SELECT name, matches FROM ( SELECT name, SUM( (( LENGTH(name) - LENGTH(REPLACE(name, 'my', '')) ) / CHAR_LENGTH('my')) + (( LENGTH(name) - LENGTH(REPLACE(name, 'guest', '')) ) / CHAR_LENGTH('guest')) ) AS matches FROM `places` GROUP BY `id` ) results WHERE (0 < matches) ORDER BY `matches` DESC