web development
According to the MySQL manual
Full-text indexes can be used only with MyISAM tables
I’m using an InnoDB table-type however, so had to look at other ways of implementing a full-text search that would give me rankings by relevance, to order the results by.
It’s possible to do the ranking with PHP, by doing a LIKE '%term%' MySQL query and then doing a substring count for each search term, but I believe that doing it via MySQL would be faster, and way cooler.
The algorithm would have to work like this:
I don’t know of a method in SQL whereby one can ‘count the total number of characters that match’. So the way we can find this number, is to count the length of the haystack, and then subtract the length of the haystack after we’ve removed all occurences of needle from it.
eg. 11 – 3 = 8;
The SQL query for this search then looks as follows:
SELECT
name,
( LENGTH(haystack) - LENGTH(REPLACE(haystack, needle, '')) )
/ CHAR_LENGTH(needle)
AS matches
FROM `table`
GROUP BY `id`
ORDER BY `matches` DESC
The query unfortunately also returns everything that does not match, with ‘matches’ = 0, and also only compares one needle. To match multiple needles, one could add the result of separate matches.
Here is my query that implements multiple needles, and only displays positive matches:
SELECT
name, matches
FROM (
SELECT
name,
SUM(
(( LENGTH(name) - LENGTH(REPLACE(name, 'my', '')) )
/ CHAR_LENGTH('my'))
+
(( LENGTH(name) - LENGTH(REPLACE(name, 'guest', '')) )
/ CHAR_LENGTH('guest'))
) AS matches
FROM `places`
GROUP BY `id`
) results
WHERE (0 < matches)
ORDER BY `matches` DESC
Based in South Africa, we're a web-development company...
(2)
(3)
(1)
(5)
(4)
(1)
(1)
(14)
(1)
(8)
(7)
(17)
(1)
(4)
(2)
(2)
(1)
3 Responses to InnoDB fulltext search with ranking
gman
February 19th, 2009 at 4:01 pm
Awesome. Interesting read. :)
gamaki
July 10th, 2010 at 12:52 pm
Amazing :)
I’ve tried and it’s working really fine.
Have you tried with a huge database ?
Cheers
Abraham
July 12th, 2010 at 11:50 am
Thanks @gamaki!
I’ve only used the query on smaller sets of data, nothing “huge” yet. So if you have a huge dataset to test it on, please let everyone know your results!