InnoDB fulltext search with ranking

According to the MySQL manual
Full-text indexes can be used only with MyISAM tables

I’m using an InnoDB table-type however, so had to look at other ways of implementing a full-text search that would give me rankings by relevance, to order the results by.

It’s possible to do the ranking with PHP, by doing a LIKE '%term%' MySQL query and then doing a substring count for each search term, but I believe that doing it via MySQL would be faster, and way cooler.

The algorithm would have to work like this:

  • We have a string that we’re going to search through, let’s call it the haystack, eg. ‘lookandlook‘.
  • The length of this string is 11 characters.
  • We have a string that we’re looking for, let’s call it the needle, eg. ‘look
  • The length of this string is 4 characters.
  • If we count the total number of characters that match when we look for needle in haystack, it will be a multiple of 4.
  • eg. lookandlook = 4 + 4 = 8 characters.
  • If we divide 8 by 4, we get the number of occurences: 8 / 4 = 2.

I don’t know of a method in SQL whereby one can ‘count the total number of characters that match’. So the way we can find this number, is to count the length of the haystack, and then subtract the length of the haystack after we’ve removed all occurences of needle from it.
eg. 11 – 3 = 8;

The SQL query for this search then looks as follows:

SELECT
  name,
    ( LENGTH(haystack) - LENGTH(REPLACE(haystack, needle, '')) ) 
        / CHAR_LENGTH(needle)
AS matches
FROM `table`
GROUP BY `id`
ORDER BY `matches` DESC

The query unfortunately also returns everything that does not match, with ‘matches’ = 0, and also only compares one needle. To match multiple needles, one could add the result of separate matches.

Here is my query that implements multiple needles, and only displays positive matches:

SELECT
  name, matches
  FROM (
    SELECT
      name,
      SUM(
        (( LENGTH(name) - LENGTH(REPLACE(name, 'my', '')) )
          / CHAR_LENGTH('my'))
        +
        (( LENGTH(name) - LENGTH(REPLACE(name, 'guest', '')) )
          / CHAR_LENGTH('guest'))
      ) AS matches
    FROM `places` 
    GROUP BY `id`
  ) results
WHERE (0 < matches)
ORDER BY `matches` DESC

3 Responses to “InnoDB fulltext search with ranking”

  1. gman says:

    Awesome. Interesting read. :)

  2. gamaki says:

    Amazing :)

    I’ve tried and it’s working really fine.

    Have you tried with a huge database ?

    Cheers

  3. Abraham says:

    Thanks @gamaki!

    I’ve only used the query on smaller sets of data, nothing “huge” yet. So if you have a huge dataset to test it on, please let everyone know your results!

Leave a Reply to gman