web development
I was building database tables for a project and came across a problem that I’ve tried to ignore in the past: When I have a main table, a second table, and a table linking the two, and the link-table can link a single record in the main table with multiple records in the second table, doing a join would return multiple rows for a single entry in my main table — a row for each link between the main and second tables. But I wanted to retrieve the data in such a way that only a single row is returned for each row in the main table, which contains all the data from the second table. A mouth-full, I know! Keep reading to see examples as well as the answer I’ve found.
The GROUP_CONCAT function for MySQL allows you to group multiple entries into a single column with a specified delimiter/separator.
SELECT main_table.*, GROUP_CONCAT(second_table.name ORDER BY second_table.name ASC SEPARATOR ', ') AS link_table_column FROM main_table INNER JOIN link_table ON link_table.main_id = main_table.id INNER JOIN second_table ON link_table.second_id = second_table.id GROUP BY main_table.id
What happens when you run the query is the following:
MySQL would find the multiple entries from the second_table that is linked to the main_table via the link_table and group all the names (the column I selected in the example) into a single column (which I called link_table_column), separating them with a comma.
The results would look something like this:
| main_table column 1 |
main_table column 2 |
main_table column 3 |
link_table_column |
|---|---|---|---|
| data | data | data | Second Table Entry 1 Name, Second Table Entry 2 Name, etc… |
Let’s say we have a list of people and every person plays multiple sports, and this data was stored in a database in 3 tables.
The first table is the list of people:
| id | Name |
|---|---|
| 1 | Foo |
| 2 | Bar |
The second table is a list of sports:
| id | Name |
|---|---|
| 1 | Rugby |
| 2 | Football |
| 3 | Tennis |
| 4 | Karate |
| 5 | Cricket |
| 6 | Polo |
| 7 | Hockey |
The third table is the link-table connecting each person with the sports they play:
| id | people_id | sport_id |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 1 | 4 |
| 3 | 1 | 6 |
| 4 | 2 | 1 |
| 5 | 2 | 2 |
| 6 | 2 | 7 |
Normally, if you run a query like :
SELECT p.name, s.name AS sport FROM people AS p INNER JOIN link_table AS lt ON lt.people_id = p.id INNER JOIN sport AS s ON lt.sport_id = s.id
Your results would look as follows:
| name | sport |
|---|---|
| Foo | Rugby |
| Foo | Karate |
| Foo | Polo |
| Bar | Rugby |
| Bar | Football |
| Bar | Hockey |
Now if you use the following query which uses the GROUP_CONCAT function :
SELECT people.name GROUP_CONCAT(sport.name ORDER BY sport.name ASC SEPARATOR ', ') AS sports FROM people AS p INNER JOIN link_table AS lt ON lt.people_id = p.id INNER JOIN sport AS s ON s.id = lt.sport_id GROUP BY people.name
Your results would look as follows:
| name | sport |
|---|---|
| Foo | Karate, Polo, Rugby |
| Bar | Football, Hockey, Rugby |
The query does everything it needs to, but it’s not optimized. To make it a little faster we create two indexes: One for the link-table that contains both the foreign key ids and one for the field being sorted by in the GROUP_CONCAT.
The generic format is as follows:
CREATE INDEX link ON link_table(link_table.main_id, link_table.second_id); CREATE INDEX sort_name ON second_table(name);
You should create these indexes before you attempt to use the query to improve the speed.
For the people/sport example above the create script for the indexes would look as follows:
CREATE INDEX link ON link_table(people_id,sport_id); CREATE INDEX sport_name ON sport(name);
As you can see this saves you the hassle of manually filtering through the extra rows of results returned by a standard join query. The optimization also makes the query faster helping in your overall project performance.
Based in South Africa, we're a web-development company...
(3)
(1)
(5)
(4)
(1)
(14)
(1)
(8)
(7)
(17)
(4)
(2)
(2)
(1)