Data from multiple rows in 1 column

Posted by:

Categories: MySQL

7 Sep 2010

I was building database tables for a project and came across a problem that I’ve tried to ignore in the past: When I have a main table, a second table, and a table linking the two, and the link-table can link a single record in the main table with multiple records in the second table, doing a join would return multiple rows for a single entry in my main table — a row for each link between the main and second tables. But I wanted to retrieve the data in such a way that only a single row is returned for each row in the main table, which contains all the data from the second table. A mouth-full, I know! Keep reading to see examples as well as the answer I’ve found.

GROUP_CONCAT to the rescue

The GROUP_CONCAT function for MySQL allows you to group multiple entries into a single column with a specified delimiter/separator.

The Query

SELECT main_table.*,
GROUP_CONCAT(second_table.name ORDER BY second_table.name ASC SEPARATOR ', ') AS link_table_column
FROM main_table
INNER JOIN link_table ON link_table.main_id = main_table.id
INNER JOIN second_table ON link_table.second_id = second_table.id
GROUP BY main_table.id

What happens when you run the query is the following:

MySQL would find the multiple entries from the second_table that is linked to the main_table via the link_table and group all the names (the column I selected in the example) into a single column (which I called link_table_column), separating them with a comma.

The results would look something like this:

main_table
column 1
main_table
column 2
main_table
column 3
link_table_column
data data data Second Table Entry 1 Name,
Second Table Entry 2 Name,
etc…

Example

Let’s say we have a list of people and every person plays multiple sports, and this data was stored in a database in 3 tables.

The first table is the list of people:

id Name
1 Foo
2 Bar

The second table is a list of sports:

id Name
1 Rugby
2 Football
3 Tennis
4 Karate
5 Cricket
6 Polo
7 Hockey

The third table is the link-table connecting each person with the sports they play:

id people_id sport_id
1 1 1
2 1 4
3 1 6
4 2 1
5 2 2
6 2 7

Normally, if you run a query like :

SELECT p.name, s.name AS sport FROM people AS p
INNER JOIN link_table AS lt ON lt.people_id = p.id
INNER JOIN sport AS s ON lt.sport_id = s.id

Your results would look as follows:

name sport
Foo Rugby
Foo Karate
Foo Polo
Bar Rugby
Bar Football
Bar Hockey

Now if you use the following query which uses the GROUP_CONCAT function :

SELECT people.name
GROUP_CONCAT(sport.name ORDER BY sport.name ASC SEPARATOR ', ') AS sports
FROM people AS p
INNER JOIN link_table AS lt ON lt.people_id = p.id
INNER JOIN sport AS s ON s.id = lt.sport_id
GROUP BY people.name

Your results would look as follows:

name sport
Foo Karate, Polo, Rugby
Bar Football, Hockey, Rugby

Optimization

The query does everything it needs to, but it’s not optimized. To make it a little faster we create two indexes: One for the link-table that contains both the foreign key ids and one for the field being sorted by in the GROUP_CONCAT.

The generic format is as follows:

CREATE INDEX link ON link_table(link_table.main_id, link_table.second_id);
CREATE INDEX sort_name ON second_table(name);

You should create these indexes before you attempt to use the query to improve the speed.

For the people/sport example above the create script for the indexes would look as follows:

CREATE INDEX link ON link_table(people_id,sport_id);
CREATE INDEX sport_name ON sport(name);

In Conclusion

As you can see this saves you the hassle of manually filtering through the extra rows of results returned by a standard join query. The optimization also makes the query faster helping in your overall project performance.

You might also like :

Comment Form

About this blog

Based in South Africa, we're a web-development company...