MySQL Outer Joins Accross Multiple Tables
This is the third time in as many months that I’ve had to read the MySQL Reference Manual section on JOIN Syntax, so it’s time to summarize.
For example, let’s say you have tables:
books
+----+-----------------------------------------+-----------+ | id | title | is_unread | +----+-----------------------------------------+-----------+ | 1 | Ulysses | 1 | | 2 | The Great Gatsby | 0 | | 3 | A Portrait of the Artist as a Young Man | 1 | | 4 | Lolita | 1 | | 5 | Brave New World | 0 | +----+-----------------------------------------+-----------+
tags
+----+------------+ | id | name | +----+------------+ | 1 | classic | | 2 | fiction | | 3 | irish | | 4 | dystopia | | 5 | literature | +----+------------+
books_tags
+---------+--------+ | book_id | tag_id | +---------+--------+ | 1 | 2 | | 1 | 3 | | 2 | 2 | | 2 | 1 | | 3 | 2 | | 3 | 3 | | 4 | 2 | | 5 | 1 | | 5 | 2 | | 5 | 4 | +---------+--------+
and we want to know how many books there are with each tag in the database. We could say
SELECT t.*, COUNT(bt.book_id) FROM tags t JOIN books_tags bt ON t.id = bt.tag_id GROUP BY t.id;
and get
+----+----------+----------------+ | id | name | count(book_id) | +----+----------+----------------+ | 1 | classic | 2 | | 2 | fiction | 5 | | 3 | irish | 2 | | 4 | dystopia | 1 | +----+----------+----------------+ 4 rows in set (0.00 sec)
but we’re missing a tag! This is easily solved by using a LEFT JOIN, like so:
SELECT t.*, COUNT(bt.book_id) FROM tags t LEFT JOIN books_tags bt ON t.id = bt.tag_id GROUP BY t.id;
+----+------------+----------------+ | id | name | count(book_id) | +----+------------+----------------+ | 1 | classic | 2 | | 2 | fiction | 5 | | 3 | irish | 2 | | 4 | dystopia | 1 | | 5 | literature | 0 | +----+------------+----------------+ 5 rows in set (0.00 sec)
But what if you only want to consider books you’ve read? Now we must also join against the books table.
You may be tempted to try
SELECT t.*, COUNT(bt.book_id) FROM tags t LEFT JOIN books_tags bt ON t.id = bt.tag_id LEFT JOIN books b ON b.id = bt.book_id WHERE is_unread = 0 GROUP BY t.id;
but that simply returns
+----+----------+----------------+ | id | name | count(book_id) | +----+----------+----------------+ | 1 | classic | 2 | | 2 | fiction | 2 | | 4 | dystopia | 1 | +----+----------+----------------+
To return all tags the trick is to move the condition into the join clause and then count what’s returned from the third table. Thus
SELECT t.*, COUNT(b.id) FROM tags t LEFT JOIN books_tags bt ON t.id = bt.tag_id LEFT JOIN books b ON b.id = bt.book_id AND b.is_unread = 0 WHERE 1 GROUP BY t.id;
+----+------------+-------------+ | id | name | count(b.id) | +----+------------+-------------+ | 1 | classic | 2 | | 2 | fiction | 2 | | 3 | irish | 0 | | 4 | dystopia | 1 | | 5 | literature | 0 | +----+------------+-------------+
I’ve read two classics, two works of fiction, and one about dystopia. Doesn’t that seem like a lot?
What if we want to go a step further? Let’s add authors:
authors:
+----+---------------------+ | id | name | +----+---------------------+ | 1 | James Joyce | | 2 | F. Scott Fitzgerald | | 3 | Vladimir Nabokov | | 4 | Aldous Huxley | +----+---------------------+
authors_books:
+-----------+---------+ | author_id | book_id | +-----------+---------+ | 1 | 1 | | 2 | 2 | | 1 | 3 | | 3 | 4 | | 4 | 5 | +-----------+---------+
If we want to do a standard inner join against the authors table (we don’t want books without authors, but we still want tags without books) we can do so with:
SELECT t.*, COUNT(b.id) FROM tags t LEFT JOIN books_tags bt ON t.id = bt.tag_id LEFT JOIN (books b, authors_books ab, authors a) ON b.id = bt.book_id AND b.id = ab.book_id AND ab.author_id = a.id AND a.name = ‘F. Scott Fitzgerald’ AND b.is_unread = 0 WHERE 1 GROUP BY t.id;