Fun With MySql
1 Comment | Latest by: Phil | Add Your Comment! »»
Fun with MySql! Here's the question: If you're joining two tables with a similar column, should you have an index on the first table in the query, the second table, or both? Which is faster?
I'm not a MySql DBA, but I'll take a crack at answering, and show my work. Up front, I'll say I think it's better having an index on both.
mysql> create table table_1a select fname from another_table where fname is not null limit 0,1000; Query OK, 1000 rows affected (0.08 sec) Records: 1000 Duplicates: 0 Warnings: 0
Create 4 sets of a and b tables.
mysql> create table table_1b select fname from table_1a; Query OK, 1000 rows affected (0.10 sec) Records: 1000 Duplicates: 0 Warnings: 0 mysql> create table table_2a select fname from table_1a; Query OK, 1000 rows affected (0.12 sec) Records: 1000 Duplicates: 0 Warnings: 0 mysql> create table table_2b select fname from table_1a; Query OK, 1000 rows affected (0.27 sec) Records: 1000 Duplicates: 0 Warnings: 0 mysql> create table table_3b select fname from table_1a; Query OK, 1000 rows affected (0.11 sec) Records: 1000 Duplicates: 0 Warnings: 0 mysql> create table table_3a select fname from table_1a; Query OK, 1000 rows affected (0.06 sec) Records: 1000 Duplicates: 0 Warnings: 0 mysql> create table table_4a select fname from table_1a; Query OK, 1000 rows affected (0.13 sec) Records: 1000 Duplicates: 0 Warnings: 0 mysql> create table table_4b select fname from table_1a; Query OK, 1000 rows affected (0.13 sec) Records: 1000 Duplicates: 0 Warnings: 0
Now you have four pairs of table a and b.
Leave indexes off of the first pair. On the 2nd and 3rd pair, add an index to a or b. On the 4th pair, add an index to both tables.
mysql> alter table table_2a add index (fname); Query OK, 1000 rows affected (0.02 sec) Records: 1000 Duplicates: 0 Warnings: 0 mysql> alter table table_3b add index (fname); Query OK, 1000 rows affected (0.05 sec) Records: 1000 Duplicates: 0 Warnings: 0 mysql> alter table table_4a add index (fname); Query OK, 1000 rows affected (0.06 sec) Records: 1000 Duplicates: 0 Warnings: 0 mysql> alter table table_4b add index (fname); Query OK, 1000 rows affected (0.11 sec) Records: 1000 Duplicates: 0 Warnings: 0
Now, the test, which is better? We explain a join between the two tables in each of the 4 pairs and compare.
mysql> explain select * from table_1a as a inner join table_1b using (fname); +----+-------------+----------+------+---------------+------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+------+---------------+------+---------+------+------+-------------+ | 1 | SIMPLE | a | ALL | NULL | NULL | NULL | NULL | 1000 | | | 1 | SIMPLE | table_1b | ALL | NULL | NULL | NULL | NULL | 1000 | Using where | +----+-------------+----------+------+---------------+------+---------+------+------+-------------+ 2 rows in set (0.00 sec) mysql> explain select * from table_2a as a inner join table_2b using (fname); +----+-------------+----------+------+---------------+-------+---------+-----------------------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+------+---------------+-------+---------+-----------------------+------+--------------------------+ | 1 | SIMPLE | table_2b | ALL | NULL | NULL | NULL | NULL | 1000 | | | 1 | SIMPLE | a | ref | fname | fname | 43 | exlist.table_2b.fname | 2 | Using where; Using index | +----+-------------+----------+------+---------------+-------+---------+-----------------------+------+--------------------------+ 2 rows in set (0.00 sec) mysql> explain select * from table_3a as a inner join table_3b using (fname); +----+-------------+----------+------+---------------+-------+---------+----------------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+------+---------------+-------+---------+----------------+------+--------------------------+ | 1 | SIMPLE | a | ALL | NULL | NULL | NULL | NULL | 1000 | | | 1 | SIMPLE | table_3b | ref | fname | fname | 43 | exlist.a.fname | 2 | Using where; Using index | +----+-------------+----------+------+---------------+-------+---------+----------------+------+--------------------------+ 2 rows in set (0.00 sec) mysql> explain select * from table_4a as a inner join table_4b using (fname); +----+-------------+----------+-------+---------------+-------+---------+----------------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+-------+---------------+-------+---------+----------------+------+--------------------------+ | 1 | SIMPLE | a | index | fname | fname | 43 | NULL | 1000 | Using index | | 1 | SIMPLE | table_4b | ref | fname | fname | 43 | exlist.a.fname | 2 | Using where; Using index | +----+-------------+----------+-------+---------------+-------+---------+----------------+------+--------------------------+ 2 rows in set (0.01 sec)
In all cases but the first one, the row count is pretty much the same (you'll have to scroll over to see it). The 1st example is the obvious loser with all rows in both tables being scanned. The 2nd and 3rd queries, with an index on 1 of the tables, are no different. And finally in the last case, the row count is the same, but the index gets used on both tables.
Let's actually run the queries and see how we do:
mysql> select * from table_1a as a inner join table_1b using (fname); ... 3502 rows in set (0.25 sec)
3502 rows?! Probably should have made the names unique! Whoops!
mysql> select * from table_2a as a inner join table_2b using (fname); ... 3502 rows in set (0.03 sec) mysql> select * from table_3a as a inner join table_3b using (fname); ... 3502 rows in set (0.03 sec) mysql> select * from table_4a as a inner join table_4b using (fname); ... 3502 rows in set (0.03 sec)
The explain was pretty much right on. The row counts in the last 3 examples were all the same, and the queries took the same amount of time. The extra index in the 4th example didn't help. Not having any indexes in the 1st set caused the query to take over 6 times as long.
It would be interesting to try again with:
- Unique names in the tables (with some overlap, however, so there's something to join on)
- A larger set of data
What have you learned, Dorothy? Thoughts? Bad test? MySql DBA's out there, please help me think about this problem.
Trackback: http://philsown.org/2007/08/fun-with-mysql/trackback
Comments
How to Put Your Face Next to Your Comment
Aug 8, 2007
For the HTML nerd, the span tags in this example were a way of getting around mod_security. Aren't you glad you asked?