I can’t remember the last time I stumbled across a bug in MySQL, however this morning I did.
Querying the following table to locate entities with more than one title attribute (this is a diagnostic query, not used in the application).
CREATE TABLE `entity_attr` ( `entity_id` int(11) NOT NULL default '0', `entity_attr_id` int(11) NOT NULL default '0', `seq_no` int(11) NOT NULL default '0', `lang_id` int(11) NOT NULL default '0', `intval` int(11) default NULL, `strval` text, `created` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP, `import_source_id` int(11) NOT NULL default '0', PRIMARY KEY (`entity_id`,`entity_attr_id`,`seq_no`,`lang_id`), KEY `entity_attr_id` (`entity_attr_id`,`strval`(10)), KEY `entity_attr_id_2` (`entity_attr_id`,`intval`), KEY `k_entity_id` (`entity_id`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8
The query:
> SELECT entity_id, COUNT(*) AS num FROM entity_attr WHERE entity_attr_id = 10 GROUP BY entity_id HAVING num > 1 LIMIT 10; +-----------+-----+ | entity_id | num | +-----------+-----+ | 228896 | 2 | | 246726 | 2 | | 409089 | 2 | | 409091 | 2 | | 409098 | 2 | | 409104 | 2 | | 409105 | 2 | | 409106 | 2 | | 409107 | 2 | | 409108 | 2 | +-----------+-----+ 10 rows in set (2.65 sec)
OK, so there are occurrences of entities with multiple title attributes, what’s the most amount of titles a single entity has?
> SELECT entity_id, COUNT(*) AS num FROM entity_attr WHERE entity_attr_id = 10 GROUP BY entity_id HAVING num > 1 ORDER BY num DESC LIMIT 10; +-----------+--------+ | entity_id | num | +-----------+--------+ | 409109 | 525193 | +-----------+--------+ 1 row in set (0.93 sec)
Only one result? Wow, 525k titles! Really?
> SELECT COUNT(*) FROM entity_attr WHERE entity_id = 409109 AND entity_attr_id = 10; +----------+ | COUNT(*) | +----------+ | 1 | +----------+ 1 row in set (0.00 sec)
No.
Lets take another look, remove the limit.
> SELECT entity_id, COUNT(*) AS num FROM entity_attr WHERE entity_attr_id = 10 GROUP BY entity_id HAVING num > 1 ORDER BY num DESC; +-----------+-----+ | entity_id | num | +-----------+-----+ | 409104 | 2 | ... +-----------+-----+ 107 rows in set (0.00 sec)
And if we bring it back.
> SELECT entity_id, COUNT(*) AS num FROM entity_attr WHERE entity_attr_id = 10 GROUP BY entity_id HAVING num > 1 ORDER BY num DESC LIMIT 107; +-----------+--------+ | entity_id | num | +-----------+--------+ | 409109 | 525193 | +-----------+--------+ 1 row in set (0.83 sec)
Looks like LIMIT
is causing the issue here. Looks like 5 is a magic LIMIT
number too, changing the entity ID in the result:
Limit 4:
+-----------+--------+ | entity_id | num | +-----------+--------+ | 409109 | 525193 | +-----------+--------+ 1 row in set (0.83 sec)
Limit 5:
+-----------+--------+ | entity_id | num | +-----------+--------+ | 55446 | 525193 | +-----------+--------+ 1 row in set (0.00 sec)
Limit 6:
+-----------+--------+ | entity_id | num | +-----------+--------+ | 409109 | 525193 | +-----------+--------+ 1 row in set (0.82 sec)
So how does the query plan change between these statements?
> EXPLAIN SELECT entity_id, COUNT(*) AS num FROM entity_attr WHERE entity_attr_id = 10 GROUP BY entity_id HAVING num > 1 LIMIT 10; +----+-------------+-------------+-------+---------------------------------+---------+---------+------+--------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+-------+---------------------------------+---------+---------+------+--------+--------------------------+ | 1 | SIMPLE | entity_attr | index | entity_attr_id,entity_attr_id_2 | PRIMARY | 16 | NULL | 158569 | Using where; Using index | +----+-------------+-------------+-------+---------------------------------+---------+---------+------+--------+--------------------------+
Using the magic limit of 5:
> EXPLAIN SELECT entity_id, COUNT(*) AS num FROM entity_attr WHERE entity_attr_id = 10 GROUP BY entity_id HAVING num > 1 LIMIT 5; +----+-------------+-------------+-------+---------------------------------+---------+---------+------+-------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+-------+---------------------------------+---------+---------+------+-------+--------------------------+ | 1 | SIMPLE | entity_attr | index | entity_attr_id,entity_attr_id_2 | PRIMARY | 16 | NULL | 79221 | Using where; Using index | +----+-------------+-------------+-------+---------------------------------+---------+---------+------+-------+--------------------------+
Removing the limit (where we get correct results):
> EXPLAIN SELECT entity_id, COUNT(*) AS num FROM entity_attr WHERE entity_attr_id = 10 GROUP BY entity_id HAVING num > 1; +----+-------------+-------------+------+---------------------------------+------------------+---------+-------+--------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+------+---------------------------------+------------------+---------+-------+--------+----------------------------------------------+ | 1 | SIMPLE | entity_attr | ref | entity_attr_id,entity_attr_id_2 | entity_attr_id_2 | 4 | const | 453654 | Using where; Using temporary; Using filesort | +----+-------------+-------------+------+---------------------------------+------------------+---------+-------+--------+----------------------------------------------+
Could there be an issue using the primary key? Lets rebuild the indexes on this table.
> REPAIR TABLE entity_attr QUICK; +------------------+--------+----------+----------+ | Table | Op | Msg_type | Msg_text | +------------------+--------+----------+----------+ | muco.entity_attr | repair | status | OK | +------------------+--------+----------+----------+
Re-run our query:
> SELECT entity_id, COUNT(*) AS num FROM entity_attr WHERE entity_attr_id = 10 GROUP BY entity_id HAVING num > 1 ORDER BY num DESC LIMIT 5; +-----------+--------+ | entity_id | num | +-----------+--------+ | 409101 | 525193 | +-----------+--------+
No joy. Lets simplify slightly:
> SELECT entity_id, COUNT(*) AS num FROM entity_attr WHERE entity_attr_id = 10 GROUP BY entity_id HAVING num > 1 LIMIT 5; +-----------+-----+ | entity_id | num | +-----------+-----+ | 228896 | 2 | | 246726 | 2 | | 409089 | 2 | | 409091 | 2 | | 409098 | 2 | +-----------+-----+
Looks like we’re good if we strip the ORDER BY clause. Are we still relying soley on an index to find rows?
> EXPLAIN SELECT entity_id, COUNT(*) AS num FROM entity_attr WHERE entity_attr_id = 10 GROUP BY entity_id HAVING num > 1 LIMIT 10; +----+-------------+-------------+-------+---------------------------------+---------+---------+------+--------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+-------+---------------------------------+---------+---------+------+--------+--------------------------+ | 1 | SIMPLE | entity_attr | index | entity_attr_id,entity_attr_id_2 | PRIMARY | 16 | NULL | 158569 | Using where; Using index | +----+-------------+-------------+-------+---------------------------------+---------+---------+------+--------+--------------------------+
Yup.
So a mix of GROUP
, HAVING
, ORDER BY
with LIMIT
doesn’t play well on this version of MySQL.
This is bug wasn’t present on our production servers (running 5.0.x). This issue is exhibited from:
mysql Ver 14.14 Distrib 5.1.37, for debian-linux-gnu (i486) using EditLine wrapper Server version: 5.1.37-1ubuntu5.1-log (Ubuntu)
I’ve not tested this on later releases of the 5.1.x branch. I need a machine I can dirty with two MySQL installs and (on Ubuntu) need to compile from source for the latest (at the time of writing) 5.1.46 release.
UPDATE: I installed Ubuntu 10.04 tonight (April 30th) on a VM and tested this query, the problem persists in Lucid Lynx (version 5.1.41-3ubuntu12
).
comments
No comments for this post.