Friday, April 3, 2009

Re: FYI: poor mysql query planning when ordering by id

I think these order by id clauses everywhere were frequently put in place to get consistent time based ordering of entries in the test suite when several things may be created at nearly the same time. Sucks that mysql isn't optimizing them very well. Regardless, it probably wasn't a very 'correct' choice (eg. if we every move to a sharded data storage scheme). We should probably move towards a more 'correct' clause: "order by created_at, id". Since it indicates what we want (created_at order) but provides a fallback for consistent results in the test suite (falling back to id).

rp



On Fri, Apr 3, 2009 at 12:34 AM, Nathan Stoll <nathan@themechanicalzoo.com> wrote:
Geesh.  That's nice work mysql optimizer!

On Thu, Apr 2, 2009 at 7:48 PM, Fritz Schneider
<fritz@themechanicalzoo.com> wrote:
> Paying attention to slow queries today Bob, Nick, and I saw a number of
> seemingly innocuous-looking sql statements that were taking a ridiculously
> long time to execute. For example:
>
> SELECT * FROM `msgs` WHERE address_id = 11295 ORDER BY id DESC LIMIT 1;
>
> There's an index on address_id so there's no reason this should take up to 8
> seconds! Explaining itl:
>
> mysql> explain SELECT * FROM `msgs` WHERE address_id = 11295 ORDER BY id
> DESC LIMIT 1;
> +----+-------------+-------+-------+--------------------------------------------------------------------------------------------------+---------+---------+------+-------+-------------+
> | id | select_type | table | type  |
> possible_keys
> | key     | key_len | ref  | rows  | Extra       |
> +----+-------------+-------+-------+--------------------------------------------------------------------------------------------------+---------+---------+------+-------+-------------+
> |  1 | SIMPLE      | msgs  | index |
> index_msgs_on_address_id_and_incoming,index_msgs_on_address_id_and_recognized_cmd_and_created_at
> | PRIMARY | 4       | NULL | 75525 | Using where |
> +----+-------------+-------+-------+--------------------------------------------------------------------------------------------------+---------+---------+------+-------+-------------+
> 1 row in set (0.00 sec)
>
> It's using the id as the key for this query because of the order by,
> resulting in a 75k row scan. Adding an additional order by (should be
> equivalent) solves the problem:
>
> mysql> explain SELECT * FROM `msgs` WHERE (`msgs`.address_id = 11295) ORDER
> BY id DESC, created_at DESC LIMIT 1;
> +----+-------------+-------+------+--------------------------------------------------------------------------------------------------+---------------------------------------+---------+-------+------+-----------------------------+
> | id | select_type | table | type |
> possible_keys
> | key                                   | key_len | ref   | rows |
> Extra                       |
> +----+-------------+-------+------+--------------------------------------------------------------------------------------------------+---------------------------------------+---------+-------+------+-----------------------------+
> |  1 | SIMPLE      | msgs  | ref  |
> index_msgs_on_address_id_and_incoming,index_msgs_on_address_id_and_recognized_cmd_and_created_at
> | index_msgs_on_address_id_and_incoming | 5       | const |    9 | Using
> where; Using filesort |
> +----+-------------+-------+------+--------------------------------------------------------------------------------------------------+---------------------------------------+---------+-------+------+-----------------------------+
>
> I'm fixing up the two specific places I noticed this (msgs,
> routing_suggestion_requests), but it's something to be on the lookout for
> because we order by id all over the place.
>
> I'll be paying much more attention to slow queries now that we have splunk
> in place.
>
> --
> Love,
> Fritz
>
>

No comments:

Post a Comment