Thursday, April 23, 2009
nice assert method from andy @ c5
--
Love,
Fritz
setting os x terminal window title
Helpful if like me you always have many terminals open.
--
Love,
Fritz
Friday, April 10, 2009
Using ActiveRecord :include to perform join filtering is expensive
self.connections.find(:all, :conditions => ["connections.type IN (?) AND users.active=1", Connection::FRIEND_CONNECTION_TYPES], :order => "users.id ASC", :include => :to).collect(&:to_id).uniq
( a work of art in a single line of Ruby code ;-))
My refactored version using raw SQL looks like this:
quoted_types = Connection::FRIEND_CONNECTION_TYPES.map{|type| "'#{type}'"}
sql = "SELECT DISTINCT(c.to_id) " +
"FROM connections c LEFT JOIN users u ON (c.to_id=u.id) " +
"WHERE c.from_id=#{self.id.to_i} AND c.type IN (#{quoted_types.join(',')}) AND u.active=1 " +
"AND c.to_id NOT IN (SELECT blockee_id FROM blockings WHERE blocker_id=#{self.id.to_i}) AND c.to_id!=#{self.id.to_i}"
Connection.connection.select_values(sql).map(&:to_i)
You'll notice that the SQL version also handled blockings natively now.
Now for the benchmarking using Max's user account with 406 friends:
For the ActiveRecord version:
>> Benchmark.bm(1) { |x| x.report{u.ar_first_degree_friend_ids} }
user system total real
0.400000 0.120000 0.520000 ( 0.615262)
For the SQL version:
>> Benchmark.bm(1) { |x| x.report{max.first_degree_friend_ids} }
user system total real
0.000000 0.000000 0.000000 ( 0.019606)
The performance difference is pretty obvious. This change is not only more correct, but will be 30x faster. We should see a significant performance improvement in cache priming and lazy cache loading as a result.
Thursday, April 9, 2009
Wednesday, April 8, 2009
Log rotation best practices
http://sial.org/howto/logging/rotate/
Their solution suggestion is pretty clear; just switch to logging based on time-based buckets in the first place, so you don't need to do conveyor belt rotation and can always find where the right logs are.
"given the choice of a logrotate scheme, I would first recommend time based buckets, second a startup time scheme, and never hindsight conveyer belt rotation."
Rackspace's approach to logs and analysis over them
See:
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
http://blog.racklabs.com/?p=66
Tuesday, April 7, 2009
wrt magic labels in fixtures
# conversations.yml
woodchuck_capabilities:
id: 1
...
# channels.yml
woodchuck_channel:
conversation: :woodchuck_capabilities # <---- NOPE
conversation_id: 1 # <---- YEP
I knew that the magic couldn't span databases, but didn't know that specifying an id disabled that behavior...
--
Love,
Fritz
Friday, April 3, 2009
useful: schema dump in code
ActiveRecord::SchemaDumper.dump(Operation.connection)
--
Love,
Fritz
Re: FYI: poor mysql query planning when ordering by id
rp
Geesh. That's nice work mysql optimizer!
On Thu, Apr 2, 2009 at 7:48 PM, Fritz Schneider
<fritz@themechanicalzoo.com> wrote:
> Paying attention to slow queries today Bob, Nick, and I saw a number of
> seemingly innocuous-looking sql statements that were taking a ridiculously
> long time to execute. For example:
>
> SELECT * FROM `msgs` WHERE address_id = 11295 ORDER BY id DESC LIMIT 1;
>
> There's an index on address_id so there's no reason this should take up to 8
> seconds! Explaining itl:
>
> mysql> explain SELECT * FROM `msgs` WHERE address_id = 11295 ORDER BY id
> DESC LIMIT 1;
> +----+-------------+-------+-------+--------------------------------------------------------------------------------------------------+---------+---------+------+-------+-------------+
> | id | select_type | table | type |
> possible_keys
> | key | key_len | ref | rows | Extra |
> +----+-------------+-------+-------+--------------------------------------------------------------------------------------------------+---------+---------+------+-------+-------------+
> | 1 | SIMPLE | msgs | index |
> index_msgs_on_address_id_and_incoming,index_msgs_on_address_id_and_recognized_cmd_and_created_at
> | PRIMARY | 4 | NULL | 75525 | Using where |
> +----+-------------+-------+-------+--------------------------------------------------------------------------------------------------+---------+---------+------+-------+-------------+
> 1 row in set (0.00 sec)
>
> It's using the id as the key for this query because of the order by,
> resulting in a 75k row scan. Adding an additional order by (should be
> equivalent) solves the problem:
>
> mysql> explain SELECT * FROM `msgs` WHERE (`msgs`.address_id = 11295) ORDER
> BY id DESC, created_at DESC LIMIT 1;
> +----+-------------+-------+------+--------------------------------------------------------------------------------------------------+---------------------------------------+---------+-------+------+-----------------------------+
> | id | select_type | table | type |
> possible_keys
> | key | key_len | ref | rows |
> Extra |
> +----+-------------+-------+------+--------------------------------------------------------------------------------------------------+---------------------------------------+---------+-------+------+-----------------------------+
> | 1 | SIMPLE | msgs | ref |
> index_msgs_on_address_id_and_incoming,index_msgs_on_address_id_and_recognized_cmd_and_created_at
> | index_msgs_on_address_id_and_incoming | 5 | const | 9 | Using
> where; Using filesort |
> +----+-------------+-------+------+--------------------------------------------------------------------------------------------------+---------------------------------------+---------+-------+------+-----------------------------+
>
> I'm fixing up the two specific places I noticed this (msgs,
> routing_suggestion_requests), but it's something to be on the lookout for
> because we order by id all over the place.
>
> I'll be paying much more attention to slow queries now that we have splunk
> in place.
>
> --
> Love,
> Fritz
>
>
Re: FYI: poor mysql query planning when ordering by id
On Thu, Apr 2, 2009 at 7:48 PM, Fritz Schneider
<fritz@themechanicalzoo.com> wrote:
> Paying attention to slow queries today Bob, Nick, and I saw a number of
> seemingly innocuous-looking sql statements that were taking a ridiculously
> long time to execute. For example:
>
> SELECT * FROM `msgs` WHERE address_id = 11295 ORDER BY id DESC LIMIT 1;
>
> There's an index on address_id so there's no reason this should take up to 8
> seconds! Explaining itl:
>
> mysql> explain SELECT * FROM `msgs` WHERE address_id = 11295 ORDER BY id
> DESC LIMIT 1;
> +----+-------------+-------+-------+--------------------------------------------------------------------------------------------------+---------+---------+------+-------+-------------+
> | id | select_type | table | type |
> possible_keys
> | key | key_len | ref | rows | Extra |
> +----+-------------+-------+-------+--------------------------------------------------------------------------------------------------+---------+---------+------+-------+-------------+
> | 1 | SIMPLE | msgs | index |
> index_msgs_on_address_id_and_incoming,index_msgs_on_address_id_and_recognized_cmd_and_created_at
> | PRIMARY | 4 | NULL | 75525 | Using where |
> +----+-------------+-------+-------+--------------------------------------------------------------------------------------------------+---------+---------+------+-------+-------------+
> 1 row in set (0.00 sec)
>
> It's using the id as the key for this query because of the order by,
> resulting in a 75k row scan. Adding an additional order by (should be
> equivalent) solves the problem:
>
> mysql> explain SELECT * FROM `msgs` WHERE (`msgs`.address_id = 11295) ORDER
> BY id DESC, created_at DESC LIMIT 1;
> +----+-------------+-------+------+--------------------------------------------------------------------------------------------------+---------------------------------------+---------+-------+------+-----------------------------+
> | id | select_type | table | type |
> possible_keys
> | key | key_len | ref | rows |
> Extra |
> +----+-------------+-------+------+--------------------------------------------------------------------------------------------------+---------------------------------------+---------+-------+------+-----------------------------+
> | 1 | SIMPLE | msgs | ref |
> index_msgs_on_address_id_and_incoming,index_msgs_on_address_id_and_recognized_cmd_and_created_at
> | index_msgs_on_address_id_and_incoming | 5 | const | 9 | Using
> where; Using filesort |
> +----+-------------+-------+------+--------------------------------------------------------------------------------------------------+---------------------------------------+---------+-------+------+-----------------------------+
>
> I'm fixing up the two specific places I noticed this (msgs,
> routing_suggestion_requests), but it's something to be on the lookout for
> because we order by id all over the place.
>
> I'll be paying much more attention to slow queries now that we have splunk
> in place.
>
> --
> Love,
> Fritz
>
>
Thursday, April 2, 2009
FYI: poor mysql query planning when ordering by id
SELECT * FROM `msgs` WHERE address_id = 11295 ORDER BY id DESC LIMIT 1;
There's an index on address_id so there's no reason this should take up to 8 seconds! Explaining itl:
mysql> explain SELECT * FROM `msgs` WHERE address_id = 11295 ORDER BY id DESC LIMIT 1;
+----+-------------+-------+-------+--------------------------------------------------------------------------------------------------+---------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+--------------------------------------------------------------------------------------------------+---------+---------+------+-------+-------------+
| 1 | SIMPLE | msgs | index | index_msgs_on_address_id_and_incoming,index_msgs_on_address_id_and_recognized_cmd_and_created_at | PRIMARY | 4 | NULL | 75525 | Using where |
+----+-------------+-------+-------+--------------------------------------------------------------------------------------------------+---------+---------+------+-------+-------------+
1 row in set (0.00 sec)
It's using the id as the key for this query because of the order by, resulting in a 75k row scan. Adding an additional order by (should be equivalent) solves the problem:
mysql> explain SELECT * FROM `msgs` WHERE (`msgs`.address_id = 11295) ORDER BY id DESC, created_at DESC LIMIT 1;
+----+-------------+-------+------+--------------------------------------------------------------------------------------------------+---------------------------------------+---------+-------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+--------------------------------------------------------------------------------------------------+---------------------------------------+---------+-------+------+-----------------------------+
| 1 | SIMPLE | msgs | ref | index_msgs_on_address_id_and_incoming,index_msgs_on_address_id_and_recognized_cmd_and_created_at | index_msgs_on_address_id_and_incoming | 5 | const | 9 | Using where; Using filesort |
+----+-------------+-------+------+--------------------------------------------------------------------------------------------------+---------------------------------------+---------+-------+------+-----------------------------+
I'm fixing up the two specific places I noticed this (msgs, routing_suggestion_requests), but it's something to be on the lookout for because we order by id all over the place.
I'll be paying much more attention to slow queries now that we have splunk in place.
--
Love,
Fritz
"Migrating from svn to DVCS"
Migrating from svn to a distributed VCS
Wednesday, April 1, 2009
memoize & Aws::S3
--
Love,
Fritz