Friday, April 10, 2009

Using ActiveRecord :include to perform join filtering is expensive

For the User#first_degree_friend_ids lookup, the AR version looks like this:

self.connections.find(:all, :conditions => ["connections.type IN (?) AND users.active=1", Connection::FRIEND_CONNECTION_TYPES], :order => "users.id ASC", :include => :to).collect(&:to_id).uniq
( a work of art in a single line of Ruby code ;-))

My refactored version using raw SQL looks like this:
quoted_types = Connection::FRIEND_CONNECTION_TYPES.map{|type| "'#{type}'"}
sql = "SELECT DISTINCT(c.to_id) " +
"FROM connections c LEFT JOIN users u ON (c.to_id=u.id) " +
"WHERE c.from_id=#{self.id.to_i} AND c.type IN (#{quoted_types.join(',')}) AND u.active=1 " +
"AND c.to_id NOT IN (SELECT blockee_id FROM blockings WHERE blocker_id=#{self.id.to_i}) AND c.to_id!=#{self.id.to_i}"
Connection.connection.select_values(sql).map(&:to_i)

You'll notice that the SQL version also handled blockings natively now.

Now for the benchmarking using Max's user account with 406 friends:

For the ActiveRecord version:
>> Benchmark.bm(1) { |x| x.report{u.ar_first_degree_friend_ids} }
user system total real
0.400000 0.120000 0.520000 ( 0.615262)

For the SQL version:
>> Benchmark.bm(1) { |x| x.report{max.first_degree_friend_ids} }
user system total real
0.000000 0.000000 0.000000 ( 0.019606)

The performance difference is pretty obvious. This change is not only more correct, but will be 30x faster. We should see a significant performance improvement in cache priming and lazy cache loading as a result.

No comments:

Post a Comment