Wednesday, April 8, 2009

Rackspace's approach to logs and analysis over them

Rackspace gradually improved their in-house logging over a few phases, and eventually ended up forwarding logs to a distributed filesystem with Hadoop running on top of it. From there they could distribute a MapReduce analysis task across the logs to answer pretty much any question in a predictable runtime.
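To give a rough feel for what "a MapReduce task against the logs" means, here is a minimal single-process sketch in Python. It is not Rackspace's code and does not use Hadoop at all: the log format, the sample lines, and the function names (`map_line`, `shuffle`, `reduce_counts`, `run_job`) are all made up for illustration. On a real cluster the map and reduce steps would run in parallel across many machines, and the framework would handle the grouping in between.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical log format: "<date> <time> <host> <level> <message>"
SAMPLE_LOGS = [
    "2009-04-08 12:00:01 mail1 ERROR connection timed out",
    "2009-04-08 12:00:02 mail2 INFO message delivered",
    "2009-04-08 12:00:03 mail1 ERROR mailbox full",
    "2009-04-08 12:00:04 mail2 ERROR connection timed out",
]


def map_line(line):
    """Map step: emit ((host, level), 1) for each parseable log line."""
    parts = line.split(maxsplit=4)
    if len(parts) < 4:
        return []  # skip malformed lines instead of failing the whole job
    _date, _time, host, level = parts[:4]
    return [((host, level), 1)]


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle phase."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of log lines."""
    mapped = chain.from_iterable(map_line(line) for line in lines)
    return dict(reduce_counts(key, values) for key, values in shuffle(mapped).items())


counts = run_job(SAMPLE_LOGS)
for (host, level), total in sorted(counts.items()):
    print(host, level, total)
```

Answering a different question (errors per hour, say, or messages per customer) would mostly be a matter of changing the key that `map_line` emits, which is a large part of why this model suits ad hoc log analysis.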

See:

http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data

http://blog.racklabs.com/?p=66
