Wednesday, May 27, 2009

S3 and retries

As an FYI, reading from S3 isn't necessarily reliable. In the past couple of days, I've seen a lot of ETIMEDOUT errors coming back from the connection. The first step was to add a 10x retry, which held up for a while but then failed again. Next, I added 10x retries with a 1-second sleep between attempts, which, again, worked for a while and then failed. Finally, I wound up at 25x retries with 1-second sleeps; this seems to work. (Silly me, I thought s3sync's 100x retry policy was just being paranoid or overly cautious...)

Except that now I'm getting occasional 'getaddrinfo: nodename nor servname provided, or not known (SocketError)' errors.

So, if you're reading data stored in S3 and want it to be reliable, wrap the read in a generic exception handler with a lot of backing-off retries. With luck, it'll work out for you. Without luck, well, what sort of reliability do you expect from a "cloud"?


max_attempts = 25
attempts = 0

begin
  my_stuff = s3object.load
rescue Exception => e
  attempts += 1
  if attempts < max_attempts
    sleep 1
    retry
  else
    raise  # out of retries; let the error propagate instead of swallowing it
  end
end
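
For what it's worth, here's a rough sketch of what actual backing-off (rather than a fixed 1-second sleep) might look like. `s3object` is just a stand-in for whatever handle your S3 library gives you, and I've narrowed the rescue to the errors mentioned above instead of rescuing everything; adjust to taste.

require 'socket'
require 'timeout'

max_attempts = 25
attempts = 0

begin
  my_stuff = s3object.load
rescue SocketError, Timeout::Error, Errno::ETIMEDOUT => e
  attempts += 1
  raise if attempts >= max_attempts
  sleep [2 ** attempts, 60].min  # back off: 2s, 4s, 8s... capped at 60s
  retry
end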
