Roller coaster loops

image by Priscilla Du Preez

Try not to loop through associations inside jobs

One of the downsides of using Ruby to interact with your database, rather than SQL directly, is that it’s easy to accidentally give yourself performance headaches via running multiple very similar queries. Although the advantages of using Active Record (or something similar) are generally worth it!

RailsConf 2024

I'm co-chairing RailsConf 2024 in Detroit May 7–9. Come and join us 

This is typically referred to as an N+1 problem. They occur most often seen when you display an Active Record object in a view and show all of the associated records that belong_to to it. When lots of records do need to be loaded or saved to the database you can putting this heavier database work in asynchronous jobs.

However, it’s too easy to underestimate how different production data is from your local machine and therefore it‘s easy to end up with long running loops inside those jobs.

Instead of…

…looping through lots of Active Record objects and updating them, even in a job:

class UpdateManyCommentsJob < ApplicationJob
  def perform(post)
    post.comments.each do |comment|
      comment.update(name: comment.name.downcase)
    end
  end
end

Use…

…a separate job for each update:

class UpdateManyCommentsJob < ApplicationJob
  def perform(post)
    post.comments.each do |comment|
      UpdateSingleCommentJob.perform_later(comment)
    end
  end
end

class UpdateSingleCommentJob < ApplicationJob
  def perform(comment)
    comment.update(name: comment.name.downcase)
  end
end

Why?

This is about parallelizing your work as much as possible and keeping each element of the work small.

Mike Perham, the author of sidekiq—who therefore knows a thing or two about jobs—suggests that you “design your jobs so you can run lots of them in parallel”. In the proposed solution you end up with a single parent job enqueuing lots of sub-jobs rather than one long-running job making lots of database calls. This results in multiple benefits.

Providing you are running multiple workers on your background queue the total time taken to do the updates will be greatly reduced as the individual updates can be done in parallel, rather than serially in the loop.

In the case where a post has a lot of comments, you also avoid a single long-running job that is hard to retry and may cause memory issues.

Consider loops within jobs a potential source of bugs in production and be wary.

Why not?

In most cases running a small loop of updates on a has_many association in a job is also fine. Until it isn’t, in production. And this is the point.

Depending on the job you might be able to use other Rails features. You could use update_all to do all the updates for an association in one SQL query, but the update will likely have to be simple and no callbacks will run on the associated records.

I was reminded by Dave that it might be tricky to use this technique if you’re using counter caches. Or if your updates need to be transactional.

This technique is also likely to lead to a higher workload for the database as your jobs run lots of similar updates in parallel against the same table.

Brighton Ruby 2024

Still running UK’s friendliest, Ruby event on Friday 28th June. Ice cream + Ruby 


Last updated on February 13th, 2023 by @andycroll

An email newsletter, with one Ruby/Rails technique delivered with a ‘why?’ and a ‘how?’ every two weeks. It’s deliberately brief, focussed & opinionated.