image by Priscilla Du Preez
Try not to loop through associations inside jobs
One of the downsides of using Ruby to interact with your database, rather than SQL directly, is that it’s easy to accidentally give yourself performance headaches via running multiple very similar queries. Although the advantages of using Active Record (or something similar) are generally worth it!
This is typically referred to as an N+1 problem. They occur most often seen when you display an Active Record object in a view and show all of the associated records that
belong_to to it. When lots of records do need to be loaded or saved to the database you can putting this heavier database work in asynchronous jobs.
However, it’s too easy to underestimate how different production data is from your local machine and therefore it‘s easy to end up with long running loops inside those jobs.
…looping through lots of Active Record objects and updating them, even in a job:
class UpdateManyCommentsJob < ApplicationJob def perform(post) post.comments.each do |comment| comment.update(name: comment.name.downcase) end end end
…a separate job for each update:
class UpdateManyCommentsJob < ApplicationJob def perform(post) post.comments.each do |comment| UpdateSingleCommentJob.perform_later(comment) end end end class UpdateSingleCommentJob < ApplicationJob def perform(comment) comment.update(name: comment.name.downcase) end end
This is about parallelizing your work as much as possible and keeping each element of the work small.
Mike Perham, the author of sidekiq—who therefore knows a thing or two about jobs—suggests that you “design your jobs so you can run lots of them in parallel”. In the proposed solution you end up with a single parent job enqueuing lots of sub-jobs rather than one long-running job making lots of database calls. This results in multiple benefits.
Providing you are running multiple workers on your background queue the total time taken to do the updates will be greatly reduced as the individual updates can be done in parallel, rather than serially in the loop.
In the case where a post has a lot of comments, you also avoid a single long-running job that is hard to retry and may cause memory issues.
Consider loops within jobs a potential source of bugs in production and be wary.
In most cases running a small loop of updates on a
has_many association in a job is also fine. Until it isn’t, in production. And this is the point.
Depending on the job you might be able to use other Rails features. You could use
update_all to do all the updates for an association in one SQL query, but the update will likely have to be simple and no callbacks will run on the associated records.
I was reminded by Dave that it might be tricky to use this technique if you’re using counter caches. Or if your updates need to be transactional.
This technique is also likely to lead to a higher workload for the database as your jobs run lots of similar updates in parallel against the same table.
Last updated on February 13th, 2023 by @andycroll
An email newsletter, with one Ruby/Rails technique delivered with a ‘why?’ and a ‘how?’ every two weeks. It’s deliberately brief, focussed & opinionated.