Know what your scopes are doing: where.first vs find_by

There are numerous benefits to using an Object-Relational Mapper (ORM), such as Active Record, to generate your SQL. You gain from clear repeatable abstractions, saving time and improving readability.

Unfortunately the very abstractions that make an ORM useful can also have unintended consequences.

It is very easy to introduce performance problems because the actual SQL that runs on your database server is generated by the framework, sometimes this means the SQL isn’t as efficient as it could be.

We recently came across an issue at CoverageBook that highlighted this very point.

Instead of…

…using a where condition then first.

User.where(email: "andy@goodscary.com").first

Use…

find_by.

User.find_by(email: "andy@goodscary.com")
User.find_by_email("andy@goodscary.com")

Why?

This is one of those cases where the ORM (and the tooling around it) get in the way and introduce unforseen performance issues.

The .where scope has an implicit ORDER scope on the primary key that isn’t obvious at first glance.

User.where(email: "andy@goodscary.com")
# SELECT "users".*
# FROM "users"
# WHERE "users"."email" = "andy@goodscary.com"

User.where(email: "andy@goodscary.com").first
# SELECT "users".*
# FROM "users"
# WHERE "users"."email" = "andy@goodscary.com"
# ORDER BY "users"."id" ASC
# LIMIT 1

User.find_by(email: "andy@goodscary.com")
# SELECT "users".*
# FROM "users"
# WHERE "users"."email" = "andy@goodscary.com"
# LIMIT 1

Straightforward indexes on our database didn’t help us as in our—more complex—case. We were querying using an index, but because we were using .where().first we were inadvertently doing a non-indexed scan to establish the order, which caused enormous performance problems.

Additionally, we were writing many thousands of rows per second and, even with a monstrously powerful database, we were seeing issues because the entire table was being sorted to then pick only one record.

Debugging this issue was tricky because it is not possible to call .to_sql on the results of .find_by or .where().first as the query executes and you have to use logging to work out the exact SQL that is being generated.

Knowing the exact SQL Active Record is generating from methods that might seem the same on the surface can be very important.

Why not?

In small tables, under light load, the performance impact of using where().first would be negligible.

Last updated on March 14th, 2022