(Header photo: a race track, by Nicolas Hoizey)

Further Performance Testing Enumerable’s Loveliness

OK, I’ll stop after this one, though I’ve said that before. There’s been plenty of fun nerd-sniping on this problem.

I was pointed at Enumerable#partition (by Brandon, Michael, Piotr & Kasper), which avoids two of the four loops in the previously “best” solution.
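For anyone unfamiliar with it, partition splits an enumerable into two arrays in a single pass: the elements for which the block is truthy, then everything else. A minimal sketch:

```ruby
# partition walks the array once and returns two arrays:
# [elements where the block is true, elements where it is false]
odds, evens = [1, 2, 3, 4, 5].partition(&:odd?)
odds  # => [1, 3, 5]
evens # => [2, 4]
```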

I was also nudged by Dave to benchmark my initial “loops” solution, because straightforward loops are often extremely well optimised at the language level.

So… here are all the benchmarks for that.

require 'benchmark/ips'

n = 100
a = Array.new(n) { rand(1_000_000) }
b = Array.new(n) { rand(1_000_000) }

Benchmark.ips do |x|
  # Pre-filter by odd/even, then compute products
  x.report('odd & even') do
    (a.select(&:odd?).product(b.select(&:even?)) +
      b.select(&:odd?).product(a.select(&:even?)))
    .uniq
  end
  # Alternatives: compute all products, then filter by sum
  x.report('full product, +') do
    a.product(b).select { (_1 + _2).odd? }.uniq
  end
  x.report('full product, sum') do
    a.product(b).select { _1.sum.odd? }.uniq
  end
  x.report('loops') do
    results = []
    a.each do |x|
      b.each do |y|
        if ((x + y) % 2) == 1
          results << [x, y]
        end
      end
    end
    results.uniq
  end
  x.report('partition') do
    odd_as, even_as = a.partition(&:odd?)
    odd_bs, even_bs = b.partition(&:odd?)
    (odd_as.product(even_bs) + odd_bs.product(even_as)).uniq
  end

  x.compare!
end
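Before comparing speeds, it’s worth confirming the implementations actually agree on the answer. This is a quick sanity check of my own (not part of the original benchmark), using small throwaway inputs:

```ruby
a = [1, 2, 3]
b = [4, 5, 6]

# "odd & even" approach: pre-filter each array, then take products
select_pairs = (a.select(&:odd?).product(b.select(&:even?)) +
                b.select(&:odd?).product(a.select(&:even?))).uniq

# "partition" approach: split each array once, then take products
odd_as, even_as = a.partition(&:odd?)
odd_bs, even_bs = b.partition(&:odd?)
partition_pairs = (odd_as.product(even_bs) + odd_bs.product(even_as)).uniq

# Both produce the same set of odd-sum pairs
select_pairs.sort == partition_pairs.sort # => true
```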

Here are the results:

Calculating -------------------------------------
          odd & even      1.347k (± 2.2%) i/s  (742.59 μs/i) -      6.850k in   5.089081s
     full product, +    785.253  (± 1.5%) i/s    (1.27 ms/i) -      3.952k in   5.034013s
   full product, sum    719.042  (± 2.1%) i/s    (1.39 ms/i) -      3.650k in   5.078324s
               loops    893.864  (± 1.3%) i/s    (1.12 ms/i) -      4.488k in   5.021755s
           partition      1.370k (± 1.8%) i/s  (729.79 μs/i) -      6.900k in   5.037264s

Comparison:
           partition:     1370.3 i/s
          odd & even:     1346.6 i/s - same-ish: difference falls within error
               loops:      893.9 i/s - 1.53x  slower
     full product, +:      785.3 i/s - 1.74x  slower
   full product, sum:      719.0 i/s - 1.91x  slower

Further improvements were suggested.

If we use a more apt method for combining the arrays, such as | or union, we may see a performance improvement, since we avoid the additional uniq pass over the final array of results.
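The reason this could help: Array#| (and Array#union) already removes duplicates while combining, so a separate uniq pass becomes redundant. A quick illustration:

```ruby
# Array#| concatenates and de-duplicates in one step,
# preserving the order of first appearance
[1, 2, 2] | [2, 3]          # => [1, 2, 3]

# It compares whole elements, so duplicate pairs collapse too
[[1, 2]] | [[1, 2], [3, 4]] # => [[1, 2], [3, 4]]
```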

require 'benchmark/ips'

n = 100
a = Array.new(n) { rand(1_000_000) }
b = Array.new(n) { rand(1_000_000) }

Benchmark.ips do |x|
  x.report('odd & even') do
    (a.select(&:odd?).product(b.select(&:even?)) +
      b.select(&:odd?).product(a.select(&:even?)))
    .uniq
  end
  x.report('odd & even union') do
    a.select(&:odd?).product(b.select(&:even?)).union(
        b.select(&:odd?).product(a.select(&:even?)))
  end
  x.report('odd & even |') do
    a.select(&:odd?).product(b.select(&:even?)) |
      b.select(&:odd?).product(a.select(&:even?))
  end
  x.report('loops') do
    results = []
    a.each do |x|
      b.each do |y|
        if ((x + y) % 2) == 1
          results << [x, y]
        end
      end
    end
    results.uniq
  end
  x.report('loops |') do
    results = []
    a.each do |x|
      b.each do |y|
        if ((x + y) % 2) == 1
          results | [x, y]
        end
      end
    end
    results
  end
  x.report('partition') do
    odd_as, even_as = a.partition(&:odd?)
    odd_bs, even_bs = b.partition(&:odd?)
    (odd_as.product(even_bs) + odd_bs.product(even_as)).uniq
  end
  x.report('partition union') do
    odd_as, even_as = a.partition(&:odd?)
    odd_bs, even_bs = b.partition(&:odd?)
    odd_as.product(even_bs).union(odd_bs.product(even_as))
  end
  x.report('partition |') do
    odd_as, even_as = a.partition(&:odd?)
    odd_bs, even_bs = b.partition(&:odd?)
    odd_as.product(even_bs) | odd_bs.product(even_as)
  end

  x.compare!
end

The results are pretty much a wash for the partition and odd & even cases.

Calculating -------------------------------------
          odd & even      1.375k (± 0.4%) i/s  (727.08 μs/i) -      6.987k in   5.080142s
    odd & even union      1.334k (± 0.4%) i/s  (749.77 μs/i) -      6.700k in   5.023541s
        odd & even |      1.339k (± 0.4%) i/s  (746.81 μs/i) -      6.700k in   5.003743s
               loops    896.130 (± 0.9%) i/s    (1.12 ms/i) -      4.500k in   5.022004s
             loops |      1.416k (± 1.1%) i/s  (706.43 μs/i) -      7.100k in   5.016274s
           partition      1.378k (± 0.8%) i/s  (725.47 μs/i) -      6.900k in   5.006093s
     partition union      1.349k (± 0.6%) i/s  (741.43 μs/i) -      6.834k in   5.067094s
         partition |      1.346k (± 0.4%) i/s  (742.74 μs/i) -      6.834k in   5.075995s

Comparison:
             loops |:     1415.6 i/s
           partition:     1378.4 i/s - 1.03x  slower
          odd & even:     1375.4 i/s - 1.03x  slower
     partition union:     1348.7 i/s - 1.05x  slower
         partition |:     1346.4 i/s - 1.05x  slower
        odd & even |:     1339.0 i/s - 1.06x  slower
    odd & even union:     1333.7 i/s - 1.06x  slower
               loops:      896.1 i/s - 1.58x  slower

However, the “straightforward” loop solution still performs significantly worse than everything else, yet when the | operator is used in the depths of the loop, it leaps to being marginally faster than all the other implementations.

One last thing. Ruby has a lazy enumeration feature that can be used to avoid the overhead of creating intermediate arrays on large datasets, but only certain Enumerable methods are reimplemented to work lazily.
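As a quick illustration (not part of the benchmark), lazy defers all computation until elements are actually requested, which is what makes it useful for large or even infinite sequences:

```ruby
# Without lazy, mapping over an infinite range would never finish;
# with lazy, only the first three elements are ever computed
(1..Float::INFINITY).lazy.map { |i| i * 2 }.first(3) # => [2, 4, 6]
```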

One of those rewritten methods is uniq. Does making the uniqueness loop a “lazy” enumerator affect performance? We can also use lazy with each in the “looping” example.

How do these new additions to the benchmark fare?

require 'benchmark/ips'

n = 100
a = Array.new(n) { |i| rand(1_000_000) }
b = Array.new(n) { |i| rand(1_000_000) }

Benchmark.ips do |x|
  x.report('odd & even') do
    (a.select(&:odd?).product(b.select(&:even?)) +
      b.select(&:odd?).product(a.select(&:even?)))
    .uniq
  end
  x.report('odd & even lazy') do
    (a.select(&:odd?).product(b.select(&:even?)) +
      b.select(&:odd?).product(a.select(&:even?)))
    .lazy.uniq
  end
  x.report('loops') do
    results = []
    a.each do |x|
      b.each do |y|
        if ((x + y) % 2) == 1
          results << [x, y]
        end
      end
    end
    results.uniq
  end
  x.report('loops |') do
    results = []
    a.each do |x|
      b.each do |y|
        if ((x + y) % 2) == 1
          results | [x, y]
        end
      end
    end
    results
  end
  x.report('loops | lazy') do
    results = []
    a.lazy.each do |x|
      b.lazy.each do |y|
        if ((x + y) % 2) == 1
          results | [x, y]
        end
      end
    end
    results
  end
  x.report('partition') do
    odd_as, even_as = a.partition(&:odd?)
    odd_bs, even_bs = b.partition(&:odd?)
    (odd_as.product(even_bs) + odd_bs.product(even_as)).uniq
  end
  x.report('partition lazy') do
    odd_as, even_as = a.partition(&:odd?)
    odd_bs, even_bs = b.partition(&:odd?)
    (odd_as.product(even_bs) + odd_bs.product(even_as)).lazy.uniq
  end
  x.report('partition |') do
    odd_as, even_as = a.partition(&:odd?)
    odd_bs, even_bs = b.partition(&:odd?)
    odd_as.product(even_bs) | odd_bs.product(even_as)
  end

  x.compare!
end

According to this test, the lazy version of uniq looks a lot faster than both the non-lazy versions and the improved “loops with |” example.

Calculating -------------------------------------
          odd & even      1.351k (± 2.5%) i/s  (740.37 μs/i) -      6.834k in   5.063314s
     odd & even lazy      6.145k (± 1.1%) i/s  (162.74 μs/i) -     31.263k in   5.088472s
               loops    917.432 (± 0.7%) i/s    (1.09 ms/i) -      4.641k in   5.058882s
             loops |      1.481k (± 2.7%) i/s  (675.29 μs/i) -      7.446k in   5.032326s
        loops | lazy      1.466k (± 0.3%) i/s  (682.27 μs/i) -      7.436k in   5.073396s
           partition      1.373k (± 0.5%) i/s  (728.15 μs/i) -      6.936k in   5.050569s
      partition lazy      5.224k (± 1.0%) i/s  (191.41 μs/i) -     26.180k in   5.011751s
         partition |      1.339k (± 0.7%) i/s  (746.74 μs/i) -      6.732k in   5.027289s

Comparison:
     odd & even lazy:     6144.7 i/s
      partition lazy:     5224.3 i/s - 1.18x  slower
             loops |:     1480.9 i/s - 4.15x  slower
        loops | lazy:     1465.7 i/s - 4.19x  slower
           partition:     1373.3 i/s - 4.47x  slower
          odd & even:     1350.7 i/s - 4.55x  slower
         partition |:     1339.2 i/s - 4.59x  slower
               loops:      917.4 i/s - 6.70x  slower

Is there some magic behind the scenes that makes the lazy version of uniq faster than the non-lazy version? Sadly, no. The lack of a performance difference between the lazy and non-lazy “loop” versions should be the clue here.

uniq processes the entire array immediately and returns a new array with duplicates removed. It has to examine every element to determine uniqueness.

lazy.uniq returns an Enumerator::Lazy that doesn’t do any work until you start consuming elements from it! For this to be a fair comparison, the code would have to be:

  # e.g.
  x.report('partition lazy') do
    odd_as, even_as = a.partition(&:odd?)
    odd_bs, even_bs = b.partition(&:odd?)
    (odd_as.product(even_bs) + odd_bs.product(even_as)).lazy.uniq.to_a
  end
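A throwaway sketch (not from the benchmark) makes the distinction visible: calling lazy.uniq only builds an enumerator, and nothing is examined until the chain is forced with to_a:

```ruby
# lazy.uniq constructs an Enumerator::Lazy; no elements are
# inspected for uniqueness until the chain is forced
deduped = [1, 1, 2, 2, 3].lazy.uniq
deduped.class # => Enumerator::Lazy

# to_a forces the chain, doing the actual uniqueness work
deduped.to_a  # => [1, 2, 3]
```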

The results bear this out:

Calculating -------------------------------------
             loops |      1.436k (± 2.6%) i/s  (696.42 μs/i) -      7.191k in   5.011518s
        loops | lazy      1.424k (± 1.4%) i/s  (702.40 μs/i) -      7.228k in   5.078029s
           partition      1.359k (± 1.7%) i/s  (735.93 μs/i) -      6.885k in   5.068374s
      partition lazy    814.193  (± 2.9%) i/s    (1.23 ms/i) -      4.116k in   5.060084s

Comparison:
             loops |:     1435.9 i/s
        loops | lazy:     1423.7 i/s - same-ish: difference falls within error
           partition:     1358.8 i/s - 1.06x  slower
      partition lazy:      814.2 i/s - 1.76x  slower

What have we learned from this wild goose chase of nerd-sniping?

  1. It’s ok to optimise for readability.
  2. It’s important to accurately benchmark your code when you’re trying to optimise.
  3. Ruby is delightful.
  4. I sometimes can’t let go.

Last updated on May 29th, 2025