image by Ray Reyes
Calculate a mean average from a Ruby array
Ruby doesn’t provide a native method to generate an average (mean) value from an array of integers. Its built-in Math module focuses on more complex calculations and there’s no built-in #average or #mean method on Array.
This leaves us free to create our own implementation, introducing the opportunity to shoot ourselves in the foot.
Ensure you use…
…Array#sum when calculating the mean average from an array of integers:
a = [1, 2, 3, 4, 5, 6, 7, 8]
a.sum(0.0) / a.size
#=> 4.5
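One way to avoid the foot-gun is to wrap the calculation in a small helper. The sketch below is an illustration, not something built into Ruby: the `mean` method name and the decision to return `nil` for an empty array are both assumptions.

```ruby
# Hypothetical helper (not part of Ruby): the mean of a numeric array as a Float.
def mean(values)
  # Without this guard an empty array would evaluate 0.0 / 0, which is NaN.
  return nil if values.empty?

  # Starting the sum at 0.0 forces floating-point division.
  values.sum(0.0) / values.size
end

a = [1, 2, 3, 4, 5, 6, 7, 8]

mean(a)         #=> 4.5
a.sum / a.size  #=> 4 (integer division drops the remainder)
mean([])        #=> nil
```

The `0.0` argument is what keeps the result from being truncated: with a plain `a.sum` both operands are integers, so Ruby performs integer division.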
Why?
Using the #sum method from Array is many, many times faster than using the alternative, inject.
The #sum method was added to Array in Ruby 2.4, which is why you might see alternative implementations in other places on the Internet.
To compare the performance of the different implementations we can use the benchmark-ips gem.
require "benchmark/ips"
# Generate a 1,000 element array
a = Array.new(1000) { |_| rand(1000) }
Benchmark.ips do |x|
  x.report("sum(0.0) / size") do
    a.sum(0.0) / a.size
  end
  x.report("inject(0.0) / size") do
    a.inject(0.0) { |result, i| result + i } / a.size
  end
  x.compare!
end
The results (on my laptop) show a 50× speed improvement when using the native #sum. This method was built into the language to provide exactly this sort of performance improvement.
Calculating -------------------------------------
sum(0.0) / size 680.425k (± 6.5%) i/s - 3.432M in 5.06s
inject(0.0) / size 13.513k (± 5.0%) i/s - 67.586k in 5.01s
Comparison:
sum(0.0) / size: 680425.2 i/s
inject(0.0) / size: 13512.7 i/s - 50.35x slower
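If installing a gem isn’t an option, a rougher comparison is possible with `Benchmark.realtime` from Ruby’s bundled `benchmark` library. This is only a sketch, not the method used for the figures above: the `seconds_for` helper and the iteration count are assumptions, and a single timed run is noisier than the warmed-up iterations-per-second measurement that benchmark-ips reports.

```ruby
require "benchmark"

# Hypothetical helper: wall-clock seconds taken to run the block `iterations` times.
def seconds_for(iterations, &block)
  Benchmark.realtime { iterations.times(&block) }
end

# Generate a 1,000 element array
a = Array.new(1000) { rand(1000) }

sum_seconds    = seconds_for(10_000) { a.sum(0.0) / a.size }
inject_seconds = seconds_for(10_000) { a.inject(0.0) { |result, i| result + i } / a.size }

puts format("sum(0.0) / size:    %.3fs", sum_seconds)
puts format("inject(0.0) / size: %.3fs", inject_seconds)
```

Absolute timings will differ by machine and Ruby version, so treat the ratio between the two numbers as the interesting part.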
Although there are variations of using #sum and then #size that perform similarly, the biggest performance win is choosing to use the native #sum method in the first place.
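As a quick sanity check, the #sum variations can be run side by side to confirm they agree. This is a minimal sketch: the `mean_variations` helper name is hypothetical, and exact agreement is only shown here for an array of small integers.

```ruby
# Hypothetical helper: the mean as computed by each #sum variation.
def mean_variations(values)
  {
    "sum(0.0) / size" => values.sum(0.0) / values.size,
    "sum.to_f / size" => values.sum.to_f / values.size,
    "sum / size.to_f" => values.sum / values.size.to_f,
    "sum.fdiv(size)"  => values.sum.fdiv(values.size)
  }
end

results = mean_variations([1, 2, 3, 4, 5, 6, 7, 8])
results.values.uniq #=> [4.5]
```

Since each variation returns the same Float for this input, the choice between them comes down to readability.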
The benchmarking code for a wider variety of implementations:
require "benchmark/ips"
# Generate a 1,000 element array
a = Array.new(1000) { |_| rand(1000) }
Benchmark.ips do |x|
  x.report("sum(0.0) / size") do
    a.sum(0.0) / a.size
  end
  x.report("sum.to_f / size") do
    a.sum.to_f / a.size
  end
  x.report("sum / size.to_f") do
    a.sum / a.size.to_f
  end
  x.report("sum.fdiv(size)") do
    a.sum.fdiv(a.size)
  end
  x.report("inject(0.0, :+) / size") do
    a.inject(0.0, :+) / a.size
  end
  x.report("inject(0.0) / size") do
    a.inject(0.0) { |result, i| result + i } / a.size
  end
  x.report("inject(0).to_f / size") do
    a.inject(0) { |result, i| result + i }.to_f / a.size
  end
  x.report("inject(0) / size.to_f") do
    a.inject(0) { |result, i| result + i } / a.size.to_f
  end
  x.report("inject(0).fdiv(size)") do
    a.inject(0) { |result, i| result + i }.fdiv(a.size)
  end
  x.compare!
end
…and the comparison results:
Comparison:
sum(0.0) / size: 668222.4 i/s
sum / size.to_f: 660291.4 i/s - same-ish
sum.to_f / size: 655929.1 i/s - same-ish
sum.fdiv(size): 621960.0 i/s - same-ish
inject(0.0, :+) / size: 30823.6 i/s - 21.68x slower
inject(0) / size.to_f: 18740.7 i/s - 35.66x slower
inject(0).to_f / size: 18320.1 i/s - 36.47x slower
inject(0).fdiv(size): 18082.6 i/s - 36.95x slower
inject(0.0) / size: 15264.0 i/s - 43.78x slower
Anything else?
Rails’ Active Record does include an #average method. It’s used to perform calculations on numerical columns in the database directly in SQL.
If that is your specific use case, you should definitely use that method. Instantiating Active Record models and subsequently iterating over the Array in Ruby will nearly always be slower.
Last updated on February 2nd, 2020