
Garbage Collection is Why Ruby on Rails is Slow: Patches to Improve Performance 5x; Memory Profiling

  • The News: Ruby on Rails performance is dominated by garbage collection. We present a set of patches to greatly improve Rails performance and show how to profile memory usage to get further performance gains.

  • What's at Stake: Rails is slow for many uses and, until now, did not lend itself well to optimization. Significant performance gains could only be achieved at the application level, at large development cost.

  • The Upside:

    • 5x potential performance gains;
    • easy way to identify whether GC is a bottleneck;
    • deterministic process to fix memory bottlenecks;
    • set of canned patches to solve the biggest problems;
    • you can help

I don't have time, make my app 5 times faster!

  • Apply the monkey patches below (one way to load them is sketched right after this list). Enjoy.
  • Time your app with garbage collection disabled to figure out how much more performance you are leaving on the table. Enjoy.
  • Come back for more patches. Enjoy.
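
For the first item: a minimal sketch of one way to load monkey patches like the ones below, assuming you save each patch as a file under lib/ (the file names here are hypothetical):

# Bottom of config/environment.rb (Rails 1.2, after the Rails::Initializer.run block)
# or a config/initializers/*.rb file (Rails 2.0). Each required file simply
# reopens the class or module exactly as shown in the patches further down this post.
require 'gc_patches/attributes_with_quotes'
require 'gc_patches/benchmark_realtime'
require 'gc_patches/bigdecimal_comparison'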

What's going on here?

The Ruby runtime uses garbage collection. Your code allocates memory as it runs. From time to time the garbage collector kicks in, halts the whole system, and cleans up no-longer-referenced memory using a mark-and-sweep algorithm.

GC gets triggered by multiple conditions. The one that matters most is the 8M trigger: every time you allocate 8M of memory, GC runs. A complex Rails request can allocate hundreds or even thousands of megabytes of memory, making GC run dozens of times. Each GC pass takes 50-150ms. You do the math.
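
To make "you do the math" concrete, here is a back-of-the-envelope estimate in Ruby. The allocation figure and per-pass cost below are illustrative assumptions, not measurements:

# Rough estimate of GC overhead for a single request (illustrative numbers).
allocated_mb  = 800   # memory a complex Rails request allocates, in MB (assumed)
gc_trigger_mb = 8     # GC runs roughly every 8M of allocations
gc_pass_sec   = 0.1   # each mark-and-sweep pass takes 50-150ms; assume 100ms

gc_runs     = allocated_mb / gc_trigger_mb   # => 100 GC passes for this request
gc_overhead = gc_runs * gc_pass_sec          # => 10.0 seconds spent in GC alone
puts "~#{gc_runs} GC runs, ~#{gc_overhead} seconds spent collecting garbage"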

It's possible to patch the ruby interpreter, increase the triggers and reduce the frequency of garbage collection at the cost of additional memory use. This helps, but does not resolve the problem: if there is more garbage, each GC run takes longer. We'll devote a separate post to this later.

But I am running ruby-prof already, you ignorant fool!?

And so are we. It does not really help:

  • ruby-prof attributes full cost of garbage collection to the method where GC gets triggered, not the methods that allocate the memory. That's like blaming the last straw for breaking the camel's back.
  • ruby-prof does not show whether garbage collection took place or how much time it took. All you can tell is that some method took a long time.

If you go by these results and remove the straw that's breaking the camel's back, the camel's load does not get much lighter.

With statistics on your side, things are not entirely hopeless. GC gets triggered within malloc, so methods doing a lot of allocations are also likely to trigger GC. Further, simply allocating memory is relatively expensive, and that will also show up in the profiler output.

You might get lucky, optimize a ruby-prof hotspot, and get better performance, but more often than not you will fail. The correlation is just too weak. We believe this is exactly why Rails has resisted performance optimization. Until now.

Profiling memory allocations

Ruby-prof is almost the right tool for memory profiling. "Almost" is because the free cheese is only found in the mousetrap. You'll have to do some work first.

You need to build a patched ruby (patch here) and the latest ruby-prof (patch here). With both tools patched, set RubyProf.measure_mode = RubyProf::MEMORY before running the profiler.

Make sure to turn off GC when profiling for pure performance. You'll get much more representative results.
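
Once both patches are in place, a profiling run looks roughly like this; the profiled block is a placeholder for whatever code path you want to measure:

require 'ruby-prof'

# Memory mode needs the patched ruby and patched ruby-prof mentioned above.
RubyProf.measure_mode = RubyProf::MEMORY

result = RubyProf.profile do
    10_000.times { "garbage" * 10 }   # placeholder: put the code path to measure here
end

printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT)   # per-method figures are now allocations instead of time

# When profiling for time instead (e.g. RubyProf::PROCESS_TIME), wrap the run
# in GC.disable / GC.enable so GC pauses don't get charged to random methods.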

The detailed HOWTO and description of the process is in our next blog post.

Will it help me?

In general your code will benefit most if you:

  • Create or update a lot of rows using ActiveRecord
  • Run a large number of SQL statements of any kind using ActiveRecord, e.g. load many AR objects through separate #find calls
  • Use ActionView helpers, especially in a loop, to generate complex HTML
  • Use has_many AR associations

It's easy to tell for sure if you have performance tests. All that "performance test" means is that you can run a single request and time how long it takes.

Just run the same test as usual, then with GC disabled, and compare the results. Most code will run much faster with GC disabled. E.g. for Acunote we test how long it takes to copy 120 tasks from one sprint to another.

Before optimizations with GC:   14 sec
Before optimizations w/out GC:   6 sec
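
A minimal version of such a comparison; copy_tasks here is a hypothetical stand-in for the operation under test:

require 'benchmark'

# Hypothetical stand-in for the operation under test (for Acunote: copying
# 120 tasks from one sprint to another). Replace with your real code path.
def copy_tasks
    100_000.times { "allocate some garbage " * 10 }
end

with_gc = Benchmark.realtime { copy_tasks }

GC.disable
without_gc = Benchmark.realtime { copy_tasks }
GC.enable

puts "with GC:          #{'%.2f' % with_gc}s"
puts "without GC:       #{'%.2f' % without_gc}s"
puts "time spent in GC: #{'%.2f' % (with_gc - without_gc)}s"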

You can't run ruby without GC in production; it will rapidly run out of memory and crash. You can, however, apply optimizations, produce less garbage, and get your code to run even faster than the original benchmark with GC turned off!

After optimizations with GC:     3 sec

Code that allocates little memory but is slow because it is algorithmically complex will see little improvement. For example, Acunote's burndown and sprint completion date calculations fall squarely into this category and did not benefit. These aside, Acunote's performance was dominated by the cost of memory allocation.

Give me the patches!

Problem:
ActiveRecord::Base#create calls attributes_with_quotes twice and ActiveRecord::Base#update calls it once. #attributes_with_quotes calls #attributes, which clones all attribute values. None of these clones are necessary.

Impact:
When copying 120 tasks in Acunote this costs 650M. Patch improves performance from 14s to 6s.
Notes:
Helps when you do a lot of creates and updates through ActiveRecord.
Patch (Rails 1.2):
module ActiveRecord
    class Base
    private
        def attributes_with_quotes(include_primary_key = true)
            result = {}
            @attributes.each_key do |name|
                if column = column_for_attribute(name)
                    result[name] = quote_value(read_attribute(name), column) unless !include_primary_key && column.primary
                end
            end
            result
        end
    end
end
Patch (Rails 2.0):
module ActiveRecord
    class Base
    private
        def attributes_with_quotes(include_primary_key = true, include_readonly_attributes = true)
            quoted = {}
            @attributes.each_pair do |name, value|
                if column = column_for_attribute(name)
                    quoted[name] = quote_value(read_attribute(name), column) unless !include_primary_key && column.primary
                end
            end
            include_readonly_attributes ? quoted : remove_readonly_attributes(quoted)
        end
    end
end
Problem:
Rails surrounds every SQL call and ActionView::Base#render with Benchmark#realtime. Benchmark#realtime allocates an unnecessary 45k per call.

Impact:
When copying 120 tasks in Acunote this costs 120M. Patch improves performance from 6s to 5s.
Notes:
Helps when you read, create, or update lots of AR records, execute any other SQL, or call #render(:partial) in a loop.
Patch:
module Benchmark
    def realtime
        r0 = Time.now
        yield
        r1 = Time.now
        r1.to_f - r0.to_f
    end
    module_function :realtime
end
Problem:
Rails 1.2 tries to pretty-print and log every SQL statement even when the logger is at Logger::INFO level (the production environment) and prints nothing.

Impact:
When copying 120 tasks in Acunote this costs 17M. Patch improves performance by 1s.
Notes:
Helps when you read, create, or update lots of AR records, or execute any other SQL in a loop.
The patch is only applicable to Rails 1.2; Rails 2.0 has the correct log_info implementation.
Patch (Rails 1.2):
module ActiveRecord
    module ConnectionAdapters
        class AbstractAdapter
        protected
            def log_info_with_level_check(sql, name, runtime)
                return unless @logger and @logger.level == Logger::DEBUG
                log_info_without_level_check(sql, name, runtime)
            end
            alias_method_chain :log_info, :level_check
        end
    end
end
Patch (Rails 2.0):
No patch necessary.
Problem:
Comparing BigDecimals (used by Rails for numeric data types) to booleans is slow because it requires an unnecessary method_missing call and exception catch.

Impact:
When rendering 120 tasks in Acunote this costs 4M. Patch improves performance by 100-120ms.
Notes:
Helps when you have the imprudence to accidentally compare BigDecimals with true or false.
Patch:
class BigDecimal
    alias_method :eq_without_boolean_comparison, :==
    def eq_with_boolean_comparison(other)
        return false if [FalseClass, TrueClass].include? other.class
        eq_without_boolean_comparison(other)
    end
    alias_method :==, :eq_with_boolean_comparison
end
Problem:
ActionView::Helpers::JavaScriptHelper#link_to_function allocates lots of memory when run. It runs every time the view is rendered.

Impact:
When rendering 120 tasks in Acunote this costs 5M. This improves performance by 100-110ms.
Notes:
Helps when you have a rich AJAX page with lots of dynamic elements. It's worth eliminating helpers in hotspots identified by the memory profiler. Cumulatively this has a huge impact on render-bound requests.
Patch:
Hand-replace link_to_function with <a href="#" onclick="..."/>.
Problem:
ActionView::Helpers::FormTagHelper#text_field_tag allocates lots of memory when run. It runs every time the view is rendered.

Impact:
When rendering 120 tasks in Acunote this costs 2M. This improves performance by 40-50ms.
Notes:
Helps when you have a page with lots of input boxes. It's worth eliminating helpers in hotspots identified by the memory profiler. Cumulatively this has a huge impact on render-bound requests.
Patch:
Hand-replace text_field_tag with <input type="text"/>.
Problem:
ActionView::Helpers::AssetTagHelper#image_tag allocates lots of memory when run. It runs every time the view is rendered.

Impact:
When rendering 120 tasks in Acunote this costs 8M. This improves performance by 110-150ms.
Notes:
Helps when you have a page with lots of images. It's worth eliminating helpers in hotspots identified by the memory profiler. Cumulatively this has a huge impact on render-bound requests. image_tag itself can be optimized; help wanted.
Patch:
If you don't use asset hosts, hand-replace image_tag with <img src="..."/>.
If you use asset hosts, there's no patch.
Problem:
Passing a block to a method on an instance of ActiveRecord::Associations::{HasAndBelongsToManyAssociation, etc.} chews up memory.
An example (from activerecord/lib/activerecord/associations.rb) is the association.select { |record| record.new_record? } line inside the #add_multiple_associated_save_callbacks method.

Impact:
This is used all over ActiveRecord code, and often by application code.
Notes:
We have been unable to figure out the root cause. Help wanted. Separate article to come.
Patch:
None known. Sometimes it helps to convert the association to an array before using it, for example association.to_a.select { |record| record.new_record? }.

How can I help?

  • Profile memory allocations for your Rails application. Figure out and fix hotspots. Post or link to patches and notes in the comments section. We'll be updating this post, as you and we figure out more.
  • If you are writing a large commercial Rails application, have one of your engineers do some memory profiling. Fast is a feature, and it's a nice way to contribute to the community.
  • Help us figure out some of the issues we found but haven't been able to crack yet.

Submitted Patches

Thanks To

  • Alexander Dymo for actually doing all the work
  • Alexander Goldstein for directing us to look at GC in the first place
  • DHH and rails-core for giving us something to optimize, and for making Rails code so easy to understand and modify
  • Charlie Savage and Shugo Maeda for ruby-prof
  • Sylvain Joyeux for the original memory allocation measurement patch
  • Stefan Kaes for all his performance work, railsbench, and especially the GC patch

Sightings

Updates

  • Update 1: noted that patches are against Rails 1.2.4; requested patches against 2.0
  • Update 2: added patches against Rails 2.0
  • Update 3: added "submitted patches" section with upstream patches
  • Update 4: #realtime and #attributes_with_quotes patches accepted upstream
  • Update 5: added Sightings section
Read comments on this post on Hacker News