If any of you are using it or give it a run and have any issues/questions - feel free to ping me anytime at pglombardo at gameface.in or via a support ticket through AppNeta. I’d love to get as much feedback as possible. They have a free-forever project-level plan that always shows the last 24 hours of performance data.
Rails 4 is out featuring Russian Doll caching (AKA Cache Digests). In this article, I apply Russian Doll caching to one of my poorer performing Rails 3 pages using the cache_digests gem.
ActionView templates are great. They are easy to code, manage and extend but the one thing they are not is fast…at least not out of the box.
ActionView is Slow; Pitfalls Ahead
ActionView puts forth a great development pattern of layouts, views and partials that is easy to understand, implement and maintain, but that comes at a cost: the rendering process is complex and slow.
The screenshot above shows the timings for the `Users#show` URL on Gameface. The page in question is fairly straightforward, containing four components: a topbar, sidebar, user details and a listing of game characters.
With no caching at all, the ActionView layer averages roughly ~365ms for this URL. This represents over 80% of average request processing time and dwarfs all of the other layers combined.
In terms of performance, ActionView is a weapon of mass destruction and is the low-hanging fruit for improvement.
Russian Doll Caching
Russian Doll caching is a type of nested fragment caching that auto expires fragments when object timestamps change.
You still make the same calls as previous fragment caching schemes in Rails:
```
- cache @user do
  (user view data)
  - cache [ 'details', @user ] do
    (user details view data)
  - cache [ 'characters', @user ] do
    - @user.characters.each do |character|
      - cache character do
        (character view data)
```
With Russian Doll caching (AKA cache digests) unique cache keys are formed using an md5 stamp based on the timestamp of the object being cached:
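As an illustrative sketch (not Rails' actual implementation; the real digest also covers the template source and its render dependencies), a key of this shape combines the record's id and `updated_at` timestamp with an MD5 digest of the template:

```ruby
require 'digest/md5'

# Hypothetical stand-in for an ActiveRecord model (illustrative only).
User = Struct.new(:id, :updated_at)

# Sketch: build a digest-style fragment cache key from the record's id,
# its updated_at timestamp, and an MD5 digest of the template source.
def fragment_cache_key(record, template_source)
  record_key = "users/#{record.id}-#{record.updated_at.strftime('%Y%m%d%H%M%S')}"
  "views/#{record_key}/#{Digest::MD5.hexdigest(template_source)}"
end

user = User.new(42, Time.utc(2013, 5, 14, 12, 0, 0))
puts fragment_cache_key(user, "= cache @user do ...")
# => "views/users/42-20130514120000/<32-char md5>"
```

Because the timestamp is part of the key, updating the record produces a brand-new key and the old fragment simply expires out of the cache store.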
The advantage of this is that when an object is updated and its outer fragment is invalidated, unchanged nested fragments can still be re-used (hence, Russian dolls).
A key requirement is that child objects update the timestamps of their parent objects via the ActiveRecord `touch` option. This allows for automatic cache invalidation and avoids serving stale content.
```ruby
class Character < ActiveRecord::Base
  belongs_to :user, touch: true
end
```
Cache Friendly Page Layouts
When caching, it's best to avoid caching logged-in-specific content, since the same cached fragments get served to all users regardless of login status. For effective (and less problematic) fragment caching, designing the page to separate out logged-in content is critical.
Below is the layout for the Gameface profile view before re-design. Logged in specific content and links are sprinkled throughout the page making it hard to divide it up into cache-able fragments.
Properly caching this page as-is would be complicated and inefficient since we would have to tip-toe around logged-in content.
To fix this, I re-organized the page to group the logged-in specific content into one specific area. With logged-in content out of the way, we are then free to cache the rest of the page in well-defined fragments.
With the page re-design and Russian Doll fragment caching applied to the large majority of the page, we now average a much better ~120ms for ActionView on that URL: a reduction of ~245ms, or 67%, in average processing time.
On top of the performance improvement, we also get automatic cache invalidation as object timestamps are updated. This greatly simplifies the whole system by not requiring cache sweepers or other tricks to invalidate stale caches.
Russian Doll caching is a small but significant improvement over prior caching in Rails. When used effectively, it can greatly reduce server side ActionView processing and automatically expire stale fragments.
We took a previously un-cached URL with a poor page layout that was averaging ~365ms of processing in ActionView and reduced that number to ~120ms for a 67% performance improvement.
When using fragment caching, note which backing cache store you are using. The default cache store in Rails is the filesystem but you can get even greater performance by backing your cache store with memcache or redis instead.
See the redis-rails gem as an example.
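For instance, with the redis-rails gem the switch is a one-line change in your environment config (the host, port and database shown here are assumptions; adjust for your setup):

```ruby
# config/environments/production.rb
# Back the Rails cache (and thus fragment caches) with Redis via redis-rails.
config.cache_store = :redis_store, "redis://localhost:6379/0/cache"
```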
Coming soon, I’ll revisit this page under Rails 4 to further explore performance improvements over Rails 3.
Separate logged-in content from login-agnostic content for best cache coverage. Page design affects caching efficiency.
When possible, always call cache with the object being cached. Cache keys and their eventual invalidation will be keyed off of the timestamp on the object being cached.
```
- cache @user do
  (user view data)
- cache [ 'details', @user ] do
  (user details view data)
```
To cache an index of objects, you have to revert to manually passing the `expires_in` option, since there is no single object timestamp to key off of.
```
- cache 'user list', expires_in: 30.minutes do
  (user list view data)
```
Set `touch: true` on `belongs_to` model relationships so that parent fragment caches can be invalidated when children are updated.
Collect timing data before and after to quantify and validate changes.
Update May 14, 2013: AppNeta now offers free tracing for single projects. Check it out on their pricing page.
TraceView by AppNeta provides deep performance monitoring of web applications.
It gives you insight into your web application performance such as this:
and a per-request drill-down that shows you the nitty-gritty detail of where time is spent in individual requests:
and even end-user monitoring:
Disclaimer: I authored the Ruby instrumentation for TraceView, so I may be a bit biased…but with good reason!
Installing TraceView consists of two parts: 1) installing the system daemon on your host and 2) installing the Ruby gem in your application.
Why a system daemon? TraceView uses a system daemon to collect instrumentation from sources beyond application code such as host metrics, Apache or Resque.
The system daemon is installed with two commands that can be pasted into your shell. An account-specific version of these commands is available in your TraceView dashboard once you create an account (under Settings > App Configuration > Trace New Hosts).
And add the gem, available on RubyGems, to your application's Gemfile:
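A minimal Gemfile entry for the gem (pinning a version is optional):

```ruby
# Gemfile
gem 'oboe'
```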
TraceView functions by sampling a subset of all requests that go through your web application. This sample rate must be set at the entry point of requests in your application. This can be a load balancer, Apache/nginx or Ruby itself. Successive hosts and software stacks that requests pass through will act appropriately according to upstream settings.
Yes. That means requests can be traced across hosts and software stacks, and even across internal API calls made via HTTP/SOAP/REST, which makes for spectacular application insight. But that's another post for another time.
For this walkthrough we’re going to assume that you’re running a Ruby on Rails application with Unicorn, Thin, Puma or any other Rack based Ruby webserver.
To instead configure Apache/nginx as the entry point, see here.
Ruby as the Entry Point
If you set up Apache or nginx as your entry point, you can skip this part entirely; the `oboe` gem will take its tracing instructions from upstream automatically.
Once the `oboe` gem is installed in your application, it makes `Oboe::Config` available: a nested hash of configuration options that affect how your application is instrumented.
Luckily, the defaults are very smart and only a couple initial values need to be configured.
The two required values are:
```ruby
Oboe::Config[:tracing_mode] = 'always'
Oboe::Config[:sample_rate]  = 100000 # 10% of incoming requests
```
These values enable tracing (`:tracing_mode`) and set the sample rate to 10% (`:sample_rate`). The `sample_rate` is expressed as a value out of one million, e.g. 300000 equals a 30% sample rate, meaning 3 out of 10 requests are sampled. (For low-traffic sites, you may want to set these values higher.)
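The arithmetic behind those numbers, as a quick sketch:

```ruby
# sample_rate is expressed in parts per million of incoming requests.
ONE_MILLION = 1_000_000

def sample_percentage(sample_rate)
  sample_rate * 100.0 / ONE_MILLION
end

puts sample_percentage(100_000) # => 10.0 (percent)
puts sample_percentage(300_000) # => 30.0 (percent)
```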
Getting Performance Data
And that’s it. Now to the good stuff.
If you want to test that the oboe gem is functional, start up a Rails console; you should see a message on stdout similar to:
```
Tracelytics oboe gem 220.127.116.11 successfully loaded.
```
Note that on hosts that don’t have the system daemon installed, the oboe gem disables itself and outputs a message to that fact.
```
Unsupported Tracelytics environment (no libs). Going No-op.
```
Deploy/restart your application and you should start seeing traces show up in your TraceView dashboard after a couple minutes.
Installation and setup of TraceView for your application is a simple two-step process that can be done in 10 minutes or less. TraceView gives a unique in-depth view into requests, even as they cross hosts and software stacks.
Things are moving fast for the Ruby language instrumentation in TraceView. We already support tracing of memcache-client, memcached, dalli, mongo, moped, mongoid, mongomapper, cassandra, ActiveRecord (postgres, mysql, mysql2) and more. Most recently we added support for Rack and Resque tracing. For a full list of supported libraries, see the top of this article.
If you haven’t tried out TraceView yet, give it a run. You won’t be disappointed.
Extras: Some Random Chart Porn
A screenshot that I sent to Linode when performance unexpectedly dropped:
Linode migrated my VPS to a lesser utilized host with evident results (Thanks Linode):
An older issue that Gameface had with atrocious rendering times: