bcobb

Debugging edge caching behavior

May 30, 2020

I really loved this comic Julia Evans posted, and it made me want to share a little “spy tool” I made at work earlier this year to figure out if a change I wanted to make to our edge caching configuration would have the desired result.

The spy

Here’s a simplified version of the spy:

class RequestSpyController < ApplicationController
  def show
    render json: {
      ips: {
        remote_ip: request.remote_ip,
        ip: request.ip
      },
      headers: request.env.select { |k, _| k.match(/^[A-Z]/) }.as_json,
    }
  end
end

There’s not much to it! It’s an API endpoint that returns the request headers, and the Rails-interpreted IP addresses. It’s been a useful way to quickly test my assumptions about how our edge cache configuration and our Rails configuration affect request headers.

Appendix: what is edge caching

Before I joined Harry’s, I wasn’t too familiar with the concept of edge caching.

The idea is that you can put a geographically-distributed caching proxy between users and a web application to reduce the time it takes for cached URLs to be served. That is, by some mechanism, requests will be routed through the geographically closest proxy server. Edge caches are highly configurable, but typically they work as follows.

If the server doesn’t have a cached response, the request will be forwarded on to the origin server. The origin’s response might be cached before being forwarded to the client.

Diagram illustrating a cache miss
Forwarding a request to the origin server when there's nothing cached for the request.

But, if the edge server has a cached response for a URL, the response will be served without needing to forward the request through to the origin server.

Diagram illustrating a cache hit
Returning a cached response without consulting the origin.

Serving from the cache is especially nice if the origin server is thousands of miles away from the computer making the request!

Especially when using Varnish and VCL, it feels similar to configuring nginx as a front-end web server to a back-end like Wordpress or Rails.

Appendix: Why would it set headers?

The big caveat with edge caching is that, by default, once a URL is cached, the same response will be served until the cache entry expires. Without special configuration, this makes it unsafe to include any information that could vary per-request in the response, such as a navigation menu which includes the user’s name.

A common way to work around this caveat is to program the edge cache to alter the request headers right after it receives a request – in particular to add some custom headers with a normalized set of values. For example, the edge cache could add an X-Segment header to the inbound request, before it forwards the request to the origin server. Maybe it looks to see if a session cookie is present in the request, and sets the X-Segment header value to logged-in if it is, and logged-out if it’s not. If the origin server returns a Vary response header with a value of X-Segment, the edge cache will store one response per combination of URL and the value of the X-Segment header, rather than one response per URL.

With this kind of cache segmentation, the origin server could do something like return a different navigation bar for logged-in users than for logged-out users, and still get the benefits of edge caching. But this approach hinges on setting the X-Segment header correctly, which makes the RequestSpyController above very handy!