Using Cloudflare Workers to get your Fastly cache hit rate to 95%

I love both Cloudflare and Fastly, and use both services for Diffen. Cloudflare powers the DNS and delivers the assets (images and JS) on static.diffen.com. Fastly is the CDN for serving HTML content, powering the main www subdomain. (If you’re wondering why, it’s because Fastly lets you stream your access logs to BigQuery, a feature only available to enterprise customers on Cloudflare.)

This is the story (and code) of how I improved my cache hit ratio on Fastly using Cloudflare Workers.

How a CDN works
A CDN has several POPs (points of presence) across the world. For example, Fastly’s network map is here. When a user makes a request for a web page, it gets routed to Fastly, and is usually handled by the POP that is closest to the user. If that POP has the content in its cache, it can serve it to the user immediately. But if the content is not cached at that POP, Fastly makes a request to the “origin” (the server that actually runs your web app) to fetch the content. This extra hop adds latency, and the total response time for the end user could end up being longer than with no CDN at all.

Naturally, you want your cache hit ratio to be as high as possible.
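To make the metric concrete, here is a minimal sketch of how a cache hit ratio can be computed from a list of per-request cache statuses. The `cacheHitRatio` helper and the sample data are made up for illustration; in practice the statuses would come from your CDN logs.

```javascript
// Cache hit ratio = number of hits / total number of requests.
// `statuses` is a hypothetical array of 'HIT' / 'MISS' strings, one per request.
function cacheHitRatio(statuses) {
  if (statuses.length === 0) return 0; // avoid dividing by zero
  const hits = statuses.filter((s) => s === 'HIT').length;
  return hits / statuses.length;
}

// Illustrative sample: three hits out of four requests.
const sample = ['HIT', 'HIT', 'MISS', 'HIT'];
console.log(cacheHitRatio(sample)); // 0.75
```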

Priming the cache
Ideally, I’d like to prime the cache in every Fastly POP for my most popular content. And given that resources get booted out of Fastly’s cache depending upon their load, I’d like to re-prime periodically. Unfortunately, Fastly does not provide a mechanism to prime its cache.

One way to do it would be to lease VPSes in cities all over the world and run cron jobs to request these pages every hour. But this is… infeasible. Run AWS Lambda functions from various data centers? Not enough diversity of cities. There are far more Fastly POPs than AWS data centers.

So let’s fight fire with fire. Cloudflare’s network has as many (if not more) POPs as Fastly. And lucky for us, Cloudflare allows you to run code on the “edge” i.e., in each of its data centers.

Using Cloudflare Workers

A couple of weeks ago, Cloudflare announced Cron Triggers for Workers. I was excited to get cron job-like functionality for Workers. However, their blog post says:

Since it doesn’t matter which city a Cron Trigger routes the Worker through, we are able to maximize Cloudflare’s distributed system and send scheduled jobs to underutilized machinery.

So we can’t specify which POP is used to run our worker. Bummer! How else can we run our crawlers from various POPs around the world where our users are? 

We use our users’ locations. Whenever a user visits a page, we invoke a Cloudflare worker. This worker will run at a Cloudflare POP closest to that user.

fetch('https://cache-primer.diffen.workers.dev');
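On the page itself, this call should never slow down or break the visit. One possible way to wrap it is sketched below. The `primeInBackground` name and the `sampleRate` parameter are my own additions for illustration, not something the setup above requires. Sampling is simply a way to keep the number of worker invocations down if you have a lot of traffic.

```javascript
const PRIMER_URL = 'https://cache-primer.diffen.workers.dev';

// Fire-and-forget call to the worker.
// sampleRate: fraction of page views that trigger the worker (assumption, 1 = every view).
// fetchFn and random are injectable so the function can be exercised without a network.
function primeInBackground(sampleRate = 1, fetchFn = fetch, random = Math.random) {
  if (random() >= sampleRate) return false; // skip this page view
  // We never read the response, and a failed request should not surface as an error.
  fetchFn(PRIMER_URL).catch(() => {});
  return true;
}

// Example usage in the page, triggering the worker on roughly 1 in 10 views:
// primeInBackground(0.1);
```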

The job of the worker is to (1) make a call to an endpoint to request a list of URLs to prime the cache for, and (2) crawl those URLs.

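// Register the handler for incoming requests (Service Worker syntax).
addEventListener('fetch', (event) => {
  event.respondWith(handleRequest(event.request));
});

async function handleRequest(request) {
  // (1) Ask the origin which URLs still need priming.
  const resp = await fetch('https://www.diffen.com/API-that-returns-a-list-of-urls-to-crawl');
  const urls = await resp.json();
  // (2) Crawl each URL in turn.
  for (const url of urls) {
    await fetch(url);
  }
  return new Response('OK', {
    headers: {
      // Let pages on www.diffen.com call this worker cross-origin.
      'Access-Control-Allow-Origin': 'https://www.diffen.com'
    }
  });
}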
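The loop above requests the URLs one at a time and stops at the first network error. A possible variation, sketched here as an assumption rather than what runs on Diffen, is to fire the requests concurrently and tolerate individual failures with `Promise.allSettled`. The `primeUrls` helper name is hypothetical.

```javascript
// Request every URL concurrently. A failed request must not stop the others.
// fetchFn defaults to the global fetch and is injectable for testing.
async function primeUrls(urls, fetchFn = fetch) {
  const results = await Promise.allSettled(
    // The async wrapper turns synchronous throws into rejections as well.
    urls.map(async (url) => fetchFn(url))
  );
  const failed = results.filter((r) => r.status === 'rejected').length;
  return { attempted: urls.length, failed };
}

// Inside handleRequest, the for loop could then become:
//   await primeUrls(urls);
```

With only ~10 URLs per invocation, the sequential loop is perfectly fine too; the concurrent version mainly shortens how long each worker invocation runs.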

The job of the API-that-returns-a-list-of-urls-to-crawl is to (1) know the top pages we need primed in Fastly’s cache, (2) maintain the list of pages we have cached in each Fastly POP, and (3) for a given request (from a Cloudflare worker), return ~10 pages that we have not primed yet in that POP.

Let’s say the user is in Seattle so Cloudflare’s Seattle POP is where the worker runs. It makes a request to the API to get a list of URLs to crawl. This API request is routed to the origin server via Fastly’s Seattle POP. On the origin, we can see the name of the POP in the X-Fastly-City header of the request.

<?php

header("Cache-Control: private, max-age=0"); //Make sure this response does not get cached.
header("Content-type: application/json; charset=utf-8");

$topUrls = getMostPopularUrls(); //This is a static list
$headers = getallheaders();
$city = $headers['X-Fastly-City'];
$primedUrls = getPrimedUrls($city);
$unprimedUrls = array_diff($topUrls, $primedUrls);
$newUrlsToPrime = array_slice($unprimedUrls, 0, 10);
primeNewUrls($city, $newUrlsToPrime); //Maintain a list of the URLs we have already primed
                                       //so we don’t prime them again for a few hours.

echo json_encode($newUrlsToPrime);

return;
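The helpers `getPrimedUrls` and `primeNewUrls` are not shown above. To illustrate the bookkeeping they imply, here is a rough in-memory sketch, written in JavaScript to match the worker code rather than in PHP. The `PrimedUrlStore` class, its method names, and the TTL handling are all assumptions made for this example. A real version would need storage that survives across requests.

```javascript
// Tracks which URLs were recently primed in each POP (keyed by city).
class PrimedUrlStore {
  // ttlMs: how long a URL counts as primed. now: injectable clock for testing.
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.byCity = new Map(); // city -> Map(url -> expiry timestamp)
  }

  // URLs primed in this city whose entries have not expired yet.
  getPrimedUrls(city) {
    const entries = this.byCity.get(city) ?? new Map();
    const t = this.now();
    return [...entries].filter(([, expiry]) => expiry > t).map(([url]) => url);
  }

  // Record that these URLs are being primed in this city.
  markPrimed(city, urls) {
    if (!this.byCity.has(city)) this.byCity.set(city, new Map());
    const expiry = this.now() + this.ttlMs;
    for (const url of urls) this.byCity.get(city).set(url, expiry);
  }

  // Pick up to batchSize top URLs not yet primed in this city, and mark them.
  nextBatch(city, topUrls, batchSize = 10) {
    const primed = new Set(this.getPrimedUrls(city));
    const batch = topUrls.filter((url) => !primed.has(url)).slice(0, batchSize);
    this.markPrimed(city, batch);
    return batch;
  }
}
```

Each call to `nextBatch` for a city hands out the next slice of the top list, and once the TTL passes the same URLs become eligible again, which gives you the periodic re-priming.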

That’s all there is to it.

We are now using one CDN to prime the cache at another. Cities with more users will have a higher chance of primed caches.

How much does it cost?

Cloudflare Workers lets you make 100,000 requests per day for free. The paid plan is $5 for 10 million requests per month. Totally worth it.
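As a back-of-the-envelope check, the sketch below works out which plan a given number of daily worker invocations would land on, using only the two figures quoted above. The `planFor` helper and the 30-day month are assumptions, and pricing beyond the included 10 million requests is deliberately left out.

```javascript
const FREE_REQUESTS_PER_DAY = 100000;     // free tier, as quoted above
const PAID_REQUESTS_PER_MONTH = 10000000; // included in the paid plan
const PAID_PRICE_USD = 5;

// Rough plan estimate for a given number of worker invocations per day.
function planFor(invocationsPerDay, daysPerMonth = 30) {
  const perMonth = invocationsPerDay * daysPerMonth;
  if (invocationsPerDay <= FREE_REQUESTS_PER_DAY) {
    return { plan: 'free', perMonth, usd: 0 };
  }
  if (perMonth <= PAID_REQUESTS_PER_MONTH) {
    return { plan: 'paid', perMonth, usd: PAID_PRICE_USD };
  }
  // Pricing beyond the included requests is not covered in this sketch.
  return { plan: 'paid-over-quota', perMonth, usd: null };
}

// 200,000 invocations a day is 6 million a month: over the free tier, within the paid plan.
console.log(planFor(200000).plan); // paid
```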

Results
The cache hit ratio went from ~75% to ~95% with this change. YMMV; the strategy would work better for top-heavy sites where a small number of pages account for a large share of overall traffic. I wouldn’t want to jam thousands of pages into Fastly’s cache when there is a slim chance of a real user ever needing them.
