Cache it? All you want to know, but are afraid to ask

As a backend team, we are responsible for the reverse HTTP proxy accelerator and the apps behind it. The “funniest” thing we have learned from our experience is that no decision is set in stone. Circumstances change constantly and the caching technique has to be calibrated non-stop.

So, let us share some aspects to consider.

Look at the diagram:

As you can see, we have to find a balance. How to read it? The longer the TTL you put on a resource, the more important the purging mechanism becomes, together with cache orchestration. On the other hand, the shorter the TTL (or even TTL = 0, no cache at all), the stronger the backend has to be.
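To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes that requests for a single resource arrive at random (Poisson) with a steady rate, that the TTL clock starts when a miss fills the cache, and that there is a single cache node. The function names are ours and serve only as an illustration.

```python
def miss_ratio(requests_per_second: float, ttl_seconds: float) -> float:
    """Fraction of requests that reach the backend for one cached resource.

    Assumption: Poisson arrivals and a TTL that starts on a cache miss.
    Each miss is then followed by, on average, rate * ttl cache hits.
    """
    if ttl_seconds <= 0:
        return 1.0  # TTL = 0 means no caching: every request is a miss
    return 1.0 / (1.0 + requests_per_second * ttl_seconds)


def backend_rate(requests_per_second: float, ttl_seconds: float) -> float:
    """Requests per second that the backend has to serve for this resource."""
    return requests_per_second * miss_ratio(requests_per_second, ttl_seconds)


# Compare a few TTL values for a resource requested 100 times per second.
for ttl in (0, 1, 10, 60, 600):
    print(f"TTL {ttl:>3}s: backend sees {backend_rate(100, ttl):.3f} req/s")
```

Under these assumptions, at 100 requests per second a TTL of a single second already brings the backend load for that resource down to about one request per second, which is why even very short TTLs act as a useful shield.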

Of course, we know two different approaches. One states: “hey, write the backend as efficiently as possible to be prepared for the worst”. That view is supported by guys who know that content will be more and more personalized, so anonymous users will soon be a thing of the past. The other says: “hey, you have such a perfect, mature shield, why not use it? In the professional world, you should use ‘protection’ even if it is doubled”.

THE STRATEGY

We can say there is no silver bullet and, as always, everything depends on something. But we need to start somewhere, so to begin with, check the list below. It determines the strategy of HTTP caching.

1. WHAT you are going to cache:

Content type: is it HTML or static web resources (CSS, JS, fonts, sprites)?
The size of the response, for example images (different techniques apply here, especially depending on how images are represented by URL). Images are usually much heavier, so we don’t want to blow up our reverse HTTP proxy
Is it an API call (RESTful)? Ajax or not? If yes, what kind of resource?

2. WHO has the most info about the resource:

Maybe the backend app knows more about the context of the resource, so shift the policy (s-maxage) to the apps
Or CDN
Or maybe even client (SPA architecture)

THEORY (PROTOCOL)

Our good friend HTTP gives us a lot of mechanisms to use: Cache-Control, Expires, Date, ETag, max-age, Pragma: no-cache... Which one to choose, which ones are deprecated, what are the default behaviors (maybe some heuristics)?

In our case, we commonly use the max-age directive of the Cache-Control header. It is the easiest one to process and doesn’t cause trouble with date parsing, time zones etc.

It is just a simple relative value in seconds, so it is always up to date. A similar directive, s-maxage, applies only to shared caches and helps a lot in communication between the proxy and the backends: it sends an explicit instruction on how the reverse proxy should behave. There is no problem with time synchronization between servers; you just focus on the time window.
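As a minimal sketch of what a backend could send, the helper below builds a Cache-Control value from two relative lifetimes. The function name and the example values (60 and 600 seconds) are assumptions made for illustration, not our production settings.

```python
from typing import Optional


def cache_control(max_age: int, s_maxage: Optional[int] = None) -> str:
    """Build a Cache-Control header value from relative lifetimes in seconds."""
    if max_age < 0 or (s_maxage is not None and s_maxage < 0):
        raise ValueError("lifetimes must be non-negative")
    directives = [f"max-age={max_age}"]
    if s_maxage is not None:
        # s-maxage is read by shared caches only and overrides max-age there.
        directives.append(f"s-maxage={s_maxage}")
    return ", ".join(directives)


# Browsers may keep the response for a minute, the reverse proxy for ten.
headers = {"Cache-Control": cache_control(max_age=60, s_maxage=600)}
```

Because both values are plain durations, neither the proxy nor the client has to compare clocks with the backend.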

APP SIDE

In the RESTful world we usually have collections and instances. They differ in nature from the caching perspective.

Instances have their own workflow (state machine), which is strongly related to the purge mechanism. How aggressively we cache them depends on the traffic and the capacity of our backend. We can always try to minimize the TTL so that the cache reflects the most current state of an instance.

In the case of collections, a very popular pagination mechanism is based on dates (since and until params). In such a situation, we have to study our business domain and understand how our instances change during their lifetime. Sometimes it is obvious: the latest instances are more likely to change, so it is better to keep their TTL low; with old instances (which have a small probability of changing), we can take the risk and increase the TTL.

Let’s see an example of a generic search API, where the backend app knows the most about the policy. In this case, we prefer to put the logic into our apps and instruct the reverse proxy with s-maxage.

http://api.com/v3/search?query=*&until=2012-05-05 (high TTL, for old data)

http://api.com/v3/search?query=*&until=now&limit=10 (TTL ~ 0)
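The two URLs above could be handled by a small policy function in the app. This is only a sketch: the one-day TTL, the 30-day threshold and the function name are hypothetical values picked for the example.

```python
from datetime import date, timedelta
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical policy values, chosen only for illustration.
OLD_DATA_TTL = 24 * 60 * 60      # one day for windows that ended long ago
FRESH_DATA_TTL = 0               # effectively no caching for recent data
OLD_AFTER = timedelta(days=30)   # when a window counts as "old"


def search_ttl(url: str, today: Optional[date] = None) -> int:
    """Pick an s-maxage value (in seconds) from the `until` query parameter."""
    today = today or date.today()
    params = parse_qs(urlparse(url).query)
    until = params.get("until", ["now"])[0]
    if until == "now":
        return FRESH_DATA_TTL
    try:
        until_date = date.fromisoformat(until)
    except ValueError:
        return FRESH_DATA_TTL  # unknown format: stay on the safe side
    if today - until_date > OLD_AFTER:
        return OLD_DATA_TTL
    return FRESH_DATA_TTL


old = search_ttl("http://api.com/v3/search?query=*&until=2012-05-05")
new = search_ttl("http://api.com/v3/search?query=*&until=now&limit=10")
```

The returned number would then be sent to the reverse proxy as the s-maxage value of the response.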

NOT ONLY HTTP CACHE

While developing the application, you may find out that the HTTP cache is not sufficient. If your business model is based upon a complicated graph of nested objects, then querying a database each time an HTTP cache entry has been evicted will become a bottleneck of the whole application.

The core business object of our domain is an “article”. It consists of many nested objects (e.g. tags, pictures, authors). Our clients expect our application to serve each article within milliseconds (the average response time should be below 10 ms). We cannot afford to query our databases each time the HTTP cache is invalidated: that approach is too slow and would generate too much traffic, causing performance issues.

Because of the problems mentioned above we decided to introduce an external cache layer, which will be responsible for storing articles and their relations as sets of key-value objects. Each object has an individual lifetime – static data (authors, sections, tags) is kept in the cache longer than the body of the article (which may change a few times per minute).

Our application uses Redis as an external cache, and it matches our requirements very well. First of all, it offers constant-time access to data (at least in the key-value case). Secondly, it supports many data types (sets, lists, sorted sets; we use the last of these for building ordered collections of articles). Finally, its performance is GREAT: it easily handles thousands of requests per second. When Redis was introduced to our technology stack, the read time for one article dropped from 250 ms to 5-10 ms.

Thanks to storing the objects related to an article as separate types, we’ve been able to introduce a lazy-loaded repository for each kind of data. With each request, we check whether that particular object is already in the cache. If so, it is returned in the response; otherwise, it is loaded from the database and stored in Redis. With this approach, we are able to evict parts of an article without refetching all nested objects.
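The lazy-loading idea can be sketched roughly as follows. To keep the example runnable without a server, a small in-memory class stands in for Redis; all class names, the key format and the TTL are our assumptions for illustration, and with a real Redis client the get and set calls would map to GET and to SET with an expiry.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class InMemoryCache:
    """Stand-in for the external cache so the sketch runs without a server."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired: behave as if the key was never set
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: int) -> None:
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class LazyRepository:
    """Lazy-loaded repository for one kind of object (author, tag, body...)."""

    def __init__(self, cache: InMemoryCache, kind: str, ttl_seconds: int,
                 load_from_db: Callable[[str], str]) -> None:
        self._cache = cache
        self._kind = kind
        self._ttl = ttl_seconds  # each kind of data gets its own lifetime
        self._load_from_db = load_from_db

    def _key(self, object_id: str) -> str:
        return f"{self._kind}:{object_id}"

    def get(self, object_id: str) -> str:
        cached = self._cache.get(self._key(object_id))
        if cached is not None:
            return cached                      # cache hit
        value = self._load_from_db(object_id)  # cache miss: go to the database
        self._cache.set(self._key(object_id), value, self._ttl)
        return value

    def evict(self, object_id: str) -> None:
        """Drop one object without touching the other parts of the article."""
        self._cache.delete(self._key(object_id))


# Usage with a fake database loader that records how often it is called.
db_calls: List[str] = []


def load_author(author_id: str) -> str:
    db_calls.append(author_id)
    return f"author-{author_id}"


authors = LazyRepository(InMemoryCache(), "author", ttl_seconds=3600,
                         load_from_db=load_author)
authors.get("7")
authors.get("7")  # served from the cache, no second database call
```

Having one repository per kind of data is what makes it possible to give authors a long lifetime and article bodies a short one, and to evict either of them independently.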

With an external cache, each instance of the application has access to all the data it needs right after the bootstrap phase. On the one hand, there is no need for complicated replication or sophisticated purge mechanisms. On the other, this solution is not flawless: each request requires sending some chunk of data through the network, which is slower than reading that data directly from memory. There is also a risk that your shiny cache will become unavailable because of network issues and, as a consequence, will become a single point of failure (master-slave replication reduces that threat, but it is still possible).

The introduction of an external cache allowed us to build a fast and reliable mechanism for fetching our business objects. Our application gained a huge performance boost, and the traffic to our slow backends has been reduced several times over. We’ve been working with that solution for over a year and it has caused almost no problems.

PURGING

It’s quite cool to have a cache, isn’t it? It helps hide problems with the backends serving the content, it serves data faster, and it operates on cached objects and distributes them over many nodes, datacenters, regions etc. The first thing that comes to mind when we think about caching is being able to put new objects in the cache in a reasonable time and to serve them as fast as possible. As already mentioned, it is really important to tweak the TTL of cached objects so that we decrease the traffic to the backends as much as possible and still serve the newest version (almost) always to everyone.

In theory, we could play with the TTL value and calibrate it until the results satisfy us. This approach was valid for many years, but it is becoming less and less applicable because of the need to serve more and more personalized content, and because new data changes unpredictably: sometimes often and sometimes rarely. All these new mechanisms, coming straight from business logic, require a better way of handling and orchestrating objects in the cache.

SUMMARY

A cache is not a big chunk of the same kind of data that can be served to many customers and refreshed from time to time. Splitting monolith-like applications, and then splitting them into even smaller parts, commonly known as microservices, exposed one big need. Each small part of these big ecosystems is responsible for some operations on data, and each of them is aware of the many events going through it. All these requirements create a need for an efficient and scalable solution for purging the cache in an event-driven manner. This is why we decided to have a dedicated microservice for it, built, tested and deployed in the same way as the rest of our applications.
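To give a feel for what purging in an event-driven manner can look like, here is a minimal sketch. The event names, the payload fields and the URL paths are hypothetical, and the purge callback stands in for whatever actually removes an object from the proxy; it is not a description of our purging service.

```python
from typing import Callable, Dict, List

PurgeRule = Callable[[dict], List[str]]


def article_updated(event: dict) -> List[str]:
    # An article change invalidates only that article.
    return [f"/articles/{event['id']}"]


def author_updated(event: dict) -> List[str]:
    # An author change also invalidates every article that embeds the author.
    paths = [f"/authors/{event['id']}"]
    paths += [f"/articles/{a}" for a in event.get("article_ids", [])]
    return paths


# Hypothetical mapping from event type to the rule that lists what to purge.
PURGE_RULES: Dict[str, PurgeRule] = {
    "article.updated": article_updated,
    "author.updated": author_updated,
}


def handle_event(event: dict, purge: Callable[[str], None]) -> int:
    """Purge everything the event invalidates; return the number of purges."""
    rule = PURGE_RULES.get(event.get("type", ""))
    paths = rule(event) if rule else []  # unknown events purge nothing
    for path in paths:
        purge(path)
    return len(paths)


# Usage: collect the purge requests in a list instead of calling the proxy.
purged: List[str] = []
handle_event({"type": "author.updated", "id": 3, "article_ids": [10, 11]},
             purged.append)
```

The point of the design is that the TTL no longer has to guess when data changes: the service that sees the change says exactly which objects became stale.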