Caching Data Access Strategies

Most businesses with an online presence are familiar with in-memory caching. Online purchases continue to grow quickly, with up to 14.1% of all retail sales globally now made online, a share projected to reach 22% by 2023. This rapid growth comes at a cost, however: a rise in website traffic and data requests that, if not managed properly, can lead to system slowdowns and delayed responses. Even a few seconds of delay can cost customers and revenue. Users demand fast, always-on services, and caching can help businesses keep up.

There are several ways you can use caching in your system, and it's important to know which data access strategy you're going to implement. Your chosen strategy will significantly affect your application or system design because it dictates the relationship between the data source and the caching system. Before choosing one, analyze the data's access pattern to determine the most suitable strategy.

Choosing The Right Data Access Strategy

In-memory solutions like caching have been used for years because reading from RAM can be up to 100 times faster than reading from disk-based storage. Because these systems store and index data in RAM, they require minimal maintenance and performance tuning while providing a fast, stable experience for end users. Each data access strategy has its pros and cons; choosing the "right" one from the options below depends entirely on your specific use case and business objectives.

Read Through/Lazy Loading

This strategy puts little load on the cache because data is loaded into it only when necessary. When an application needs data, the system looks for it in the cache first. If the data isn't there, it's retrieved from the data source, placed in the cache, and then returned.
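As a rough illustration, here's a minimal sketch of the pattern in Python. The in-process dict, the TTL value, and the fetch_from_db helper are hypothetical stand-ins for a real cache service (such as Redis or Memcached), a real expiration policy, and a real data-source query:

```python
import time

CACHE_TTL_SECONDS = 300  # assumed time-to-live for cached entries
cache = {}               # stand-in for a real cache service

def fetch_from_db(key):
    # Hypothetical data-source lookup; replace with a real query.
    return f"value-for-{key}"

def get(key):
    entry = cache.get(key)
    if entry is not None:
        value, stored_at = entry
        if time.time() - stored_at < CACHE_TTL_SECONDS:
            return value                    # cache hit: no database trip
    value = fetch_from_db(key)              # cache miss: go to the data source
    cache[key] = (value, time.time())       # populate the cache for later reads
    return value
```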

Advantages:

  • Data is loaded on demand, so the cache never has to hold the full data set. This makes it an ideal approach when an application only ever needs a subset of the data in the data source.

  • Even if a node fails in a system with multiple cache nodes, the application only experiences increased latency and isn't harmed in any other way. During a node failure, a new node simply comes online, letting requests flow smoothly while it repopulates the required data.

Disadvantages:

  • If the database changes before the corresponding cache key expires, the cache will serve stale data to the application, which can become an issue.

  • There's a noticeable delay in cache response during a cache miss because the system makes three round trips within the network: it checks the cache, retrieves the data from the database, and writes the retrieved data back into the cache.

Write Through

This strategy writes data into the cache at the same time it updates the database. Performing both operations within a single transaction prevents the cache from holding stale data.
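Sketched in the same style, with in-process dicts standing in for the cache and the underlying data source, a write-through operation updates both in one step:

```python
cache = {}
database = {}  # stand-in for the underlying data source

def write_through(key, value):
    # Both writes happen as one logical operation. If the database
    # write raises, the cache is never updated, so the two stay in sync.
    database[key] = value  # write to the data source first
    cache[key] = value     # then mirror the value into the cache

def read(key):
    # Reads are served from the cache, which is never stale.
    return cache.get(key)
```

In a real system the two writes would be wrapped in a transaction or rolled back together; the ordering above simply ensures the cache never gets ahead of the database.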

Advantages:

  • The possibility of stale data is eliminated.

  • It is ideal for read-heavy systems because they have a low tolerance for stale data.

Disadvantages:

  • Each write operation does two things, writing to the data source and then to the cache, which adds latency to every write.

  • If a cache node fails during a write, the whole operation fails, which can leave the cache and the data source inconsistent.

  • Cache churn can be an issue if most of the written data is never read, since every write lands in the cache and occupies space until it's evicted.

Write Behind Caching

This strategy allows the application to write data directly to the caching system. This data is then synced asynchronously to the underlying data source. The caching service needs to maintain a queue of write operations so they can be synced in order of insertion.
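Continuing the same sketch style, a write-behind setup might pair the cache with a queue and a background worker that flushes writes to the data source; the queue, worker, and dicts here are all illustrative stand-ins:

```python
import queue
import threading

cache = {}
database = {}                # stand-in for the underlying data source
write_queue = queue.Queue()  # FIFO, so writes sync in insertion order

def write_behind(key, value):
    cache[key] = value              # the application touches only the cache
    write_queue.put((key, value))   # queue the write for async syncing

def sync_worker():
    while True:
        key, value = write_queue.get()
        try:
            database[key] = value           # flush to the data source
        except Exception:
            write_queue.put((key, value))   # requeue if the database fails
        finally:
            write_queue.task_done()

threading.Thread(target=sync_worker, daemon=True).start()
```

Note that naively requeuing a failed write, as above, can reorder operations; a production implementation would retry with backoff while preserving insertion order.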

Advantages:

  • The application doesn't wait for data to be written to the underlying data source; it writes only to the caching service. Serving both read and write operations from the cache improves overall performance.

  • This strategy protects the application from database failure: if the database goes down, queued writes can simply be requeued and retried once it recovers.

Disadvantages:

  • Operations that go directly to the database, such as ad hoc queries or joins, may see stale data until the queued writes are flushed, since the database and the caching system are only eventually consistent.

Refresh Ahead Caching

This strategy ensures that cached data is refreshed before it expires. When an entry is accessed within a configured interval of its expiration, the cache reloads it from the data source ahead of the next access.
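One common way to implement this, sketched below with the same hypothetical stand-ins as before, is to serve the cached value immediately but trigger a background reload whenever a read lands inside a configured window before the entry's expiration:

```python
import threading
import time

CACHE_TTL = 300      # assumed lifetime of an entry, in seconds
REFRESH_AHEAD = 60   # assumed refresh window before expiration
cache = {}           # key -> (value, expires_at)

def fetch_from_db(key):
    # Hypothetical data-source lookup; replace with a real query.
    return f"value-for-{key}"

def refresh(key):
    cache[key] = (fetch_from_db(key), time.time() + CACHE_TTL)

def get(key):
    entry = cache.get(key)
    if entry is None:
        refresh(key)          # first access: load synchronously
        return cache[key][0]
    value, expires_at = entry
    if time.time() > expires_at - REFRESH_AHEAD:
        # The entry is close to expiring: reload it in the background
        # so subsequent reads never pay the cost of a cache miss.
        threading.Thread(target=refresh, args=(key,), daemon=True).start()
    return value              # serve the current value without waiting
```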

Advantages:

  • Periodic, frequent refreshing of data ensures that staleness is never a long-lived problem.

  • It offers lower read latency than the other strategies because frequently used entries are refreshed before they expire, so reads rarely pay the cost of a cache miss.

Disadvantages:

  • It can be challenging to implement because it puts extra pressure on the cache service, which must track accesses and refresh keys ahead of their expiration.

Understanding these data access strategies before you start planning and designing your system will help you determine what type of system you need. There's no set approach or cut-and-dried method you can pick from a playbook because different systems have different requirements. Choose one, or a combination, of the strategies above, but stay true to your business goals and adjust your data access strategies if necessary, not the other way around.