Caching Data Access Strategies

Most businesses with an online presence are familiar with in-memory caching. Online purchases continue to grow quickly, with roughly 14.1% of all retail sales globally now made online, and the current growth rate puts eCommerce at 22% of retail sales by 2023. This rapid growth comes at a cost: a rise in website traffic and data requests that, if not managed appropriately, can lead to system slowdowns and delayed responses. Even a few seconds of delay can cost customers and revenue. Users demand fast, always-on services, and caching can help businesses keep up.

There are several ways to use caching in a system, so it's important to know which data access strategy you're going to implement. The strategy you choose will significantly affect your application or system design because it dictates the relationship between the data source and the caching system. Before committing to one, analyze the data's access patterns to determine which strategy suits them.

Choosing The Right Data Access Strategy

In-memory solutions like caching have been used for years because they can be up to 100 times faster than traditional disk-based systems. Built on RAM-based data storage and indexing, these systems require minimal maintenance and performance tuning while providing a fast, stable experience for end users. Different data access strategies have their pros and cons; choosing the "right" one from the options below depends entirely on your specific use case and business objectives.

Read Through/Lazy Loading

This strategy doesn't put too much load on the cache because it loads data only when necessary. If an application needs data, the system first looks for it in the cache. If the data isn't there, it's retrieved from the data source, placed in the cache, and returned.
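
A minimal sketch of the read path in Python, where the in-memory dict stands in for a real cache (such as Redis) and fetch_from_db is a placeholder for the actual data-source query:

```python
import time

# Read-through / lazy-loading sketch. The dict stands in for a real
# cache and fetch_from_db for the data source; both are illustrative
# assumptions, not a specific library's API.

CACHE_TTL_SECONDS = 300
_cache: dict[str, tuple[float, dict]] = {}  # key -> (expiry, value)

def fetch_from_db(user_id: str) -> dict:
    return {"id": user_id, "name": "example"}  # placeholder query

def get_user(user_id: str) -> dict:
    key = f"user:{user_id}"
    entry = _cache.get(key)
    if entry and entry[0] > time.time():  # 1. cache hit: serve from memory
        return entry[1]
    value = fetch_from_db(user_id)        # 2. miss: go to the data source
    _cache[key] = (time.time() + CACHE_TTL_SECONDS, value)  # 3. populate
    return value
```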

Advantages:

  • Data is loaded on demand; the cache doesn't load or store the entire data set up front. This makes it an ideal approach when an application doesn't need all the data from the data source cached.

  • In a system with multiple cache nodes, a node failure only increases latency for the application; it isn't harmed in any other way. When a node fails, a new one simply comes online, requests keep flowing, and the required data is repopulated on demand.

Disadvantages:

  • If the database changes before the cache key expires, the cache feeds stale data to the application, which can become an issue.

  • During a cache miss there's a noticeable delay because the system makes three round trips within the network: it checks the cache, retrieves the data from the database, and writes that data back into the cache.

Write Through

This strategy inserts data into the cache at the same time it updates the database. Performing both operations within a single transaction helps prevent stale data.
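
A minimal Python sketch of the write path, again with a dict as the stand-in cache and a placeholder database write:

```python
# Write-through sketch. save_to_db is a placeholder for the real
# INSERT/UPDATE; the dict stands in for the cache. The database is
# written first, so the cache never holds a value the database lacks.

_cache: dict[str, dict] = {}

def save_to_db(user: dict) -> None:
    print(f"db write: {user}")  # placeholder for the real write

def save_user(user: dict) -> None:
    key = f"user:{user['id']}"
    save_to_db(user)    # 1. write to the data source
    _cache[key] = user  # 2. write the same value to the cache
```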

Advantages:

  • The possibility of stale data is eliminated.

  • It is ideal for read-heavy systems because they have a low tolerance for stale data.

Disadvantages:

  • Every write operation does double work: it writes to the data source and then writes to the cache, adding latency to each write.

  • If any cache node fails during a write, the operation fails altogether, affecting the consistency of data between cache and data source.

  • Cache churn can be an issue: if most of the data is never read, the cache fills with entries that serve no requests.

Write Behind Caching

This strategy allows the application to write data directly to the caching system. This data is then synced asynchronously to the underlying data source. The caching service needs to maintain a queue of write operations so they can be synced in order of insertion.
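
A simplified Python sketch of the idea, using an in-memory FIFO queue and a placeholder flush_to_db for the real data-source write; a production implementation would need durable queuing, batching, and stricter ordering on retry:

```python
import queue
import threading

# Write-behind sketch. Writes land in the cache immediately and are
# queued; a background worker drains the queue in insertion order and
# syncs each entry to the data source.

_cache: dict[str, dict] = {}
_write_queue: queue.Queue = queue.Queue()

def flush_to_db(user: dict) -> None:
    print(f"db sync: {user}")  # placeholder for the real write

def save_user(user: dict) -> None:
    key = f"user:{user['id']}"
    _cache[key] = user      # 1. write to the cache only
    _write_queue.put(user)  # 2. queue the write for later sync

def _sync_worker() -> None:
    while True:
        user = _write_queue.get()  # FIFO preserves insertion order
        try:
            flush_to_db(user)
        except Exception:
            _write_queue.put(user)  # DB down: requeue and retry later
        _write_queue.task_done()

threading.Thread(target=_sync_worker, daemon=True).start()
```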

Advantages:

  • There's no need to wait for data to be written to the underlying data source, because the application writes only to the caching service. Performing both read and write operations on the cache side improves overall performance.

  • It protects the application from database failures: if the database goes down, queued items can be requeued and synced once it recovers.

Disadvantages:

  • Because the database and the cache are only eventually consistent, operations that run directly against the database, such as joins, may work with stale data until the queued writes are synced.

Refresh Ahead Caching

This strategy refreshes cached data before it expires by reloading entries at a configured interval ahead of the next expected cache access.
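
A rough Python sketch, assuming a hypothetical set of hot keys worth refreshing and a placeholder fetch_from_db; the refresh interval is deliberately shorter than the TTL so entries are reloaded before they expire:

```python
import threading
import time

# Refresh-ahead sketch. A background loop reloads hot keys on a fixed
# interval shorter than the cache TTL, so reads rarely hit an expired
# entry. The hot-key set and fetch_from_db are assumptions.

REFRESH_INTERVAL = 240  # refresh every 4 minutes...
CACHE_TTL = 300         # ...for entries that expire after 5

_cache: dict[str, tuple[float, dict]] = {}  # key -> (expiry, value)
_hot_keys = {"user:1", "user:2"}            # keys worth refreshing

def fetch_from_db(key: str) -> dict:
    return {"key": key}  # placeholder for the real query

def _refresher() -> None:
    while True:
        for key in _hot_keys:
            # Reload before expiry so the next read is a warm hit.
            _cache[key] = (time.time() + CACHE_TTL, fetch_from_db(key))
        time.sleep(REFRESH_INTERVAL)

threading.Thread(target=_refresher, daemon=True).start()
```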

Advantages:

  • Periodic, frequent refreshing ensures that staleness is never more than a temporary problem.

  • It offers lower latency than the other data access strategies because data is refreshed before it's requested, so reads rarely incur a cache miss.

Disadvantages:

  • It can be challenging to implement because it puts extra pressure on the cache service, which has to predict which keys will be needed and refresh them all ahead of access.

Understanding these data access strategies before planning and designing your system will help you determine what type of system you need. There's no set approach or cut-and-dried method you can pick from a playbook, because different systems have different requirements. Choose one, or a combination, of the strategies above, but stay true to your business goals: adjust the data access strategy to fit them, not the other way around.