Refresh your cached data asynchronously

18 Apr 2013 | CPOL | 5 min read
A pattern for an always available cache using asynchronous refresh.

Introduction

Microsoft .NET provides some excellent data caching mechanisms. They are fairly easy to implement and integrate into your application. In this article we will discuss specific scenarios in which a standard caching mechanism is not enough because the data you want to cache is very expensive to retrieve.

Background

If you want to get your hands dirty with caching in general, you can start here: http://www.codeproject.com/search.aspx?q=Caching&doctypeid=1%3b2%3b3

What makes a good caching mechanism?

  • It is fast: it sounds silly to mention, but this is definitely the primary goal.
  • It is clearable: users must have the ability to clear the cache on demand. Who wants to work with old data knowing that fresh data is out there?
  • It is time constrained: it’s unlikely that cached data will never ever change. So a good cache mechanism must provide a way to set content expiration.

You get these benefits, and a few more such as automatic invalidation through change monitors, when using System.Runtime.Caching.MemoryCache.
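To make the three properties concrete, here is a minimal sketch of the standard approach with MemoryCache. The key name, the `LoadReport` loader and the `StandardCacheSketch` class are hypothetical and only serve the illustration. It assumes a reference to the System.Runtime.Caching assembly.

```csharp
using System;
using System.Runtime.Caching;

public static class StandardCacheSketch
{
    // Hypothetical cache key, not part of the article
    const string Key = "expensive-report";

    // Hypothetical stand-in for a slow data source
    static string LoadReport()
    {
        return "report built at " + DateTime.UtcNow.ToString("o");
    }

    public static string GetReport()
    {
        MemoryCache cache = MemoryCache.Default;

        // Fast: serve from memory while the entry is still there
        var cached = cache.Get(Key) as string;
        if (cached != null) return cached;

        // Cache miss: the caller waits here while the data is loaded
        string fresh = LoadReport();
        var policy = new CacheItemPolicy
        {
            // Time constrained: the entry is dropped 10 minutes from now
            AbsoluteExpiration = DateTimeOffset.UtcNow.AddMinutes(10)
        };
        cache.Set(Key, fresh, policy);
        return fresh;
    }

    // Clearable: remove the entry on demand
    public static void Clear()
    {
        MemoryCache.Default.Remove(Key);
    }
}
```

Note the cache-miss branch: whoever calls `GetReport()` right after expiration or after `Clear()` pays the full loading cost.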

Why is it (sometimes) not good enough?

Under certain circumstances populating the cache is very expensive (data provided by third-party systems via web services, the result of a heavy calculation process, etc.).

In such a scenario whenever the cache is invalidated (by automatic expiration or human invalidation) your application is blocked while getting fresh data. I have some web applications in production for which 3rd parties take more than two minutes to provide me with their data! 

How to improve that situation?

At initial population of the cache there is not much to do. Data must be retrieved at whatever cost. The best solution you can offer is to load all your cached data at startup of the application, in parallel threads if possible.
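A minimal sketch of such a parallel warm-up, assuming two independent data sources. `StartupWarmUp`, `LoadCustomers` and `LoadProducts` are hypothetical names and the sleeps only simulate slow third parties.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class StartupWarmUp
{
    // Hypothetical slow loaders standing in for third-party calls
    static List<string> LoadCustomers() { Thread.Sleep(200); return new List<string> { "customer-1" }; }
    static List<string> LoadProducts() { Thread.Sleep(200); return new List<string> { "product-1" }; }

    public static List<string> Customers { get; private set; }
    public static List<string> Products { get; private set; }

    // Call once at application startup, before serving requests
    public static void WarmUp()
    {
        // Parallel.Invoke returns only when every action has completed.
        // The actions may run concurrently, so the wait can approach the
        // slowest loader instead of the sum of all loaders.
        Parallel.Invoke(
            () => Customers = LoadCustomers(),
            () => Products = LoadProducts());
    }
}
```

With the cache class presented later in this article, each action could simply read the `Data` property of one cache to force its initial load.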

But at runtime, when the cache is invalidated, why not repopulate your cached data in a background thread and hide the cost of data retrieval from the end users?

A slightly different pattern for data caching

A standard cache mechanism requires the developer to populate it, and it then invalidates itself when the content expires. With the following approach the cache mechanism auto-populates as well as auto-invalidates.

From a coder's perspective, using this pattern only requires you to provide a method that retrieves the data and, optionally, its lifetime. In other words, the mechanism translates into an abstract class with one abstract method and one overridable method:

C#
protected abstract T GetData(); // where T is the type of data you want to cache
protected virtual TimeSpan GetLifetime() { return TimeSpan.FromMinutes(10); } // override to change the default lifetime

You will then access the cached data through a static property with the following signature:

C#
public static T Data { get; }

and you can force invalidation with the following static method:

C#
public static void Invalidate();

Read this carefully 

I am going to introduce you to an implementation of that pattern, but please read this important note on its intrinsic limitation first:

You might not get “fresh” data out of the cache right after cache invalidation! 

The reason is simple: the pattern assumes that the data takes a long time to retrieve, so we get it in the background to avoid blocking the UI.

If you call the Invalidate() method, the cache is only marked as expired. The GetData() method is then called asynchronously in the background by a subsequent read of Data. While the data is repopulating, your only option is to serve the “old” data until the fresh data is available.

That said there are a few things you have to take care of when providing an implementation for the pattern:

  1. When the cache is populated for the first time: all client threads must be blocked until the data is fully loaded.
  2. To prevent blocking: when the cached data is expired or invalidated, the cache should keep serving the old content until fresh content is available.
  3. You must make sure that one and only one thread is launched to refresh data in the background.
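The third requirement can be sketched in isolation. The snippet below is not the implementation used in this article (which relies on a lock and a state field); it is an alternative illustration based on Interlocked.CompareExchange, and `SingleRefreshGuard` is a hypothetical name.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SingleRefreshGuard
{
    // 0 = no refresh running, 1 = a refresh is running
    static int refreshing = 0;

    // Counts how many refreshes were actually started (for illustration only)
    public static int StartedRefreshes = 0;

    // Returns the started task, or null when another refresh is already in flight
    public static Task TryStartRefresh(Action refresh)
    {
        // Atomically flip 0 -> 1; only the caller that observes the old value 0 wins
        if (Interlocked.CompareExchange(ref refreshing, 1, 0) != 0)
            return null;

        Interlocked.Increment(ref StartedRefreshes);
        return Task.Factory.StartNew(() =>
        {
            try { refresh(); }
            finally { Interlocked.Exchange(ref refreshing, 0); } // allow the next refresh
        });
    }
}
```

However many threads call `TryStartRefresh` at the same time, only one of them gets a task back until that task has finished.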

A possible implementation for that pattern 

This one is the most basic you could implement: it stores data in memory, and each time the data is requested it checks whether the lifetime has expired. Expiration is therefore only detected on access: if the lifetime is set to 10 minutes and no call comes in for one hour, the old data stays in memory for that whole hour, and the next callers still receive it while the background refresh runs.

It is left to the reader to provide a different implementation which might deal with timers or anything else.
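As a starting point for such a variant, here is a minimal timer-driven sketch. It is an assumption of how it could look, not part of the article's code: `TimerRefreshedCache` is a hypothetical name, it is instance-based rather than static, and it needs .NET 4.5 or later for `Volatile`.

```csharp
using System;
using System.Threading;

// Hypothetical timer-driven variant; names are illustrative and not from the article
public sealed class TimerRefreshedCache<T> : IDisposable where T : class
{
    readonly Func<T> loader;
    readonly Timer timer;
    T current;       // latest fully loaded value
    int refreshing;  // 0 = idle, 1 = refresh in progress

    public TimerRefreshedCache(Func<T> loader, TimeSpan lifetime)
    {
        this.loader = loader;
        current = loader(); // initial load blocks the caller of the constructor
        // First tick after one lifetime, then one tick per lifetime
        timer = new Timer(_ => Refresh(), null, lifetime, lifetime);
    }

    // Never blocks: returns the latest fully loaded value
    public T Data { get { return Volatile.Read(ref current); } }

    // Reloads now on the calling thread; readers keep getting the old value until the swap
    public void Refresh()
    {
        // Skip when a refresh is already running
        if (Interlocked.CompareExchange(ref refreshing, 1, 0) != 0) return;
        try
        {
            T fresh = loader();
            Volatile.Write(ref current, fresh);
        }
        catch (Exception)
        {
            // Keep the old value; the next tick will try again
        }
        finally
        {
            Interlocked.Exchange(ref refreshing, 0);
        }
    }

    public void Dispose() { timer.Dispose(); }
}
```

The trade-off compared to the implementation below: the timer reloads the data even when nobody reads it, which is only worth it for data that is used often.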

This one serves my purpose very well as I don’t cache data that is rarely used (do you?)...

C#
using System;
using System.Threading.Tasks;

public abstract class Cache<U,T>
    where U : Cache<U,T>, new()
    where T : class
{

    protected abstract T GetData();
    protected virtual TimeSpan GetLifetime() { return TimeSpan.FromMinutes(10); }

    protected Cache() { }

    enum State
    {
        Empty,      // nothing loaded yet
        OnLine,     // data loaded and still within its lifetime
        Expired,    // data outdated, refresh not started yet
        Refreshing  // a background refresh is running
    }

    static readonly U Instance = new U();
    static T InMemoryData { get; set; }
    static volatile State CurrentState = State.Empty;
    // Lock order is always StateLock first, then DataLock, to rule out deadlocks
    static readonly object StateLock = new object();
    static readonly object DataLock = new object();
    static DateTime RefreshedOn = DateTime.MinValue;

    public static T Data
    {
        get
        {
            switch (CurrentState)
            {
                case State.OnLine: // Simple check on time spent in cache vs lifetime
                    lock (StateLock)
                    {
                        if (CurrentState == State.OnLine &&
                            DateTime.UtcNow - RefreshedOn > Instance.GetLifetime())
                        {
                            CurrentState = State.Expired;
                        }
                    }
                    break;

                case State.Empty: // Initial load : blocking to all callers
                    lock (StateLock)
                    {
                        lock (DataLock)
                        {
                            if (CurrentState == State.Empty)
                            {
                                InMemoryData = Instance.GetData(); // actually retrieve data from inheritor
                                RefreshedOn = DateTime.UtcNow;
                                CurrentState = State.OnLine;
                            }
                        }
                    }
                    break;

                case State.Expired: // The first thread getting here launches an asynchronous refresh
                    lock (StateLock)
                    {
                        if (CurrentState == State.Expired)
                        {
                            CurrentState = State.Refreshing;
                            Task.Factory.StartNew(() => Refresh());
                        }
                    }
                    break;

            }

            // Serve whatever is in memory; during a refresh this is the old content
            lock (DataLock)
            {
                return InMemoryData;
            }
        }
    }

    static void Refresh()
    {
        T dt;
        try
        {
            dt = Instance.GetData(); // actually retrieve data from inheritor
        }
        catch (Exception)
        {
            // Retrieval failed: keep serving the old data and let the next access retry
            lock (StateLock) { CurrentState = State.Expired; }
            return;
        }

        lock (StateLock)
        {
            lock (DataLock)
            {
                InMemoryData = dt;
                RefreshedOn = DateTime.UtcNow;
                CurrentState = State.OnLine;
            }
        }
    }

    public static void Invalidate()
    {
        lock (StateLock)
        {
            // Only loaded data can be invalidated: an empty cache has nothing to expire,
            // and a running refresh must not be doubled by a second one
            if (CurrentState == State.OnLine)
            {
                RefreshedOn = DateTime.MinValue;
                CurrentState = State.Expired;
            }
        }
    }
}

Example of usage

A cache holding a very expensive list of strings which takes three seconds to be built.

C#
using System;
using System.Collections.Generic;
using System.Threading;

public class MyExpensiveListOfStrings : Cache<MyExpensiveListOfStrings, List<string>>
{
    protected override List<string> GetData()
    {
        System.Diagnostics.Trace.WriteLine("Getting fresh data...");
        
        // Make it a really expensive list of string...
        Thread.Sleep(3000);
        
        List<string> result = new List<string>();
        for (int i = 0; i < 10000; i++)
        {
            result.Add("Data  - " + i.ToString());
        }

        return result;
    }

    protected override TimeSpan GetLifetime()
    {
        // Refreshed asynchronously once the data has spent more than 30 seconds in memory
        return TimeSpan.FromSeconds(30);
    }
}

Example of program using the cache 

The program will create 10 threads accessing the cached list of strings.

All threads will be blocked the first time the cache is populated. 

After that, the cache is refreshed in the background every 30 seconds, or whenever you invalidate it by pressing the space bar. No thread is blocked after the initial load.

C#
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        for (int i = 0; i < 10; i++)
        {
            Task.Factory.StartNew(() =>
            {
                while (true)
                {
                    System.Diagnostics.Trace.WriteLine("Looping " +
                      Thread.CurrentThread.ManagedThreadId + " -> " +
                      MyExpensiveListOfStrings.Data.Count);
                    Thread.Sleep(50);
                }
            });
        }

        ConsoleKeyInfo key = Console.ReadKey();
        while (key.Key == ConsoleKey.Spacebar)
        {
            MyExpensiveListOfStrings.Invalidate();
            key = Console.ReadKey();
        }
    }
}

A matter of point of view 

I know it sounds weird to build a program which operates on data that is not the most up to date...

Now take a look at it from a helicopter view and you'll see that it is the end user who is working with slightly outdated data; the program does not care... Push your view a bit further: whatever way you cache data, if the user needs the most up-to-date data and the data takes two minutes to load, then your user will have to wait two minutes.

Now the question becomes: Is there any good reason to block all threads that rely on cached data in your application while one user who needs the latest version is waiting for it?

If your answer is:

  • undoubtedly yes! then forget about this article :)
  • it depends on the kind of data! then I hope you have read something of interest here. 

Guys, your feedback is most welcome.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)