Using JSON Based Entities for Caching Remote Data in C#

honey the codewitch

Sep 7, 2019

CPOL

A walkthrough of building a JSON based transparent caching entity framework with branch reuse that intelligently minimizes the amount of server requests, against the use case scenario of accessing TMDb's API

Introduction

Aside from a clean way to query remote JSON/REST based services, you usually need a way to cache and index the data you get back. This is especially important for web services, which tend to return "chunky" data, that is, data in large chunks, because of the latency and statelessness of connected services, not to mention request limiting. JSON based systems will often return multiple nested cascades of data in a single query. There are several complications in handling this properly and efficiently, some of which we'll address in this article.

Understanding the Mess

I'll be showing the REST call URLs as well as the pretty printed results of those as we go through. Let's look at the data for a show. It's chunky, as I said.

Here's getting some TV show information for "Burn Notice" from themoviedb.org's API:

"invoke:" https://api.themoviedb.org/3/tv/2919?api_key=c83a68923b7fe1d18733e8776bba59bb

{
   "id": 2919,
   "backdrop_path": "/lgTB0XOd4UFixecZgwWrsR69AxY.jpg",
   "created_by": [
         {
            "id": 1233032,
            "credit_id": "525749f819c29531db09b231",
            "name": "Matt Nix",
            "gender": 2,
            "profile_path": "/qvfbD7kc7nU3RklhFZDx9owIyrY.jpg"
         }
      ],
   "episode_run_time": [
         45
      ],
   "first_air_date": "2007-06-28",
   "genres": [
         {
            "id": 10759,
            "name": "Action & Adventure"
         },
         {
            "id": 18,
            "name": "Drama"
         }
      ],
   "homepage": "http://usanetwork.com/burnnotice",
   "in_production": false,
   "languages": [
         "en"
      ],
   ...

It goes on like that. Look at nodes like created_by - they are subobjects. Further down (omitted here), there are whole seasons! (If you click the link I provided, you'll get all of this.)

The takeaway here is you need a way to store this data and preserve the hierarchy. It's already JSON, so your work is a lot further along if you keep it in some representation of JSON while you're holding on to it.

This may not seem so bad but this chunkiness becomes more difficult when it contains overlapping data. For example, I may have performed the query to get the result above, and then I want to get all the information TMDb has on "Matt Nix", the man in the created_by field. Well, I can do that, but I don't necessarily have to because some of the information is already in there as you can see:

{
   "id": 1233032,
   "credit_id": "525749f819c29531db09b231",
   "name": "Matt Nix",
   "gender": 2,
   "profile_path": "/qvfbD7kc7nU3RklhFZDx9owIyrY.jpg"
}

That's a fair bit of information. Maybe it's everything we need, maybe it's not. What if I want his IMDb id? What if I want his birthday? We have to go to the server for that. So we make another request, using the id from above

...

"invoke:" https://api.themoviedb.org/3/person/1233032?api_key=c83a68923b7fe1d18733e8776bba59bb

{
   "id": 1233032,
   "credit_id": "525749f819c29531db09b231",
   "name": "Matt Nix",
   "gender": 2,
   "profile_path": "/qvfbD7kc7nU3RklhFZDx9owIyrY.jpg",
   "birthday": "1971-09-04",
   "known_for_department": "Writing",
   "also_known_as": [],
   "biography": "",
   "popularity": 0.742,
   "adult": false,
   "imdb_id": "nm0633180"
}

This is the same data, just with more in it. This is a typical pattern of online queryable JSON repositories. This is great, but what do we do with it now? Well, we obviously want to merge it in with the data we already have in the created_by field, right? Hmm, something like that, but no. We'll get into that later. The upshot is the same anyway, but we get clever with it. Essentially, though, we need to be able to store and query this information on our end so we don't have to go back to the server every time, and we need a way to intelligently put off going to the server until we actually need data we don't have. In other words, if we already have some of our data cached, we don't want to go to the server again for it, but if we don't, we need to transparently fetch it and put it in the cache on demand.

Obviously, a second issue is addressing, both remotely and locally. How did we know to use "id" from above to run the second query? How do we know exactly where to store the information in our local repository so we can get to it later quickly?

So basically, we have to confront the problems of both storage and addressing. I'm proposing a simple solution using JSON itself to do most of the heavy lifting.

I'll be once again revisiting the TMDb and Json codebases I published here, with the links provided at the top.

Coding This Mess

Storing This Mess

The first issue mentioned was storage so we'll start there. We use IDictionary<string,object> to hold our JSON {} objects. The reason for this is so we have indexes for faster lookup. Meanwhile we have to use object for our values because they can be one of any number of types that map to JSON - an IList<object> for a JSON [] array, numeric types that map to JSON, and of course string, bool and null.
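To make that mapping concrete, here's a minimal sketch that builds by hand the same kind of structure a parser would produce for a trimmed-down version of the show JSON above. It uses only the standard collection classes, and StorageSketch and BuildShow are names made up for this illustration. They aren't part of the "Json" library.

```csharp
using System;
using System.Collections.Generic;

public static class StorageSketch
{
    // Builds, by hand, the in-memory shape of:
    // { "id": 2919, "created_by": [ { "id": 1233032, "name": "Matt Nix" } ] }
    public static IDictionary<string, object> BuildShow()
    {
        // a JSON {} object becomes a dictionary keyed by field name
        var person = new Dictionary<string, object>
        {
            ["id"] = 1233032,
            ["name"] = "Matt Nix"
        };
        return new Dictionary<string, object>
        {
            ["id"] = 2919,
            // a JSON [] array becomes an IList<object>
            ["created_by"] = new List<object> { person }
        };
    }

    public static void Main()
    {
        var show = BuildShow();
        var creators = (IList<object>)show["created_by"];
        var first = (IDictionary<string, object>)creators[0];
        Console.WriteLine(first["name"]); // Matt Nix
    }
}
```

Because every value is typed as object, each step down the tree needs a cast or a type check, which is the price of keeping the data in its JSON shape.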

We use the included, ambitiously named "Json" library to turn JSON text into this and back, and to support querying it. We'll also use it to perform the remote RPC/REST calls (in this case, to TMDb). All that heavy lifting in a small little package. Woo. You can swap this out with Newtonsoft's offering or something else if you like, but be careful because I've orchestrated this around dictionary classes. If the third-party library doesn't use those, your job just got a lot harder.

First, we need a single dictionary to root all of our data. All of our entities will have dictionaries somewhere underneath this one. Unless you want a messy constructor for your entities, and complications to the code, you'll want to keep some static state around. So each entity needs to know how to find the root and the mechanism we use involves keeping static state around. This is not thread safe so what we do is use static ThreadLocal<IDictionary<string,object>> to hold our data. This means the data is on a per-thread basis. This has upsides like no locking required, and added simplicity, and downsides like cache misses (we'll get to this) and increased memory usage in multithreaded apps since you have to keep one data store for each thread. The former issue of cache missing is mitigated if you use some sort of secondary caching mechanism, like this little Json library can. The latter issue of memory usage is mitigated in ASP.NET web server environments because the pages are short running and keep alive connections tend to be served on the same thread(s) request to request, meaning you'll usually have access to the cache from prior requests instead of creating a new one on each serve.

public static class Tmdb
{
    const string _apiUrlBase = "https://api.themoviedb.org/3";
    static ThreadLocal<IDictionary<string, object>> 
           _json=new ThreadLocal<IDictionary<string, object>>(()=>new JsonObject());
    public static IDictionary<string,object> Json { get { return _json.Value; } }
    public static string ApiKey { get; set; }
    public static string Language { get; set; }
}

Above, JsonObject is just a thin wrapper provided by our "Json" library that wraps a Dictionary<string,object> class. It's nothing special, although it has some features like value semantics that a standard dictionary doesn't share. We don't use those here. You can make this a Dictionary<string,object> if you want.

You'll notice that aside from the static ThreadLocal<IDictionary<string,object>> _json field we have some other members here as well. The reason is that the TMDb service, like most, requires an "API key". You have to provide this with every call into the service, and since it doesn't change by user you can set it at the application level. Language is another parameter that TMDb accepts globally and in this case doesn't vary by user, although if you're making a multilanguage web app you'll want to make sure your _json is structured to account for this on a per-language basis as well. For example, I'd do it by putting more dictionaries/JSON objects underneath the root, so you have a language at the root of the store/cache like "$.en-US.movie.219" aka "/en-US/movie/219" (everything goes under its ISO 639-1 code or something). Don't worry about it here. The solution presented is flexible enough to accommodate it; it just takes pre-planning.

Note that we have a Json property on this object that retrieves our root dictionary. All of our entities will as well. Each one points to the JSON for its own data. The root class is static, and holds everything - all the other objects which are just dictionaries exist somewhere in the overall graph/tree. I hope that makes sense.
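If the per-thread behavior of the root isn't obvious, here's a small sketch of it. It assumes a plain Dictionary<string,object> in place of JsonObject, and PerThreadStore and CountOnOtherThread are names invented for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class PerThreadStore
{
    // each thread lazily gets its own root dictionary
    static readonly ThreadLocal<IDictionary<string, object>> _json =
        new ThreadLocal<IDictionary<string, object>>(
            () => new Dictionary<string, object>());

    public static IDictionary<string, object> Json => _json.Value;

    // reports how many entries a freshly started thread sees in its root
    public static int CountOnOtherThread()
    {
        int count = -1;
        var t = new Thread(() => count = Json.Count);
        t.Start();
        t.Join();
        return count;
    }

    public static void Main()
    {
        Json["tv"] = new Dictionary<string, object>();
        Console.WriteLine(Json.Count);           // 1
        Console.WriteLine(CountOnOtherThread()); // 0
    }
}
```

The second thread starts with an empty store, which is exactly the "cache miss" downside mentioned earlier: nothing cached on one thread is visible from another.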

This makes addressing simpler. Speaking of:

Addressing This Mess

Any time you keep data around, you have to have a way to get it back again. Files have filenames and paths, relational databases have primary keys. What do we have? We have indexers into objects, since they're all dictionaries and lists.

...
object o;
// get the "created_by" field from the show's JSON
if (showData.TryGetValue("created_by", out o))
{
    // make sure it's a list.
    var l = o as IList<object>;
    if (null != l)
    {
        if(0 < l.Count)
        {
           // get the dictionary for the person
           var personData = l[0] as IDictionary<string, object>;
           if(null != personData)
           {
               // now get their name and write it
               string name=null;
               if (personData.TryGetValue("name", out o))
                   name = o as string;
               Console.WriteLine(name);
           }
        }                
    }
}

This is how to move through the tree in an efficient manner, using the indexed fields. However, it's not easy on the fingers. It begs for a wrapper. Again the "Json" library to the rescue:

// basically works like this should: showData["created_by"][0]["name"]
Console.WriteLine(JsonObject.Get(showData, "created_by", 0,"name") as string);

This will get you the exact same thing.
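In case you're swapping in a different JSON library, a helper along these lines is easy to write yourself. The following is only a sketch of the idea over plain dictionaries and lists. PathWalker is a hypothetical name, and this is not the actual implementation of JsonObject.Get().

```csharp
using System;
using System.Collections.Generic;

public static class PathWalker
{
    // Hypothetical stand-in for a Get()-style helper: follows string keys
    // through dictionaries and int indices through lists.
    // Returns null as soon as a segment cannot be resolved.
    public static object Get(object node, params object[] path)
    {
        foreach (var segment in path)
        {
            if (segment is string key)
            {
                var d = node as IDictionary<string, object>;
                if (d == null || !d.TryGetValue(key, out node))
                    return null;
            }
            else if (segment is int index)
            {
                var l = node as IList<object>;
                if (l == null || index < 0 || index >= l.Count)
                    return null;
                node = l[index];
            }
            else
                return null; // unsupported segment type
        }
        return node;
    }

    public static void Main()
    {
        var show = new Dictionary<string, object>
        {
            ["created_by"] = new List<object>
            {
                new Dictionary<string, object> { ["name"] = "Matt Nix" }
            }
        };
        Console.WriteLine(Get(show, "created_by", 0, "name")); // Matt Nix
    }
}
```

Each segment costs one dictionary lookup or one list index, so the walk stays cheap compared to parsing and evaluating a path expression.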

Or you can use the more familiar, but less efficient, Select() mechanism to run a JSON path query:

Console.WriteLine(JsonObject.Select(showData, "$.created_by[0].name").First() as string);

All roads lead to "Matt Nix" (in this case)

Good. Using some form of the above we can point back to our elements. I prefer the second mechanism, using Get() because in the end, it's actually the easiest and most suitable for this, and pretty efficient.

Now, just like a row in a relational database has an id, and a file has its path and name, we need something to uniquely identify our objects, and a way to do so that lets us get to them again - addressing once again, but we need to store our address somehow. We can't store what we wrote above as is.

Instead, we'll end up keeping our object "identity" like this as path segments in a string[] array to make it easy to pass to Get(), or to string.Join() on '.' into a JSON path or with '/' to give us a path - a trick we'll use with TMDb to make it much easier to synchronize the local data with the remote endpoint.

Because TMDb also uses paths to expose its data, we'll simply use pretty much the same paths in our local storage. For example, the TV show "Burn Notice" (TMDb id of 2919) is at /tv/2919 and the movie "Volver" (TMDb 219) is at /movie/219

Virtually speaking, the JSON path (or rather, one of them) to get to the name "Matt Nix" from the root of TMDb is "$.tv.2919.created_by[0].name"

and the root of the show itself is:

"$.tv.2919"

or, using our method, new string[] { "tv","2919" }

JsonObject.Get(Tmdb.Json,"tv","2919");

We'll call this string array for our path PathIdentity and many entities will have one, but we'll get into that later. Just remember that the above one is for this show, "Burn Notice". Let's continue.
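Here's a tiny sketch of the two joins described above, using nothing but string.Join(). The helper names ToJsonPath and ToUrlPath are invented for this example.

```csharp
using System;

public static class PathIdentityDemo
{
    // joins the segments with '.' to form a JSON path into the local store
    public static string ToJsonPath(string[] pathIdentity)
        => "$." + string.Join(".", pathIdentity);

    // joins the segments with '/' to form the path used by the remote API
    public static string ToUrlPath(string[] pathIdentity)
        => "/" + string.Join("/", pathIdentity);

    public static void Main()
    {
        var burnNotice = new string[] { "tv", "2919" };
        Console.WriteLine(ToJsonPath(burnNotice)); // $.tv.2919
        Console.WriteLine(ToUrlPath(burnNotice));  // /tv/2919
    }
}
```

One array serves both purposes, which is why the segments are kept separate instead of being stored as a single pre-joined string.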

We'll be using CreatePath() instead of Get() for fetching our objects because if we don't have them we want to create them. So in this case, putting it together - here's what the entities essentially do upon creation

this.Json = JsonObject.CreatePath(Tmdb.Json,this.PathIdentity);

The above not only creates the object and any objects leading up to it, it assigns its own storage pointer to a node in the Tmdb.Json but we'll get to this. The important bit for now is that PathIdentity is instrumental in locating our object locally as well as remotely.

Below is the actual URL at the remote DB for the show:

https://api.themoviedb.org/3/tv/2919?api_key=c83a68923b7fe1d18733e8776bba59bb

I've included the API key so you can test it. The root of all the API is https://api.themoviedb.org/3.

Locating Objects With PathIdentity

Now, when we fetch data, basically we're going to put it under Tmdb.Json at its same "address" as indicated by PathIdentity for that object. In this case, for "Burn Notice" it's /tv/2919

We can "create the address" in the root simply by calling JsonObject.CreatePath(Tmdb.Json,show.PathIdentity), which will also return the innermost node it just created. That call creates the path /tv/2919 ("$.tv.2919") because that is what the show returns from PathIdentity, as above, and it hands back the node at 2919. Keep in mind this method isn't destructive: if the path already exists it just navigates it and doesn't destroy anything.

Internally, we're basically just creating dictionaries nested in one another.
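To show what "nested dictionaries" means here, this is a rough sketch of a CreatePath()-style routine written against plain dictionaries. It is my own stand-in for illustration, not the library's code, and CreatePathSketch is a made-up name. One assumption to note: this version replaces a value that exists but isn't an object, and the real method may handle that case differently.

```csharp
using System;
using System.Collections.Generic;

public static class CreatePathSketch
{
    // Walks the path from the root, creating a dictionary for each
    // missing segment, and returns the innermost node.
    public static IDictionary<string, object> CreatePath(
        IDictionary<string, object> root, params string[] path)
    {
        var current = root;
        foreach (var segment in path)
        {
            object child;
            var next = current.TryGetValue(segment, out child)
                ? child as IDictionary<string, object>
                : null;
            if (next == null)
            {
                // missing (or not an object): create a fresh node here
                next = new Dictionary<string, object>();
                current[segment] = next;
            }
            current = next;
        }
        return current;
    }

    public static void Main()
    {
        var root = new Dictionary<string, object>();
        var first = CreatePath(root, "tv", "2919");
        first["name"] = "Burn Notice";
        // the second call navigates the existing path instead of rebuilding it
        var second = CreatePath(root, "tv", "2919");
        Console.WriteLine(ReferenceEquals(first, second)); // True
        Console.WriteLine(second["name"]);                 // Burn Notice
    }
}
```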

Remember that this PathIdentity is also the address at the remote server. If it wasn't we'd have to have an additional property for that, but this makes it super simple. Remember how the URL for "Burn Notice" (2919) above has this path identity in it? Yeah. You can see where this is headed. With this PathIdentity we know enough to fetch the data we don't already have.

Also, not all of our entities can have a PathIdentity. Remember that the service returns deeply nested data, so some data is only available as part of a parent query. This data does not have its own path. It does not fetch itself from the server. It essentially represents part of its parent.

Putting this Mess Together: The JSON Backed Entity Base Classes

First, the base most class:

// Represents a basic entity in the TmdbApi library
// This object uses a custom form of reference semantics
// for equality comparison - its Json property is compared.
public abstract class TmdbEntity : IEquatable<TmdbEntity>
{
    protected TmdbEntity(IDictionary<string, object> json)
    {
        Json = json ?? throw new ArgumentNullException(nameof(json));
    }
    public IDictionary<string, object> Json { get; protected set; }
    protected T GetField<T>(string name,T @default=default(T))
    {
        object o;
        if (Json.TryGetValue(name, out o) && o is T)
            return (T)o;
        return @default;
    }
    // objects are considered equal if they
    // point to the same actual json reference
    public bool Equals(TmdbEntity rhs)
    {
        if (ReferenceEquals(this, rhs))
            return true;
        if (ReferenceEquals(rhs, null))
            return false;
        return ReferenceEquals(Json, rhs.Json);
    }
    public override bool Equals(object obj)
    {
        return Equals(obj as TmdbEntity);
    }
    public static bool operator==(TmdbEntity lhs, TmdbEntity rhs)
    {
        if (object.ReferenceEquals(lhs, rhs)) return true;
        if (object.ReferenceEquals(lhs, null)) return false;
        return lhs.Equals(rhs);
    }
    public static bool operator!=(TmdbEntity lhs, TmdbEntity rhs)
    {
        if (object.ReferenceEquals(lhs, rhs)) return false;
        if (object.ReferenceEquals(lhs, null)) return true;
        return !lhs.Equals(rhs);
    }
    public override int GetHashCode()
    {
        var jo = Json as JsonObject; // should always be but it doesn't *have* to be
        if(null!=jo)
        {
            // we don't want our wrapper's hashcode since
            // JsonObject implements value semantics
            // So get the "real" dictionary and 
            // GetHashCode() on that. 
            return jo.BaseDictionary.GetHashCode();
        }
        return Json.GetHashCode();
    }
}

From top to bottom:

The first thing we have is the constructor, which takes a JSON object (JsonObject or IDictionary<string,object>)

Our class will use this information to fill its fields. This is essentially the initial starting JSON of the class. Sometimes, such JSON will contain a solitary field - just enough information to pull the rest from the server. Often

{ "id": 2919 }

or similar. The above points to the id for "Burn Notice", but it will require a server call to retrieve anything else.

The next thing we have is the required Json property. This just holds/returns the JsonObject that keeps your state.

The third thing is GetField<T>() which takes a name and optionally, a default value. All it does is essentially try to return the value as the specified type, and if it can't, it returns the specified default. This is just a helper method for derived classes. It's not critical by itself, but it has a big brother, GetCachedField<T>() in a derived class which does more, so using GetField<T>() and GetCachedField<T>() in tandem just makes things more consistent.

The rest deals with implementing our equality comparison semantics. Basically, what we want is for two objects to be the same if their respective Json properties refer to the same dictionary, meaning the same memory location. This is kind of a one-off requirement, so implementing it is a bit weird in .NET. Reference equality on the entity itself is the default behavior, but we don't want our entities compared that way. We want the JSON object we're holding at Json to be the arbiter instead, so that our objects are considered equal if they both point to the same location in memory under the Tmdb.Json root. This is part of how we do that. The other step was alluded to above but we haven't got to explore it just yet. We will.

The weirdness in GetHashCode() is necessary because we don't want value semantics in this class, so we're overriding the behavior of JsonObject, but there's no direct way to do that from outside the class. Also the object doesn't have to be a JsonObject, it can be any dictionary, so we have to accept either one and check for it.
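Here's a stripped-down sketch of those semantics in isolation. EntitySketch is a hypothetical class, and instead of reaching for BaseDictionary it uses RuntimeHelpers.GetHashCode(), which returns an identity-based hash regardless of what the wrapped object overrides. That's a substitution I'm making for the example, not what the class above does.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public sealed class EntitySketch : IEquatable<EntitySketch>
{
    public IDictionary<string, object> Json { get; }

    public EntitySketch(IDictionary<string, object> json)
    {
        Json = json ?? throw new ArgumentNullException(nameof(json));
    }

    // equal only when both wrappers point at the very same dictionary
    public bool Equals(EntitySketch other)
        => !ReferenceEquals(other, null) && ReferenceEquals(Json, other.Json);

    public override bool Equals(object obj) => Equals(obj as EntitySketch);

    // identity hash of the dictionary, so equal entities hash alike
    public override int GetHashCode() => RuntimeHelpers.GetHashCode(Json);

    public static void Main()
    {
        var shared = new Dictionary<string, object> { ["id"] = 2919 };
        var a = new EntitySketch(shared);
        var b = new EntitySketch(shared);
        var c = new EntitySketch(new Dictionary<string, object>(shared));
        Console.WriteLine(a.Equals(b)); // True: same dictionary instance
        Console.WriteLine(a.Equals(c)); // False: same content, different instance
    }
}
```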

public class TmdbImage : TmdbEntity
{
    public TmdbImage(IDictionary<string,object> json) : base(json) {}
        
    public int Width => GetField("width",0);
    public int Height => GetField("height", 0);
    public double AspectRatio => GetField("aspect_ratio", 0d);
    public string Path => GetField<string>("file_path");
    public string Language => GetField<string>("iso_639_1");
    public double VoteAverage => GetField("vote_average",0d);
    public int VoteCount => GetField("vote_count", 0);
    public TmdbImageType ImageType {
        get {
            switch(GetField<string>("image_type"))
            {
                case "poster":
                    return TmdbImageType.Poster;
                case "backdrop":
                    return TmdbImageType.Backdrop;
                case "logo":
                    return TmdbImageType.Logo;
            }
            return TmdbImageType.Unknown;
        }
    }
    // only present for logo images
    public string FileType => GetField<string>("file_type");
}

This backs a JSON object that looks basically like this example:

(This one doesn't have a URL I can give you because these objects are only returned as part of a subquery)

{
     "aspect_ratio": 0.666666666666667,
     "file_path": "/lYqC8Amj4owX05xQg5Yo7uUHgah.jpg",
     "height": 3000,
     "iso_639_1": null,
     "vote_average": 0,
     "vote_count": 0,
     "width": 2000
}

The code in the derived classes is regular enough to be generated, or one could probably use attributes and reflection to make this more automatic.

Note: You might be wondering why we're not using expando objects or other automatic wrapping facilities. I've considered it, but you have to know how to demand load when a field isn't present. That by itself is solvable by wiring up events to the underlying dictionary classes' accessors, but you also have to know how to get your own remote and local addresses from the JSON that represents you, and that isn't so easy because JSON does not carry schema information. What fields are your keys? How do you build the path from them? You could use JSON Schema to provide this, but then you'd have to declare a schema, which is as arduous as declaring a wrapper, and the code to make it actually do what you want is far more complicated. All roads lead to coding the entire API, one way or another, or at least the fields you need. This is true whether you're writing the "code" as JSON Schema or the way we're doing it. Doing it this way is the simplest way to solve all of the above issues at once. In any case, if you want it, the Json properties on all the entities already support expando-style access via "dynamic" in C# because of the way JsonObject works. The JSON fields become accessor properties on the object as you'd expect.

The other option is not using entities at all and simply going at the JSON in uncooked form. That comes with a number of disadvantages, along with some compelling upsides. The tree/graph rooting and path identity concepts are useful even if you forgo entities, but again you'll have to come up with another mechanism for demand loading and addressing, which we're about to cover. Consider the derived class, TmdbCachedEntity:

public abstract class TmdbCachedEntity : TmdbEntity
{
    protected TmdbCachedEntity(IDictionary<string, object> json) : base(json)
    {
    }
    public abstract string[] PathIdentity { get; }
    // override this in a derived class and when called, get your JSON from the remote source.
    // if you don't do it, it will be done for you using PathIdentity
    protected virtual void Fetch()
    {
        // in case you forget to override and the
        // API doesn't accept a language argument 
        // all this does is send an extra parameter
        FetchJsonLang();
    }
    // helper method to fetch remote data from a TMDb path and merge it with our data.
    protected void FetchJson(string path = null, 
    Func<object, object> fixupResponse = null, Func<object, object> fixupError = null)
    {
        var json = Tmdb.Invoke(path ?? 
        string.Join("/", PathIdentity), null, null, fixupResponse, fixupError);
        JsonObject.CopyTo(json, Json);
    }
    // helper method to fetch remote data from a TMDb path and merge it with our data.
    // sends the current language
    protected void FetchJsonLang(string path = null, 
    Func<object, object> fixupResponse = null, Func<object, object> fixupError = null)
    {
        var json = Tmdb.InvokeLang(path ?? 
        string.Join("/", PathIdentity), null, null, fixupResponse, fixupError);
        JsonObject.CopyTo(json, Json);
    }
    // demand loads if a field is not present.
    protected T GetCachedField<T>(string name, T @default = default(T))
    {
        object o;
        if (Json.TryGetValue(name, out o) && o is T)
            return (T)o;
        Fetch();
        if (Json.TryGetValue(name, out o) && o is T)
            return (T)o;
        return @default;
    }
    // Call this method in your entity's constructor to root it in 
    // the in memory cache. This is important.
    protected void InitializeCache()
    {
        var path = PathIdentity;
        if (null != path)
        {
            var json = JsonObject.CreatePath(Tmdb.Json, path);
            JsonObject.CopyTo(Json, json);
            Json = json;
        } else
            throw new Exception("Error in entity implementation. PathIdentity was not set.");
    }
}

From top to bottom:

First, we have the constructor that passes our JSON data on to the base class. In your most-derived constructor you are expected to call InitializeCache(), which we'll cover, but it can only be called once PathIdentity can be computed.

Which brings us to the PathIdentity - we must create this in a derived class so our object can locate itself. We explored it earlier.

Next we have Fetch(), which tells our derived class that it's time to fetch from the server. Normally, the base class can handle this just fine, but you might want to override it. That's why there are FetchXXXX() helper methods, which we're about to get to.

We have FetchJson() which is a helper that makes a REST "RPC call" using PathIdentity to the remote endpoint and we have FetchJsonLang() which is exactly the same thing except it sends the &language parameter along with the query string. Each call itself is delegated to the Tmdb.Invoke() or Tmdb.InvokeLang() as appropriate. This is because some calls accept language, and other calls do not. You can safely send the language parameter if it's not accepted, but it's probably best not to. Both of those routines delegate to JsonRpc.Invoke() but handle the connection rate limiting and specialized errors returned by TMDb's service.

Next we have GetCachedField<T>(), which takes a field name and an optional default value and returns the field. If the field is not present, or not the correct type (possibly the local store was altered?), it fetches from the server by calling Fetch() and then tries to get the value again, finally returning the supplied default (null for reference types) only if the fetch didn't produce the field. This is a naive way to handle it, as it can result in perpetual fetches when values are never present, but since JsonRpc.Invoke() supports second level caching, you can just use that to mitigate the issue, which isn't necessarily a huge issue to begin with. That's why there's not a more sophisticated null handling scheme (such as inserting DBNull fields or something). Anyway, from the end user's standpoint this works the same way GetField<T>() does, except obviously it can lag if it has to fetch.
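To see both the demand loading and the repeated-fetch caveat in action, here's a self-contained sketch that swaps the real server call for a delegate and counts how often it runs. DemandLoadSketch, FetchCount and the fake response are all invented for the demonstration, and the merge is a simple shallow copy.

```csharp
using System;
using System.Collections.Generic;

public sealed class DemandLoadSketch
{
    readonly IDictionary<string, object> _json = new Dictionary<string, object>();
    // stand-in for the remote call; the real code goes through Fetch()
    readonly Func<IDictionary<string, object>> _fetch;

    public int FetchCount { get; private set; }

    public DemandLoadSketch(Func<IDictionary<string, object>> fetch)
    {
        _fetch = fetch ?? throw new ArgumentNullException(nameof(fetch));
    }

    public T GetCachedField<T>(string name, T @default = default(T))
    {
        object o;
        if (_json.TryGetValue(name, out o) && o is T)
            return (T)o;
        // cache miss: fetch and merge the response into our state
        ++FetchCount;
        foreach (var kv in _fetch())
            _json[kv.Key] = kv.Value;
        if (_json.TryGetValue(name, out o) && o is T)
            return (T)o;
        return @default;
    }

    public static void Main()
    {
        var episode = new DemandLoadSketch(
            () => new Dictionary<string, object> { ["name"] = "Pilot" });
        Console.WriteLine(episode.GetCachedField<string>("name")); // Pilot
        Console.WriteLine(episode.GetCachedField<string>("name")); // Pilot
        Console.WriteLine(episode.FetchCount);                     // 1
        // a field the server never returns triggers a fetch on every access
        episode.GetCachedField<string>("overview");
        episode.GetCachedField<string>("overview");
        Console.WriteLine(episode.FetchCount);                     // 3
    }
}
```

The jump from 1 to 3 is the perpetual fetch problem in miniature, and it's the part a second level cache underneath the HTTP call would absorb.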

Finally, we have InitializeCache(), whose job it is to "root" our object under Tmdb.Json somewhere.

It performs the following steps:

1. Create or navigate to the specified path in the JSON. This creates as it goes, yielding the final node in the path. We pass our PathIdentity here, which creates a new IDictionary<string,object>/JsonObject under Tmdb.Json at the path indicated.
2. Copy any current state we're holding into the node we just created or navigated to (from step 1).
3. Replace our pointer for our own state with the "pointer" (reference) to the node from step 1.
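The three steps can be sketched with plain dictionaries as follows. This is a simplified stand-in, not the library code: RootingSketch and RootAt are hypothetical names, the copy is shallow, and JsonObject.CopyTo() may well merge differently.

```csharp
using System;
using System.Collections.Generic;

public static class RootingSketch
{
    public static readonly IDictionary<string, object> Root =
        new Dictionary<string, object>();

    public static IDictionary<string, object> RootAt(
        IDictionary<string, object> state, params string[] path)
    {
        // step 1: create or navigate to the node at the path
        var node = Root;
        foreach (var segment in path)
        {
            object child;
            var next = node.TryGetValue(segment, out child)
                ? child as IDictionary<string, object>
                : null;
            if (next == null)
            {
                next = new Dictionary<string, object>();
                node[segment] = next;
            }
            node = next;
        }
        // step 2: copy the state we were given into that node
        // (skipped when the state already is the rooted node)
        if (!ReferenceEquals(state, node))
            foreach (var kv in state)
                node[kv.Key] = kv.Value;
        // step 3: the caller keeps this reference in place of its own state
        return node;
    }

    public static void Main()
    {
        var fromShow = RootAt(
            new Dictionary<string, object> { ["id"] = 1233032, ["name"] = "Matt Nix" },
            "person", "1233032");
        var fromPerson = RootAt(
            new Dictionary<string, object> { ["id"] = 1233032, ["birthday"] = "1971-09-04" },
            "person", "1233032");
        Console.WriteLine(ReferenceEquals(fromShow, fromPerson)); // True
        Console.WriteLine(fromShow["birthday"]);                  // 1971-09-04
    }
}
```

Two partial copies of the same person end up as one merged node, so whichever entity you hold sees everything that has been learned about it so far.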

The last step is magic, as it not only roots us in the tree, but it allows us to recycle branches so we're not duplicating (as much) state. More importantly, branches that are recycled like this are merged together through this process, so you always have one place to get the most complete state for any cached item, no matter how many logical copies exist in the overlapping data you receive. It works like a symbolic link in Linux or Windows filesystems. Another way to look at it is that you "mount" your own state in the root tree somewhere, like in a POSIX filesystem. Yet another way to think about it is that you're turning your tree into a graph, because one node can have more than one parent. It's a simple trick with some big wins associated with it.

Let's take a look at a pared down derivation of a TmdbCachedEntity with a complex (multipart) key and the ability to do secondary fetching (using additional fetch methods aside from the main one to fetch associated data).

// represents a TV episode
public sealed class TmdbEpisode : TmdbCachedEntity
{
    public TmdbEpisode(int showId, int seasonNumber,int episodeNumber) : 
        base(_CreateJson(showId, seasonNumber,episodeNumber))
    {
        InitializeCache();
    }
    public TmdbEpisode(IDictionary<string, object> json) : base(json)
    {
        InitializeCache();
    }
    static IDictionary<string, object> _CreateJson
           (int showId, int seasonNumber, int episodeNumber)
    {
        var result = new JsonObject();
        // add our "key fields" to the json
        result.Add("show_id", showId);
        result.Add("season_number", seasonNumber);
        result.Add("episode_number", episodeNumber);
        return result;
    }
    // our path needs to look like this:
    // /tv/{show_id}/season/{season_number}/episode/{episode_number}
    public override string[] PathIdentity
        => new string[] {
            "tv",
            GetField("show_id", -1).ToString(),
            "season",
            GetField("season_number", -1).ToString(),
            "episode",
            GetField("episode_number", -1).ToString(),
        };
    public TmdbShow Show {
        get {
            int showId = GetField("show_id", -1);
            if (-1 < showId)
                return new TmdbShow(showId);
            return null;
        }
    }
    public TmdbSeason Season {
        get {
            int showId = GetField("show_id", -1);
            if (-1 < showId)
            {
                int seasonNum = GetField("season_number", -1);
                if (-1 < seasonNum)
                    return new TmdbSeason(showId,seasonNum);
            }
            return null;
        }
    }
    public int Number => GetField("episode_number", -1);
        
    public string Name => GetCachedField<string>("name");
        
    public DateTime AirDate => Tmdb.DateToDateTime(GetCachedField<string>("air_date"));
    
    public TmdbCrewMember[] Crew
        => JsonArray.ToArray(
            GetCachedField<IList<object>>("crew"),
            (d)=>new TmdbCrewMember((IDictionary<string,object>)d));
    
    public TmdbCastMember[] GuestStars
        => JsonArray.ToArray(
            GetCachedField<IList<object>>("guest_stars"),
            (d) => new TmdbCastMember((IDictionary<string, object>)d));
    
    public string ImdbId {
        get {
            _EnsureFetchedExternalIds();
            var d = GetField<IDictionary<string, object>>("external_ids");
            if (null != d)
            {
                object o;
                if (d.TryGetValue("imdb_id", out o))
                    return o as string;
            }
            return null;
        }
    }
    
    public string TvdbId {
        get {
            _EnsureFetchedExternalIds();
            var d = GetField<IDictionary<string, object>>("external_ids");
            if (null != d)
            {
                object o;
                if (d.TryGetValue("tvdb_id", out o))
                    return o as string;
            }
            return null;
        }
    }
    
    // TODO: figure out what this means and make an enum possibly
    public string ProductionCode => GetCachedField<string>("production_code");
    
    public string StillPath => GetCachedField<string>("still_path");
    
    public TmdbCastMember[] Cast {
        get {
            _EnsureFetchedCredits();
            var credits = GetField("credits", (IDictionary<string, object>)null);
            if (null != credits)
            {
                object o;
                if (credits.TryGetValue("cast", out o))
                {
                    var l = o as IList<object>;
                    return JsonArray.ToArray(l, 
                     (d) => new TmdbCastMember((IDictionary<string, object>)d));
                }
            }
            return null;
        }
    }
    
    void _EnsureFetchedCredits()
    {
        var credits = GetField<IList<object>>("credits");
        if (null != credits) return;
        var json = Tmdb.Invoke(string.Concat
        ("/", string.Join("/", PathIdentity), "/credits"));
        if (null != json)
            Json["credits"] = json;
    }
    
    void _EnsureFetchedExternalIds()
    {
        // "external_ids" holds a JSON object, not an array
        var l = GetField<IDictionary<string, object>>("external_ids");
        if (null == l)
        {
            var json = Tmdb.InvokeLang(string.Concat
            ("/", string.Join("/", PathIdentity), "/external_ids"));
            if (null != json)
                Json.Add("external_ids", json);
        }
    }
    ...
}

Okay, admittedly even pared down a little, that's a lot to take in. We'll start at the top, and generally go top to bottom, but some jumping around might be in order this time to make things a little clearer.

First, we have the familiar constructor that initializes from some JSON data. One notable difference is that it calls InitializeCache(), which we went over above. This roots the object in the cache, and it is important for all objects that are directly cached to call this method in the constructor. This sets our Json property to the right place, and makes sure we have all the available data we need.

We have a second constructor that takes several integers, a show id, a season number, and an episode number.

If this were in a relational database, these three items would comprise the primary key. In this paradigm, these are the minimum amount of information needed for this object to fetch the rest of itself from the remote store.

The initial JSON would look like this if we ran the following code:

// burn notice pilot episode
var episode = new TmdbEpisode(2919, 1, 1);
Console.WriteLine(episode.Json);

{
   "show_id": 2919,
   "season_number": 1,
   "episode_number": 1
}

This composes a PathIdentity of /tv/2919/season/1/episode/1

for a final request URL of:

https://api.themoviedb.org/3/tv/2919/season/1/episode/1?api_key=c83a68923b7fe1d18733e8776bba59bb

This is the URL FetchJson() and FetchJsonLang() use to satisfy their request for more episode data - because our path identity is what it is. Fetch() will handle this automatically via GetCachedField<T>().
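To make the path composition concrete, here's a minimal sketch of how a path identity turns into a request URL. BuildRequestUrl is a hypothetical helper I made up for illustration (the library does this inside its invoke routines), and I'm assuming the path identity is available as an array of string segments:

```csharp
using System;

static class PathIdentityDemo
{
    // Hypothetical helper, not part of the library: joins the path identity
    // segments on "/", prepends "/", and appends the API key as a query string.
    public static string BuildRequestUrl(string apiBase, string[] pathIdentity, string apiKey)
    {
        var path = string.Concat("/", string.Join("/", pathIdentity));
        return string.Concat(apiBase, path, "?api_key=", Uri.EscapeDataString(apiKey));
    }
}
```

Calling it with "https://api.themoviedb.org/3", the segments tv, 2919, season, 1, episode, 1 and the key YOUR_KEY yields https://api.themoviedb.org/3/tv/2919/season/1/episode/1?api_key=YOUR_KEY.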

Note that we call GetField<T>() instead of GetCachedField<T>() for the values that make up the path identity, like Number. You cannot fetch identity fields from the remote source because you need them to complete the fetch in the first place. Attempting it ends in a stack overflow, so don't do it. Always use GetField<T>() for these fields. We never use the caching version to retrieve show_id, season_number, or episode_number.
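Here's a stripped-down sketch of why that rule exists. MiniEntity is a hypothetical miniature I wrote for this illustration, not the real base class, and its Fetch() fakes the remote call. The thing to notice is that the fetch needs the path identity, so the path identity must be readable without fetching:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical miniature of the pattern; the real base class is more involved.
class MiniEntity
{
    public IDictionary<string, object> Json = new Dictionary<string, object>();
    public int FetchCount;

    // Plain lookup: never touches the remote store.
    public T GetField<T>(string name)
    {
        object o;
        if (Json.TryGetValue(name, out o) && o is T)
            return (T)o;
        return default(T);
    }

    // Caching lookup: fetches if the field is missing locally.
    public T GetCachedField<T>(string name)
    {
        if (!Json.ContainsKey(name))
            Fetch();
        return GetField<T>(name);
    }

    // Identity is read with GetField. Using GetCachedField here would call
    // Fetch(), which reads PathIdentity again, and so on until the stack overflows.
    string[] PathIdentity => new[] { "tv", GetField<int>("show_id").ToString() };

    void Fetch()
    {
        ++FetchCount;
        var path = "/" + string.Join("/", PathIdentity);
        // Stand-in for the remote call: pretend the server sent back a name.
        Json["name"] = "fetched from " + path;
    }
}
```

With show_id set to 2919, the first GetCachedField<string>("name") triggers one fetch against /tv/2919, and later reads are served from the local JSON.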

A small note about the naming: Number and index are not the same. Air orders can be different and seasons can have specials at index zero, but might not, so season indexes might not match season numbers. Ergo, seasons have a Number property as well.

Note how in the Show and Season properties, we simply create an instance of the relevant class and pass it the id(s). Because of the way InitializeCache() works, the show and season objects can locate themselves in the local store/cache, which means they immediately have access to any data already there. In most cases, by the time you've retrieved the episode, you've already retrieved the show and season, so these values are usually cached. Even though these wrappers were only given the id(s), they probably got pointed to, and merged with, the rest of the data as soon as they were instantiated. If they don't have the complete set, then any time something is asked for that hasn't been fetched, a fetch happens, gets the rest, and copies it back into the store automatically.

The next three fields are boring. They just directly wrap the underlying JSON, but note how GetField<T>() vs GetCachedField<T>() is used; Number is part of our PathIdentity so we must never attempt to fetch it. Finally, the AirDate field gets a string in "yyyy-MM-dd" format and converts it to a DateTime using a helper method.
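The helper that AirDate uses isn't shown here, so the following is only a sketch of what such a conversion might look like. ParseAirDate is a hypothetical name, and returning default(DateTime) for missing values is my assumption:

```csharp
using System;
using System.Globalization;

static class DateDemo
{
    // Hypothetical helper: parses TMDb's "yyyy-MM-dd" date strings.
    // Missing or empty values map to default(DateTime) in this sketch.
    public static DateTime ParseAirDate(string s)
    {
        if (string.IsNullOrEmpty(s))
            return default(DateTime);
        // The invariant culture keeps parsing independent of the machine's locale.
        return DateTime.ParseExact(s, "yyyy-MM-dd",
            CultureInfo.InvariantCulture, DateTimeStyles.None);
    }
}
```

ParseExact throws a FormatException on anything that isn't in that exact format, which is arguably what you want when the remote data changes shape.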

Next, things get interesting.

public TmdbCrewMember[] Crew
        => JsonArray.ToArray(
            GetCachedField<IList<object>>("crew"),
            (d)=>new TmdbCrewMember((IDictionary<string,object>)d));

This gets a JSON array at the field "crew" then passes it to JsonArray.ToArray<T>() giving it two arguments:

The first is the JSON array we just received from GetCachedField<...>("crew"). The second is a lambda expression that takes a System.Object and returns the item built from it. Basically, it takes one object of data and does what it needs to create an object of type T from it. Each object comes from the JSON array, so it could be a dictionary, a list, or some scalar JSON value. Remember that each of our entities takes an IDictionary<string,object> constructor argument? Well, here we're passing each element of the JSON array to the constructor for TmdbCrewMember, which creates an instance of that type to fill the destination array element.
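The body of JsonArray.ToArray<T>() isn't listed in this article, so here is a rough sketch of how a helper with that shape could work. The signature and the null handling are assumptions on my part, and JsonArraySketch is a stand-in name:

```csharp
using System;
using System.Collections.Generic;

static class JsonArraySketch
{
    // Assumed shape of the helper: convert each element of a JSON array
    // with the supplied delegate and collect the results into a T[].
    public static T[] ToArray<T>(IList<object> list, Func<object, T> convert)
    {
        if (null == list)
            return null; // assumption: a missing array yields null
        var result = new T[list.Count];
        for (var i = 0; i < result.Length; ++i)
            result[i] = convert(list[i]);
        return result;
    }
}
```

For example, given a list holding one dictionary whose "name" is "Matt Nix", passing d => (string)((IDictionary<string, object>)d)["name"] as the converter produces a one-element string array containing that name.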

The next property GuestStars does the same thing but for TmdbCastMember from "guest_stars".

Now we get to the ImdbId property:

public string ImdbId {
    get {
        _EnsureFetchedExternalIds();
        var d = GetField<IDictionary<string, object>>("external_ids");
        if (null != d)
        {
            object o;
            if (d.TryGetValue("imdb_id", out o))
                return o as string;
        }
        return null;
    }
}

The first thing to note is that it calls _EnsureFetchedExternalIds(). This is because the data, if it's not already present, must be retrieved with a separate call into TMDb:

https://api.themoviedb.org/3/tv/2919/season/1/episode/1/external_ids?api_key=c83a68923b7fe1d18733e8776bba59bb

{
   "id": 223655,
   "imdb_id": null,
   "freebase_mid": "/m/02vxx4g",
   "freebase_id": null,
   "tvdb_id": 330913,
   "tvrage_id": 574476
}

The result is then stored in the "external_ids" field of the episode's JSON.

This is accomplished by the following routine:

void _EnsureFetchedExternalIds()
{
    // "external_ids" holds a JSON object, not an array
    var l = GetField<IDictionary<string, object>>("external_ids");
    if (null == l)
    {
        var json = Tmdb.InvokeLang(string.Concat
                   ("/", string.Join("/", PathIdentity), "/external_ids"));
        if (null != json)
            Json.Add("external_ids", json);
    }
}

It does nothing if the field is already present; otherwise it fetches it from the URL. Note what it's doing with our PathIdentity: it joins the segments on "/", prepends "/" to the result, and then appends "/external_ids" as the suffix. It then delegates to Tmdb.InvokeLang() and stores the result in the JSON under the "external_ids" field. Note that this makes our local store's virtual path to it the same as the remote server's actual path to it. So once again, we've exploited the simplicity of TMDb's API addressing here.
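The same fetch-if-missing idea can be written generically. This is just a sketch under my own assumptions: EnsureFetched and its fetch delegate are hypothetical, standing in for the entity's private routines and the Tmdb invoke call:

```csharp
using System;
using System.Collections.Generic;

static class EnsureFetchedSketch
{
    // Hypothetical generalization of the _EnsureFetched...() routines:
    // if the field is absent, run the fetch and store the result under it.
    public static void EnsureFetched(
        IDictionary<string, object> json, string field, Func<object> fetch)
    {
        object existing;
        if (json.TryGetValue(field, out existing) && null != existing)
            return; // already mirrored locally, no remote call needed
        var fetched = fetch();
        if (null != fetched)
            json[field] = fetched; // the indexer avoids Add()'s duplicate-key exception
    }
}
```

Calling it twice for the same field only invokes the delegate once, which is the whole point: the second call finds the branch already in the local store.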

Anyway, after ensuring the data was fetched, ImdbId navigates the JSON (I did it manually here because this was older code) and returns the result of the imdb_id field.

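The TvdbId property follows the same pattern for the tvdb_id field. One caveat: in the response above, tvdb_id is a JSON number rather than a string, so assuming the parser surfaces numbers as numeric types, a plain "as string" cast yields null for it. Converting the value with ToString() avoids that.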
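Here is a small sketch of reading an external id as a string regardless of whether the server sent it as a string or a number. GetIdAsString is a hypothetical helper, and it assumes the JSON parser represents objects as IDictionary<string, object> and numbers as boxed numeric types:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class ExternalIdDemo
{
    // Hypothetical helper: looks up external_ids.<key> and returns it as a
    // string, or null if the branch, the key, or the value is missing.
    public static string GetIdAsString(IDictionary<string, object> entity, string key)
    {
        object o;
        if (null == entity || !entity.TryGetValue("external_ids", out o))
            return null;
        var ids = o as IDictionary<string, object>;
        if (null == ids || !ids.TryGetValue(key, out o) || null == o)
            return null;
        // Works for both "tt..." style strings and numeric ids.
        return Convert.ToString(o, CultureInfo.InvariantCulture);
    }
}
```

Fed the external_ids response shown earlier, this gives null for imdb_id (the server sent null) and "330913" for tvdb_id.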

The next properties are ProductionCode and StillPath which get the production_code and still_path fields, respectively.

Now we're at Cast - one of our properties that returns an array, but this one also uses a separate query to get its data:

public TmdbCastMember[] Cast {
    get {
        _EnsureFetchedCredits();
        var credits = GetField("credits", (IDictionary<string, object>)null);
        if (null != credits)
        {
            object o;
            if (credits.TryGetValue("cast", out o))
            {
                var l = o as IList<object>;
                return JsonArray.ToArray(l, 
                       (d) => new TmdbCastMember((IDictionary<string, object>)d));
            }
        }
        return null;
    }
}

The separate query is handled by a separate routine, with this URL:

https://api.themoviedb.org/3/tv/2919/season/1/episode/1/credits?api_key=c83a68923b7fe1d18733e8776bba59bb

which gets us this data:

{
   "cast": [
         {
            "character": "Madeline Westen",
            "credit_id": "525749f519c29531db09b018",
            "gender": 1,
            "id": 73177,
            "name": "Sharon Gless",
            "order": 2,
            "profile_path": "/ul7dTg6MxIU72inhxXiMWEJH8MP.jpg"
         },
         {
            "character": "Michael Westen",
            "credit_id": "525749f519c29531db09b04c",
            "gender": 2,
            "id": 52886,
            "name": "Jeffrey Donovan",
            "order": 0,
            "profile_path": "/5i47zZDpnAjLBtQdlqhg5AIYCuT.jpg"
         },
         {
            "character": "Fiona Glenanne",
            "credit_id": "525749f519c29531db09afe4",
            "gender": 1,
            "id": 5503,
            "name": "Gabrielle Anwar",
            "order": 1,
            "profile_path": "/khnEDczzSy6UcbnqZ6Sb4lWxnkE.jpg"
         },
         {
            "character": "Sam Axe",
            "credit_id": "525749f519c29531db09b080",
            "gender": 2,
            "id": 11357,
            "name": "Bruce Campbell",
            "order": 3,
            "profile_path": "/hZ2fW0gpPIBvXxT5suJzaPZQCz.jpg"
         }
      ],
   "crew": [
         {
            "id": 20833,
            "credit_id": "525749d019c29531db098a72",
            "name": "Jace Alexander",
            "department": "Directing",
            "job": "Director",
            "profile_path": "/nkmQTpXAvsDjA9rt0hxtr1VnByF.jpg"
         },
         {
            "id": 1233032,
            "credit_id": "525749d019c29531db098a46",
            "name": "Matt Nix",
            "department": "Writing",
            "job": "Writer",
            "profile_path": null
         }
      ],
   "guest_stars": [
         {
            "id": 6719,
            "name": "Ray Wise",
            "credit_id": "525749cc19c29531db098912",
            "character": "",
            "order": 0,
            "profile_path": "/z1EXC8gYfFddC010e9YK5kI5NKC.jpg"
         },
         {
            "id": 92866,
            "name": "China Chow",
            "credit_id": "525749cc19c29531db098942",
            "character": "",
            "order": 1,
            "profile_path": "/kUsfftCYQ7PoFL74wUNwwhPgxYK.jpg"
         },
         {
            "id": 17194,
            "name": "Chance Kelly",
            "credit_id": "525749cc19c29531db09896c",
            "character": "",
            "order": 2,
            "profile_path": "/hUfIviyweiBZk4JKoCIKyuo6HGH.jpg"
         },
         {
            "id": 95796,
            "name": "Dan Martin",
            "credit_id": "525749cd19c29531db098996",
            "character": "",
            "order": 3,
            "profile_path": "/u24mFuqwEE7kguXK32SS1UzIQzJ.jpg"
         },
         {
            "id": 173269,
            "name": "Dimitri Diatchenko",
            "credit_id": "525749cd19c29531db0989c0",
            "character": "",
            "order": 4,
            "profile_path": "/vPScVMpccnmNQSsvYhdwGcReblD.jpg"
         },
         {
            "id": 22821,
            "name": "David Zayas",
            "credit_id": "525749cd19c29531db0989ea",
            "character": "",
            "order": 5,
            "profile_path": "/eglTZ63x2lu9I2LiDmeyPxhgwc8.jpg"
         },
         {
            "id": 1233031,
            "name": "Nick Simmons",
            "credit_id": "525749cf19c29531db098a17",
            "character": "",
            "order": 6,
            "profile_path": "/xsc2u2QQA6Nu7SvUYUPKFlGl9fw.jpg"
         }
      ],
   "id": 223655
}

As you can see, it has a crew and a cast field. We store the whole thing under credits, which once again synchronizes our hierarchy with the remote repository's because our paths match up. As I mentioned well above, if your remote store can't be mirrored in this way, you'll have to have both remote and local identities for your entities. We can avoid that here by exploiting the layout of the TMDb API - it's easy to mirror, which is what we're doing, in parts, but we have to make sure our local and remote paths to our items always match.

Anyway, here's the routine, which works almost exactly like the last one we encountered except it's slightly simpler:

void _EnsureFetchedCredits()
{
    // "credits" holds a JSON object (cast/crew/guest_stars), not an array
    var credits = GetField<IDictionary<string, object>>("credits");
    if (null != credits) return;
    var json = Tmdb.Invoke(string.Concat("/", string.Join("/", PathIdentity), "/credits"));
    if (null != json)
        Json["credits"] = json;
}

Understanding the Json Layout

Effectively, we've been mirroring the addresses at the remote repository locally as we go, fetching on demand if data isn't already present. What might be less obvious is that we've also been recycling branches.

That is to say, you can get to "Matt Nix" in many ways by querying the JSON data, but we've minimized the number of duplicate branches by fixing all of them up to point to his "person" node at /person/1233032. This works because of the way cached entities fix up their cache in InitializeCache(): once a node is found to already exist, it is reused. That makes our data storage and retrieval far more efficient, as we maximize cache hits and minimize duplication at the same time.

Therefore, our tree isn't a tree anymore, it's a graph, as in one node can have more than one parent, unlike a tree.

You have to be very careful how you do this or you'll create endlessly recursive graphs which will fail on attempts to serialize JSON. Fortunately, because we're using paths and only branch recycling with our cached entities this should never happen. Also note that you can't "see" the branch recycling by serializing the JSON. As the JSON gets written, any recycled branches will be written out at every location. This isn't the best thing, and not really what we want, but it's not a show stopper.
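To see why a recycled branch gets written more than once, consider this sketch. It isn't the library's serializer, just a hypothetical walker over plain dictionaries and lists that counts how often a naive depth-first traversal reaches a given node:

```csharp
using System;
using System.Collections.Generic;

static class BranchReuseDemo
{
    // Counts how many times a naive depth-first walk reaches "target".
    // A serializer does the same kind of walk, so a branch that has two
    // parents is visited (and written) once per parent.
    public static int CountVisits(object node, object target)
    {
        var count = ReferenceEquals(node, target) ? 1 : 0;
        var dict = node as IDictionary<string, object>;
        if (null != dict)
            foreach (var value in dict.Values)
                count += CountVisits(value, target);
        var list = node as IList<object>;
        if (null != list)
            foreach (var item in list)
                count += CountVisits(item, target);
        return count;
    }
}
```

If a single "Matt Nix" dictionary is stored under /person/1233032 and also referenced from a show's created_by list, there is only one instance in memory, but the walk reaches it twice. Note that this walker, like a naive serializer, would recurse forever on a cyclic graph, which is exactly the hazard described above.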

There is probably enough above for you to take it from here as far as the entities themselves go, although it does help to play around with it and tinker with the code.

Points of Interest

Throughout the code, we've been calling variations of Tmdb.Invoke() to handle sending the actual JSON/REST call. What it does is append the API key to the query string; its Lang variations also append the language. These functions and their variations handle various aspects of querying the API, like calls with a page parameter and request throttling when we go over our limit. They all call down to _Invoke(), which makes the actual call using JsonRpc.Invoke():

static object _Invoke(string path, 
    bool sendLang, 
    IDictionary<string, object> args, 
    IDictionary<string, object> payload, 
    Func<object, object> fixupResult, 
    Func<object, object> fixupError, 
    string httpMethod)
{
    var url = _apiUrlBase;
    if (!path.StartsWith("/"))
        url = string.Concat(url, "/");
    url = string.Concat(url, path);
    if (null == args)
        args = new JsonObject();
    args["api_key"] = ApiKey;
    if (sendLang && !string.IsNullOrEmpty(Language))
        args["language"] = Language;
    object result = null;
    var retryCount = 0;
    while (null == result)
    {
        ++retryCount;
        try
        {
            var s = JsonRpc.GetInvocationUrl(url, args);
            System.Diagnostics.Debug.WriteLine("Requesting from " + s);
            result = JsonRpc.Invoke(s, null/*we already computed the url*/, 
                     payload, null, httpMethod, fixupResult, fixupError, Tmdb.CacheLevel);
            if (null == result)
                break;
        }
        catch (JsonRpcException rex)
        {
            if (retryCount > 11)
            {
                rex.Json.Add("retry_count_exceeded:", retryCount - 1);
                throw;
            }
                    
            // are we over the request limit?
            if (25 == rex.ErrorCode)
            {
                System.Diagnostics.Debug.WriteLine(rex.Message + ".. throttling " + url);
                // wait and try again
                Thread.Sleep(RequestThrottleDelay);
            }
            else if (-39 == rex.ErrorCode)
                continue;//malformed or empty json, try again
            else
                throw;
        }
    }
    return result;
}

Such is the unreliable nature of the web: we have to deal with things like invalid responses and throttling, and the routine above handles all of that for us. In the worst case, this means a serious lag, but with multilevel caching, it isn't a big issue.
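Boiled down, the retry logic looks something like the following sketch. It's deliberately generic: ThrottledException and InvokeWithRetry are hypothetical names standing in for the JsonRpcException handling above, and the delay and retry limit are parameters rather than the library's actual settings:

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for "the server says we are over the request limit".
class ThrottledException : Exception { }

static class RetryDemo
{
    // Runs "call", sleeping and retrying when it reports throttling.
    // Gives up and rethrows once maxRetries retries have been used.
    public static T InvokeWithRetry<T>(Func<T> call, int maxRetries, int delayMs)
    {
        var attempts = 0;
        while (true)
        {
            ++attempts;
            try
            {
                return call();
            }
            catch (ThrottledException)
            {
                if (attempts > maxRetries)
                    throw; // out of retries, let the caller deal with it
                Thread.Sleep(delayMs); // wait and try again
            }
        }
    }
}
```

Any other exception type propagates immediately, which mirrors the final else branch in _Invoke().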

Using This Mess In A Web Server Environment

So this is interesting: apparently, Keep-Alive connections may be served by the same thread from request to request. This isn't always true, but it's true often enough to be beneficial to us because of the way our per-thread Tmdb.Json instance works. Essentially, we're allowed to keep it as (unreliable) connection state. This has been called a "sinister hack", but all it really is, is a circumstantial optimization. It means that page to page, we'll keep getting cache hits for a single user session that's backed by this API. This is very good, as it means fewer instances of the JSON are created to satisfy the user's requests, and we hit the remote repository less often. Failing that, we rely heavily on the secondary "per url" caching. In a web server environment, it's best to set Tmdb.CacheLevel to JsonRpcCacheLevel.Aggressive in order to make that effective.
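For the curious, one way to get a per-thread instance in .NET is the [ThreadStatic] attribute. I'm not claiming this is exactly how Tmdb.Json is wired up, so treat the PerThreadCache class below as a hypothetical sketch of the general mechanism:

```csharp
using System;
using System.Collections.Generic;

static class PerThreadCache
{
    // Each thread gets its own copy of a [ThreadStatic] field. Field
    // initializers only run for the first thread, so we create lazily.
    [ThreadStatic]
    static Dictionary<string, object> _json;

    public static IDictionary<string, object> Json
    {
        get
        {
            if (null == _json)
                _json = new Dictionary<string, object>();
            return _json;
        }
    }
}
```

Two reads of PerThreadCache.Json on the same thread return the same dictionary, while a different thread sees its own separate one. That's why a request that lands on a thread that has served this API before starts out with a warm cache.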

History

Saturday, 7th September, 2019 - Initial submission