Principles of API Design: The One, The Many, The Null, and The Nothing

Josh Fischer

16 Sep 2013

How to create an API that will scale as a system grows over time.

Overview

When designing an API, there are countless factors that must be considered. Security, consistency, state management, style: the list seems never-ending. One factor that often goes overlooked, however, is scale. By designing your APIs with scale in mind from the beginning, you can save hundreds of hours of development time as the system grows.

Introduction

The definition of an Application Programming Interface (API) can, at times, be difficult to determine. Technically speaking, any function that is called by another programmer's code could fit the definition. Debating which code qualifies as an API is beyond the scope of this article so, for our purposes, we will assume that basic functions qualify.

The examples in this article are intentionally kept simple to illustrate the main point. C# functions are used, but the core principles can apply to almost any language, framework, or system. The data structures in the examples are modeled on the familiar relational style used by many industry-standard databases. Again, this is for illustrative purposes only and should not be viewed as a requirement for applying the principles.

The Requirements

Let's assume we are creating a basic order processing system for a client and that the three main classes (or "data structures" if you prefer) are already defined. Below we have a very basic relational class structure. The Customer class has a "foreign key" (borrowing database terminology) to Address and the Order class has foreign keys to Address and Customer. You are asked to create a library that can be used to process Orders. The first business rule to implement is that the State of the Customer's HomeAddress must be the same as the State of the Order's ShippingAddress (don't ask why; business rules rarely make any sense). ;-)

public class Address
{
    public int AddressId { get; set; }
    public string Street { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public string Zipcode { get; set; }
}

public class Customer
{
    public Address HomeAddress { get; set; }
    public int CustomerId { get; set; }
    public int HomeAddressId { get; set; }
    public string CustomerName { get; set; }
}

public class Order
{
    public Customer MainCustomer { get; set; }
    public Address ShippingAddress { get; set; }
    public Address BillingAddress { get; set; }

    public int OrderId { get; set; }
    public int CustomerId { get; set; }
    public int ShippingAddressId { get; set; }
    public int BillingAddressId { get; set; }
    public decimal OrderAmount { get; set; }
    public DateTime OrderDate { get; set; }
}

The Implementation

Checking to see if two fields match is certainly an easy task. Hoping to impress your boss, you whip out the solution in less than ten minutes. The VerifyStatesMatch function returns a boolean that will indicate to the caller whether the business rule is being followed or not. You run some basic tests on your library and you determine that the code only takes, on average, 50 ms to execute and does not have any flaws. The boss is very impressed and gives your library to the other developers to use in their applications.

public bool VerifyStatesMatch(Order order)
{
    bool retVal = false;
    try
    {
        // Assume this operation takes 25 ms.
        Customer customer = SomeDataSource.GetCustomer(order.CustomerId);
        // Assume this operation takes 25 ms.
        Address shippingAddress = SomeDataSource.GetAddress(order.ShippingAddressId);
        retVal = customer.HomeAddress.State == shippingAddress.State;
    }
    catch (Exception ex)
    {
        SomeLogger.LogError(ex);
    }
    return retVal;
}

The Problem

The next day you come into work and there is a sticky note on your monitor: "Come see ASAP - The Boss". You figure that you did such a great job on your library yesterday, your boss must have an even harder task for you today. You soon find out, however, that there are some serious problems with your code.

You: Hi boss, what's up?

Boss: Your library is causing all kinds of problems in the software!

You: What? How?

Boss: Bob says your algorithm is too slow, John said it's not working properly, and Steve said something about "object reference not set to an instance of object".

You: I don't understand, I tested it yesterday and everything was fine.

Boss: I don't want excuses. Go talk to the other guys and figure it out!

Not the way you wanted to start your day, right? I would be surprised if most developers haven't faced this kind of situation before. You thought you had coded your library "perfectly", yet there appear to be all kinds of problems. By applying the principles of The One, The Many, The Null, and The Nothing, you will be able to see where the API fails to meet the expectations of others.

The One

http://en.wikipedia.org/wiki/The_Matrix

The first principle to follow is to properly handle "The One". By The One, I mean that your API should process one instance of the expected input without any errors that you do not explicitly tell callers may occur. You might be thinking: "Isn't that obvious?", but let's look at our example and show how we might not be properly handling one Order.

Customer customer = SomeDataSource.GetCustomer(order.CustomerId);
Address shippingAddress = SomeDataSource.GetAddress(order.ShippingAddressId);
// What if customer.HomeAddress didn't load properly or was null?     
retVal = customer.HomeAddress.State == shippingAddress.State;

As the comment above states, we assumed that the HomeAddress property loaded properly from the data source. Although 99.99% of the time it probably will, a bulletproof API must account for the one-in-a-million scenario where it won't. Also, depending on the language, the comparison of the two State properties may fail if either property did not load properly. The point here is that you cannot make any assumptions about the input you are given or about the data you get from code that you do not control.

This is the easiest principle to understand, so let's fix our example and move on.

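Customer customer = SomeDataSource.GetCustomer(order.CustomerId);
Address shippingAddress = SomeDataSource.GetAddress(order.ShippingAddressId);
// Compare only when every object the comparison dereferences actually loaded.
if (customer != null && customer.HomeAddress != null && shippingAddress != null)
{
    retVal = customer.HomeAddress.State == shippingAddress.State;
}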

The Many

http://msdn.microsoft.com/en-us/library/w5zay9db.aspx

Getting back to our scenario above, we need to talk to Bob. Bob said the code was too slow, but 50 ms is well within the accepted execution time given the architecture of the system. Well, it turns out that Bob has to process 100 Orders for your largest customer in a batch, so the call to your method is taking 5 seconds total in the loop he is using.

Bob's code:
foreach(Order order in bobsOrders)
{
    ...
    bool success = OrderProcess.VerifyStatesMatch(order);
    ....
}

You: Bob, why do you think my code is too slow? It only takes 50 ms to process an order.

Bob: Customer Acme Inc. demands the fastest performance possible for their batch orders. I need to process 100 orders so 5 seconds is too slow.

You: Oh, I didn't know we needed to process orders in batches.

Bob: Well, it's only for Acme because they are our largest customer.

You: Oh, I wasn't told anything about Acme or batch orders.

Bob: Well, shouldn't your code be able to efficiently handle the processing of more than one order at a time?

You: Oh... yeah, of course.

It's very obvious what happened and why Bob thinks the code is "too slow". You were not told about Acme and no one said anything about batch processing. Bob's loop is loading the same Customer and, most likely, the same Address record 100 times. This issue can easily be fixed by accepting an array of Orders instead of just one and by adding some simple caching. The C# params keyword was designed for situations just like this.

public bool VerifyStatesMatch(params Order[] orders)
{
    bool retVal = false;
    try
    {
        var customerMap = new Dictionary<int, Customer>();
        var addressMap = new Dictionary<int, Address>();
        foreach (Order order in orders)
        {
            Customer customer = null;
            if(customerMap.ContainsKey(order.CustomerId))
            {
               customer = customerMap[order.CustomerId];
            }
            else
            {
               customer = SomeDataSource.GetCustomer(order.CustomerId);
               customerMap.Add(order.CustomerId, customer);
            }
            Address shippingAddress = null;
            if(addressMap.ContainsKey(order.ShippingAddressId))
            {
               shippingAddress = addressMap[order.ShippingAddressId];
            }
            else
            {
               shippingAddress = SomeDataSource.GetAddress(order.ShippingAddressId);
               addressMap.Add(order.ShippingAddressId,shippingAddress);
            }
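            // Keep the guard from "The One": a record that failed to load counts as a mismatch.
            retVal = customer != null && customer.HomeAddress != null && shippingAddress != null
                && customer.HomeAddress.State == shippingAddress.State;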
            if(!retVal)
            {
                break;
            }
        }
    }
    catch (Exception ex)
    {
        SomeLogger.LogError(ex);
        // A failure partway through the batch must not leave an earlier "true" in place.
        retVal = false;
    }
    return retVal; 
}

This version of the function will greatly speed up Bob's batch processing. Most of the data calls have been eliminated because we can simply look up the record by ID in the temporary cache (Dictionary).
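The two "check the dictionary, otherwise load and store" branches in the function follow the same pattern, so they could be folded into one small helper. The following is only a sketch of that idea; `LookupCache` and `GetOrLoad` are hypothetical names that are not part of the original library.

```csharp
using System;
using System.Collections.Generic;

public static class LookupCache
{
    // Returns the cached value for the key, calling load only the first
    // time that key is seen and remembering the result afterwards.
    public static TValue GetOrLoad<TKey, TValue>(
        IDictionary<TKey, TValue> cache, TKey key, Func<TKey, TValue> load)
    {
        TValue value;
        if (!cache.TryGetValue(key, out value))
        {
            value = load(key);
            cache.Add(key, value);
        }
        return value;
    }
}
```

With a helper like this, each lookup inside the loop could shrink to a single call, for example `LookupCache.GetOrLoad(customerMap, order.CustomerId, SomeDataSource.GetCustomer)`, assuming GetCustomer takes an int and returns a Customer as it does in the examples.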

Once you have opened your API up to The Many, you must put in some range checks. What if, for example, someone sends one million orders into your method? Is a number that large outside of the scope of the architecture? This is where understanding both the system architecture and business processes pays off. If you know that the maximum use case for processing orders is 10,000, you can confidently add a check for, say, 50,000 records. This will ensure that someone doesn't accidentally bog down the system with a large, invalid call.
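As a rough illustration, a range check of this kind might look like the sketch below. The `BatchGuard` class, the `MaxBatchSize` constant, and the decision to throw `ArgumentOutOfRangeException` are all assumptions made for the example; the 50,000 figure is simply the limit suggested above.

```csharp
using System;

public static class BatchGuard
{
    // Hypothetical limit: five times the largest legitimate batch (10,000).
    public const int MaxBatchSize = 50000;

    // Throws when a batch is larger than the system is expected to handle.
    // A null batch is ignored here and left to the caller's own null handling.
    public static void EnsureWithinRange<T>(T[] items)
    {
        if (items != null && items.Length > MaxBatchSize)
        {
            throw new ArgumentOutOfRangeException(
                "items",
                items.Length,
                "Batch exceeds the maximum of " + MaxBatchSize + " records.");
        }
    }
}
```

Whether an oversized batch should throw, be logged and rejected, or be split into smaller chunks depends on the system; throwing is just the simplest option to show.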

While these are not the only optimizations that can be made, they hopefully illustrate how planning for "The Many" in the beginning can save rework later.

The Null

http://en.wikipedia.org/wiki/Null_pointer#Null_pointer

You: Steve, are you passing null into my code?

Steve: I'm not sure, why?

You: The boss said you were getting "object ref..." errors.

Steve: Oh, that must be from the legacy system. I don't control the output from that system; we just pipe it into the new system as is.

You: That seems silly, why don't we do something about these nulls?

Steve: I do; I check for null in my code; don't you?

You: Oh... yeah, of course.

"Object reference not set to an instance of an object." Do I even need to explain that error? It has cost many of us many hours of our lives. In most languages, null, the empty set, etc., is a perfectly valid state for any non-value type. This means that any solid API must account for "The Null" even if it is technically "wrong" for a caller to pass it.

Of course, checking every reference for null can become very time-consuming and is probably overkill. However, you should never trust input coming from a source you do not control, so we must check our "orders" parameter, as well as the Orders inside of it, for null.

public bool VerifyStatesMatch(params Order[] orders)
{
    bool retVal = false;
    try
    {
        if (orders != null)
        {
            var customerMap = new Dictionary<int, Customer>();
            var addressMap = new Dictionary<int, Address>();
            foreach (Order order in orders)
            {
                if (order != null)
                {
                    Customer customer = null;
                    if (customerMap.ContainsKey(order.CustomerId))
                    {
                        customer = customerMap[order.CustomerId];
                    }
                    else
                    {
                        customer = SomeDataSource.GetCustomer(order.CustomerId);
                        customerMap.Add(order.CustomerId, customer);
                    }

                    Address shippingAddress = null;
                    if (addressMap.ContainsKey(order.ShippingAddressId))
                    {
                        shippingAddress = addressMap[order.ShippingAddressId];
                    }
                    else
                    {
                        shippingAddress = SomeDataSource.GetAddress(order.ShippingAddressId);
                        addressMap.Add(order.ShippingAddressId, shippingAddress);
                    }
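                    // Keep the guard from "The One": a record that failed to load counts as a mismatch.
                    retVal = customer != null && customer.HomeAddress != null && shippingAddress != null
                        && customer.HomeAddress.State == shippingAddress.State;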

                    if (!retVal)
                    {
                        break;
                    }
                }
            }
        }
    }
    catch (Exception ex)
    {
        SomeLogger.LogError(ex);
        // A failure partway through the batch must not leave an earlier "true" in place.
        retVal = false;
    }
    return retVal;
}

By diligently checking for null, you can avoid the embarrassing support calls from customers asking what an "instance of an object" is. I always err on the side of caution; I would rather have my function return the default value and log a message (or send an alert) than throw the somewhat useless null reference error. Of course, this decision is completely dependent on the type of system, whether the code is running on a client or a server, etc. The lesson here is that you can only ignore null for so long before it bites you.
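One subtlety of a `params` signature is worth keeping in mind here: a call with no arguments produces an empty array, while an explicit null argument arrives as a null array. The sketch below uses a hypothetical `CountOrders` method as a stand-in for VerifyStatesMatch to make the difference visible; it is an illustration only and not part of the library.

```csharp
public static class ParamsDemo
{
    // Hypothetical stand-in for VerifyStatesMatch: reports how many items
    // arrived, or -1 when the array itself is null.
    public static int CountOrders(params string[] orderIds)
    {
        return orderIds == null ? -1 : orderIds.Length;
    }
}
```

Here `ParamsDemo.CountOrders("A")` returns 1 and `ParamsDemo.CountOrders("A", "B")` returns 2, while `ParamsDemo.CountOrders()` returns 0 and `ParamsDemo.CountOrders(null)` returns -1. The null check on the array and the handling of an empty array are therefore two separate cases.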

UPDATE: To be clear, I am not advocating that a function should "do nothing" when an invalid state has been encountered. If null parameters are not acceptable in your system, throw an exception (like ArgumentNullException in .NET). However, there are some situations where returning a meaningful default is perfectly acceptable and throwing an exception is not necessary. Fluent methods, for example, will typically return the value that was passed into them if they cannot act on the value. There are far too many factors to make any kind of blanket statement about what should be done when a null is encountered.
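The two policies can be put side by side in a small sketch. The `NullPolicies` class and its method names are hypothetical and exist only for this comparison; which policy fits is a judgment call for each system.

```csharp
using System;

public static class NullPolicies
{
    // Policy 1: null is never acceptable, so fail fast with a descriptive exception.
    public static int CountStrict(string[] items)
    {
        if (items == null)
        {
            throw new ArgumentNullException("items");
        }
        return items.Length;
    }

    // Policy 2: null is tolerated and treated as "no items", a meaningful default.
    public static int CountLenient(string[] items)
    {
        return items == null ? 0 : items.Length;
    }
}
```

The strict version tells the caller exactly which argument was wrong, which is far more useful than a bare null reference error. The lenient version keeps a pipeline running when a missing value is harmless.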

The Nothing

http://youtu.be/CrG-lsrXKRM

You: John, what are you passing into my code? It looks like an incomplete Order.

John: Oh, sorry about that. I don't really need to use your method, but one of the other libraries required me to pass an Order parameter. I guess that library is calling your code. I don't work with orders, but I have to use that other library.

You: That other library needs to stop doing that; that is bad design.

John: Well, that library has evolved organically as the business has changed. Also, it was written by Matt who is out this week; I'm not really sure how to change it. Shouldn't your code be checking for bad input anyway?

You: Oh... yeah, of course.

Of the four principles, "The Nothing" is probably the most difficult to describe. Despite meaning "nothing" or "empty", null actually has a definition and can be quantified. Heck, most languages have a built-in keyword for it; null certainly is not nothing. By handling "nothing", I mean that your API must handle input that is essentially garbage. In our example, this would translate into handling an Order that does not have a CustomerId or that has an OrderDate from 500 years ago. A better example would be a collection that does not have any items in it. The collection is not null and should fall into "The Many" category, but the caller failed to populate the collection with any data. You must always be sure to handle this "nothing" scenario. Let's adjust our example to ensure that callers can't just pass something that looks like an Order; it must fulfill the minimum universal requirements. Otherwise, we will just treat it as nothing.

...
// Yes, I cheated.  ;-) 
if (order != null && order.IsValid)
...
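The article does not say what IsValid checks, so the following is only one possible sketch. It assumes that IDs are positive integers and that orders dated before an arbitrary cutoff or in the future are garbage. `OrderValidator`, `EarliestOrderDate`, and `HasWork` are hypothetical names, and the Order class here is a trimmed stand-in that holds only the fields being checked.

```csharp
using System;

// Trimmed stand-in for the Order class (only the fields checked below).
public class Order
{
    public int CustomerId { get; set; }
    public int ShippingAddressId { get; set; }
    public DateTime OrderDate { get; set; }
}

public static class OrderValidator
{
    // Hypothetical cutoff; a real system would take this from its business rules.
    private static readonly DateTime EarliestOrderDate = new DateTime(2000, 1, 1);

    // Returns true only when the order carries the minimum data needed to process it.
    public static bool IsValid(Order order)
    {
        return order != null
            && order.CustomerId > 0
            && order.ShippingAddressId > 0
            && order.OrderDate >= EarliestOrderDate
            && order.OrderDate <= DateTime.Now;
    }

    // An empty (or null) batch is "nothing": there is no work to do.
    public static bool HasWork(Order[] orders)
    {
        return orders != null && orders.Length > 0;
    }
}
```

Under these assumptions, a freshly constructed `new Order()` is rejected because its CustomerId is 0, and so is an order dated 500 years ago.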

Conclusion

If there is one point that I hope this article has demonstrated, it is that any code taking input is never "perfect" and the implementation of any function or API must take into account how the API is going to be used. In our example, our 12-line function grew to 50 lines without any changes to the core functionality. All the code we added was to handle the scale, or range, of data that we would accept and process properly and efficiently.

The amount of data being stored has grown exponentially over the years, so the scale of input data will only increase while the quality of the data has no place to go but down. Properly coding an API in the beginning can make a huge difference in winning business, scaling as customers grow, and reducing future maintenance costs (and headaches for you!).

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)