Much Ado About NULL

Giovanni Scerra

Jan 18, 2015

CPOL

Patterns to prevent null reference exceptions

Introduction

Nulls can be real troublemakers. Dreaded is the bug report that comes with a null pointer error message, and hours are often wasted figuring out the following two basic facts, in chronological order:

What is the pointer/reference causing the exception
Where the null is coming from and why it was generated

While we want to protect our code from this nasty error, we also want to avoid polluting it with frequent null checks or paranoid error handling. We do not want to be so defensive that we add unnecessary verbosity and noise to what should be straightforward logic, making the code hard to read.

There are many elegant techniques to prevent null reference errors from being generated or from spreading uncontrolled in the code. All these techniques are useful, but none is exhaustive and powerful enough to solve the issue properly in all cases. Since nulls can have different meanings, this article aims to describe the different cases and suggest which technique is more effective in each specific case.

Caveat Emptor: This article is focused on coding patterns; hence I will not discuss topics related to testing. Unit and exploratory testing are extremely effective ways to spot null reference errors. The techniques listed in this article are absolutely not meant as an alternative to testing.

Five Types of Null

To prevent null reference exceptions, we need to first examine where nulls are coming from in our applications. We can roughly identify five distinct types of nulls, described as follows:

1. Nulls that indicate a failure

Null values can be used to indicate that an operation failed, usually because we detect that something is invalid and it is preventing the method from carrying on. It can be argued that throwing an exception may be a more appropriate way to handle these cases. Yet exceptions are quite expensive objects to create in many OO languages and they disrupt the regular flow of the code, so in some cases we may legitimately want to avoid them if reasonably possible. If we do not want to throw an exception, we could then return a null to let the caller know that something went wrong.

2. Nulls that indicate the end of the road

Within the context of composite data structures, a null value can signify the end of the navigation path. Such is the case of the last element of a linked list, or the leaf nodes of a tree structure that may have null pointers to the missing children. Database entities may also have navigation properties set to null by the ORM framework when there are no foreign key records connected.

3. Nulls that indicate unsuccessful lookup/search

This is a very common scenario when we look for a value in a dictionary, a cache provider or a simple search of an element in a collection. We obtain a null value because we were not able to find what we were looking for. This case is very similar to the first case, with only one important difference that we will examine later and that justifies putting them in their own separate group.

4. Nulls that indicate unspecified inputs or missing values

When using patterns such as MVC or MVVM, we have view models that are more or less mapped to UI controls. For instance, we may have a date picker mapped to a Date-Time property. If no date has been picked on the UI, we may then expect a null value in the related instance field.
Nulls can also frequently come from missing values in the database (database nulls) that are reflected in the mapped properties of data entities.

5. Accidental/Unexpected Nulls

These are the nulls that do not convey any particular meaning and are ultimately considered as programming mistakes. Sometimes, we forget to initialize or set a reference variable or a reference property of a newly created object. Some other times, we are unable to initialize a variable because in order to create an instance, we need to pass some parameters in the constructor that are not yet available to us. Either way, if an uninitialized reference is in scope, somebody may end up using it accidentally and trigger the infamous exception.
Unexpected nulls can be received as method/constructor parameters that by design should always be valid instances; nulls can also come from a third party method call, a conversion/cast, reflection on a type that should exist, deserialization operations, etc.

Patterns for Preventing Null Reference Exceptions

Here, we will examine some well-known techniques to handle nulls in our applications.
To illustrate, I sometimes use a few lines of C# code, which I believe are simple enough to be easily understood by any developer who has used OO languages before.

Null Object Pattern

This strategy consists of substituting null values with objects that have a default behavior. For instance, we can have a search method that searches within a collection for a person with a certain id:

Person p = People.FindByUniqueId(34);

If a person with id 34 does not exist in the collection, we can return a default instance of a person (e.g., Person.Nobody), which represents a fully functional Person instance but with default/blank properties and behavior. While theoretically we can use this technique to replace nulls, its practical applicability has some severe limitations:

Can only be used where it makes sense to have a default state/behavior
When a null has a specific meaning, using a placeholder can hide that meaning (e.g., a failure). In the above example, the Null Object Pattern would hide the fact that the search was unsuccessful. Because of this reason, the pattern is mostly effective to prevent null types 4 and 5 by providing a default initialization option
The usage of the pattern should be obvious. In the example above, it is not obvious that the method returns a Null Object, so a developer that did not read the documentation may end up checking for nulls anyway.
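
As a minimal sketch of the idea, a Null Object can be exposed as a shared static instance. The shape of the Person class below (its Id and Name properties, the IsNobody flag and the Greeting method) is an assumption made for illustration; the article itself only names Person.Nobody.

```csharp
using System;

// Hypothetical Person type; only the name Person.Nobody comes from the article.
public class Person
{
    // Shared, immutable placeholder returned instead of null.
    public static readonly Person Nobody = new Person(0, string.Empty);

    public int Id { get; }
    public string Name { get; }

    public Person(int id, string name)
    {
        Id = id;
        Name = name ?? string.Empty;
    }

    // Lets callers that care detect the placeholder explicitly.
    public bool IsNobody => ReferenceEquals(this, Nobody);

    // Default behavior: the placeholder still answers, with a neutral greeting.
    public string Greeting() => IsNobody ? "Hello, stranger" : "Hello, " + Name;
}
```

An explicit flag such as IsNobody is one possible way to soften the third limitation above, since it makes the use of the pattern visible in the type itself.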

Tester-Doer Pattern

A method that can potentially return a null value (the doer) can be logically paired with another method (the tester) that first verifies the conditions to avoid the null value. A typical example is the ContainsKey/GetValue combination for a dictionary:

if (peopleDictionary.ContainsKey(34)) {
     Person p = peopleDictionary.GetValue(34); //Always successful(?)
}

This approach applies to nulls of type 2 and 3 but has two major flaws:

It is hard to make it thread-safe since the tester and doer are two separate operations
The technique lacks cohesion: the pattern does not enforce testing before doing, while logically the two operations should always go together
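
The thread-safety flaw can be sketched without real threads by replaying the problematic interleaving on a single thread. PeopleRegistry and TesterDoerGap are hypothetical names introduced for this illustration. Note that the doer here is the .NET Dictionary indexer, which throws KeyNotFoundException on a miss instead of returning null.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry: every method is individually thread-safe thanks to the lock.
public class PeopleRegistry
{
    private readonly object gate = new object();
    private readonly Dictionary<int, string> names = new Dictionary<int, string>();

    public void Add(int id, string name) { lock (gate) { names[id] = name; } }
    public void Remove(int id) { lock (gate) { names.Remove(id); } }

    // Tester
    public bool Contains(int id) { lock (gate) { return names.ContainsKey(id); } }

    // Doer: throws KeyNotFoundException when the id is absent
    public string Get(int id) { lock (gate) { return names[id]; } }
}

public static class TesterDoerGap
{
    // Replays, on one thread, the interleaving that another thread could cause.
    public static bool DoerFailsAfterSuccessfulTest()
    {
        var registry = new PeopleRegistry();
        registry.Add(34, "Ann");

        bool tested = registry.Contains(34); // the tester says yes
        registry.Remove(34);                 // stands in for a concurrent removal
        try
        {
            registry.Get(34);                // the doer runs too late
            return false;
        }
        catch (KeyNotFoundException)
        {
            return tested;                   // the test passed, yet the doer failed
        }
    }
}
```

Locking each method separately does not help, because the gap sits between the two calls. Closing it requires a single operation that tests and does at once, which is what the next pattern provides.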

Try Methods

In its basic form, a try operation is a method that returns a Boolean value and also has an output instance that should be used only if the method returns true. If applied to the prior example:

Person p = Person.Nobody; //Avoid null initialization, assuming a default behavior makes sense

if (People.TryFindByUniqueId(34, out p)) {
     //p is the person with Id 34
} else {
     //nobody has id 34, no need to check p
}

The name and signature of a try method immediately convey that the operation could be unsuccessful, and its usage is very intuitive. There are a couple of variants of try methods that may prove very useful: TryOrDefault and TryOrRetrieve.
The TryOrDefault variant just has an extra parameter that lets the caller explicitly specify a default value to return, instead of returning false:

Person p = People.TryFindByUniqueIdOrDefault(34, Person.Nobody);

The TryOrRetrieve is very similar, but instead of letting the user specify a default value, allows the user to specify a function that returns the value. This pattern is commonly used in cache retrieval operations:

Person p = cache.TryFindOrLoad(34, () => db.GetPersonOrDefault(34, Person.Nobody));

Try methods are well suited for preventing nulls of type 2 and 3, but are frequently inadequate to prevent a null that indicates a failure (type 1). The next technique will better address this scenario.
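
The three variants could be implemented along these lines. This is only a sketch: PeopleCollection, its backing dictionary and the Person shape are assumptions, and the method names simply follow the snippets above. Dictionary.TryGetValue is itself a try method from the .NET base class library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types: the article names these methods but does not show their bodies.
public class Person
{
    public static readonly Person Nobody = new Person(0, string.Empty);
    public int Id { get; }
    public string Name { get; }
    public Person(int id, string name) { Id = id; Name = name; }
}

public class PeopleCollection
{
    private readonly Dictionary<int, Person> byId = new Dictionary<int, Person>();

    public void Add(Person person) { byId[person.Id] = person; }

    // Basic form: the Boolean reports success, the out parameter carries the value.
    public bool TryFindByUniqueId(int id, out Person person)
    {
        if (byId.TryGetValue(id, out person))
            return true;
        person = Person.Nobody; // never hand back a null, even on failure
        return false;
    }

    // TryOrDefault variant: the caller chooses what a miss returns.
    public Person TryFindByUniqueIdOrDefault(int id, Person defaultValue)
    {
        Person found;
        return byId.TryGetValue(id, out found) ? found : defaultValue;
    }

    // TryOrRetrieve variant: a miss invokes the loader and stores its result.
    public Person TryFindOrLoad(int id, Func<Person> loader)
    {
        Person found;
        if (!byId.TryGetValue(id, out found))
        {
            found = loader();
            byId[id] = found;
        }
        return found;
    }
}
```

Because each method tests and retrieves in one call, it avoids the gap between tester and doer. Making the collection itself safe for concurrent use would still need a lock or a concurrent dictionary, which this sketch leaves out.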

Result Object Pattern

Returning null to indicate a failure is not only dangerous but it has a fundamental flaw: it does not tell us anything about the reason for the failure. When an operation is complex, there may be many reasons for failure. Let us consider the example:

Person p = Person.FromJSON("{id: 1, name: 'John Doe', age: 46}");

If the method returns a null, it may be because the age is invalid (e.g., a negative number), or the name is empty, or the whole JSON string is null. To allow the caller to know more about the cause of the failure, we may design a more complex object: an operation Result Object.

PersonFromJSONResult result = Person.FromJSON("{id: 1, name: '', age: 46}");

if (result.Success) {
     //use result.Person
} else {
     //Check result.FailureReason to know more
}

Result objects require some work and are well suited to replace nulls of type 1 for complex operations such as validation or business rule engines, server-client communication (e.g., web service results), non-trivial conversions/parsing, etc.
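
A result object for the example above might be sketched as follows. The members Success, Person and FailureReason come from the snippet; the factory methods Ok and Fail, the PersonValidator class and the failure messages are assumptions. Actual JSON parsing is left out, so the sketch validates fields that have already been parsed.

```csharp
using System;

// Hypothetical shapes; the article only names PersonFromJSONResult and its three members.
public class Person
{
    public int Id { get; }
    public string Name { get; }
    public int Age { get; }
    public Person(int id, string name, int age) { Id = id; Name = name; Age = age; }
}

public class PersonFromJSONResult
{
    private readonly Person person;

    public bool Success { get; }
    public string FailureReason { get; }

    // Reading Person on a failed result is a programming error, so fail loudly.
    public Person Person
    {
        get
        {
            if (!Success)
                throw new InvalidOperationException("No person available: " + FailureReason);
            return person;
        }
    }

    private PersonFromJSONResult(bool success, Person person, string failureReason)
    {
        Success = success;
        this.person = person;
        FailureReason = failureReason;
    }

    public static PersonFromJSONResult Ok(Person person) =>
        new PersonFromJSONResult(true, person, string.Empty);

    public static PersonFromJSONResult Fail(string reason) =>
        new PersonFromJSONResult(false, null, reason);
}

public static class PersonValidator
{
    // Validates already-parsed fields and reports the first rule that is broken.
    public static PersonFromJSONResult Create(int id, string name, int age)
    {
        if (string.IsNullOrWhiteSpace(name))
            return PersonFromJSONResult.Fail("Name is empty");
        if (age < 0)
            return PersonFromJSONResult.Fail("Age is negative");
        return PersonFromJSONResult.Ok(new Person(id, name, age));
    }
}
```

Keeping the constructor private and exposing only Ok and Fail is one way to make sure a result is never both successful and without a person.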

Maybe Monad

This technique comes from functional programming and can be seen as a broader and more generic alternative to try methods.
A Maybe Monad is simply a special object that can contain either an instance or no value at all. In languages supporting generics (e.g., Java, C#), it would look like:

Maybe<Person> maybePerson = people.FindByUniqueId(34);

To extract the value from the Maybe object, we can do something like:

Person p = maybePerson.ValueOrDefault(Person.Nobody);

We can also provide a tester method for convenience:

if (maybePerson.HasValue) { /* extract value from maybePerson */ }

The advantage of this approach is that we can provide a generic reusable implementation of the Maybe<T> object and use it wherever applicable without implementing try methods.
This approach can be effectively applied to nulls of type 2, 3, 4 and 5.
Check out the documentation of your language to see if it natively provides an implementation of the Maybe Monad.
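
For languages without a native implementation, a generic Maybe<T> could be sketched as below. HasValue and ValueOrDefault follow the snippets above; None, Of and Select are names assumed for this illustration, and the sketch does not claim to cover everything a full monad implementation would offer.

```csharp
using System;

// A minimal, hypothetical Maybe<T>; only HasValue and ValueOrDefault come from the article.
public struct Maybe<T>
{
    private readonly T value;

    public bool HasValue { get; }

    private Maybe(T value)
    {
        this.value = value;
        HasValue = true;
    }

    // default(Maybe<T>) has HasValue == false, so "nothing" needs no allocation.
    public static Maybe<T> None => default(Maybe<T>);

    // Wraps a value; a null reference is treated as "no value".
    public static Maybe<T> Of(T value) =>
        value == null ? None : new Maybe<T>(value);

    public T ValueOrDefault(T defaultValue) => HasValue ? value : defaultValue;

    // Applies a function only when a value is present, so calls can be chained.
    public Maybe<TResult> Select<TResult>(Func<T, TResult> selector) =>
        HasValue ? Maybe<TResult>.Of(selector(value)) : Maybe<TResult>.None;
}
```

With this in place, an expression such as Maybe<string>.Of(name).Select(n => n.Length).ValueOrDefault(0) never dereferences a null, whatever name contains.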

Safe Navigation Operators

Many modern OO languages offer a safe navigation operator (also known as the null propagation operator) to avoid null reference exceptions while navigating through properties of entities (null type 4). The operator is syntactic sugar that avoids ugly nested null checks when evaluating an expression like this one:

var customerZipCode = db.GetCustomerById(5).ContactInfo.Address.ZipCode;

While navigating the data in the entity, we may encounter a null reference that would throw an exception. With the safe navigation operator, the whole expression instead evaluates to null as soon as a null value is encountered:

var customerZipCode = db.GetCustomerById(5)?.ContactInfo?.Address?.ZipCode;

Obviously when using this operator, we must then expect that customerZipCode may be null in the end and we would still need to check before using its value. Another important caveat is that the way this operator works may be a little tricky in some languages (see references).
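
To make the syntactic sugar concrete, the sketch below places the nested checks next to the operator form and then combines the operator with the null-coalescing operator (??) to supply a fallback. The entity classes mirror the property chain used above, but their definitions, the ZipCodes helper and the "unknown" fallback are assumptions. The ?. operator requires C# 6 or later.

```csharp
using System;

// Hypothetical entity classes mirroring the property chain in the article.
public class Address { public string ZipCode { get; set; } }
public class ContactInfo { public Address Address { get; set; } }
public class Customer { public ContactInfo ContactInfo { get; set; } }

public static class ZipCodes
{
    // The chain without the operator: one null check per hop.
    public static string WithNestedChecks(Customer customer)
    {
        if (customer != null
            && customer.ContactInfo != null
            && customer.ContactInfo.Address != null)
        {
            return customer.ContactInfo.Address.ZipCode;
        }
        return null;
    }

    // Same result with the safe navigation operator.
    public static string WithSafeNavigation(Customer customer) =>
        customer?.ContactInfo?.Address?.ZipCode;

    // Pairing ?. with ?? supplies a fallback, so the caller never receives a null.
    public static string OrUnknown(Customer customer) =>
        customer?.ContactInfo?.Address?.ZipCode ?? "unknown";
}
```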

Empty Collections

Notably, when dealing with collections instead of single instances, a fitting solution is frequently as simple as using empty collections to replace null values. Empty collections are excellent replacements for nulls of type 2, 3, 4 and 5 whenever applicable. For this reason, methods that return single instances can sometimes be “pluralized” to work instead with collections:

int[] ids = new int[]{34, 3, 88};
Person[] persons = people.FindByUniqueIds(ids); //It returns an empty array if no id has been found
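
A possible body for such a pluralized method is sketched here. PeopleCollection, its backing dictionary and the Person shape are assumptions. Treating a null ids argument as "nothing to look for" is also a design choice of this sketch; a guard clause would be an equally valid option.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types; only the name FindByUniqueIds comes from the article.
public class Person
{
    public int Id { get; }
    public string Name { get; }
    public Person(int id, string name) { Id = id; Name = name; }
}

public class PeopleCollection
{
    private readonly Dictionary<int, Person> byId = new Dictionary<int, Person>();

    public void Add(Person person) { byId[person.Id] = person; }

    // Always returns an array: empty when nothing matches, never null.
    public Person[] FindByUniqueIds(IEnumerable<int> ids)
    {
        if (ids == null)
            return Array.Empty<Person>();

        return ids.Where(id => byId.ContainsKey(id))
                  .Select(id => byId[id])
                  .ToArray();
    }
}
```

The caller can then iterate over the result with a plain foreach, and no null check is needed even when no id is found.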

Null Guard Clauses

There are cases where replacing nulls is just not possible. When nulls violate a design contract (preconditions, post-conditions and invariants), the common approach is to protect the code by using null guard clauses. Essentially null guard clauses are but 'if' statements that throw exceptions when they find offending null references. The exception should be thrown as early as possible and the exception messages should indicate precisely the null reference variable that was detected.
It may seem odd to prevent a null reference exception by throwing another exception. Yet, it still makes perfect sense if we consider the following two facts:

Null reference exceptions are hard to troubleshoot, while guard clause exceptions detect and pinpoint the problem as early as possible and with precision
Null guard clauses are used in those cases where nulls cannot be replaced and are not acceptable by design. Not throwing exceptions will inevitably break the functionality, possibly in unpredictable ways

public void ScorePeople(IScoringStrategy strategy) {

   //Null guard clause: the first argument names the offending parameter
   if (strategy == null)
     throw new ArgumentNullException("strategy", "Cannot score people with a null strategy!");

   //Some scoring strategy here
}
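
When the same check is needed in many places, it can be factored into a small helper. Guard.NotNull and the Scorer class are hypothetical names for this sketch, and the single Score method on IScoringStrategy is also an assumption. Only ArgumentNullException and nameof (C# 6 or later) come from the platform.

```csharp
using System;

// Hypothetical helper that centralizes the null guard clause.
public static class Guard
{
    // Throws ArgumentNullException naming the offending parameter; otherwise returns the value.
    public static T NotNull<T>(T value, string parameterName) where T : class
    {
        if (value == null)
            throw new ArgumentNullException(parameterName);
        return value;
    }
}

// Assumed shape of the interface used in the article's example.
public interface IScoringStrategy
{
    int Score(string personName);
}

public class Scorer
{
    private readonly IScoringStrategy strategy;

    // Guarding in the constructor means no Scorer can exist with a null strategy.
    public Scorer(IScoringStrategy strategy)
    {
        this.strategy = Guard.NotNull(strategy, nameof(strategy));
    }

    public int ScorePerson(string personName) => strategy.Score(personName);
}
```

Checking once at construction time turns the non-null strategy into an invariant of the object, so the methods that use it need no further checks.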

Static Analyzers

There are tools out there that can help us detect potential null references in the code through static analysis.
I have personally tried only one of them for .NET (Code Contracts) and found it quite interesting and full of potential.
Static code analyzers are outside of the scope of this article. If you want to know more, the following is a list of some free static analysis tools to identify null reference problems in different languages:

Language	Tool	URL
.NET	Code Contracts	http://research.microsoft.com/en-us/projects/contracts/
Java, Scala, Groovy	FindBugs	http://findbugs.sourceforge.net/
C, C++	CppCheck	http://sourceforge.net/projects/cppcheck/
C, Objective-C	Clang Static Analyzer	http://clang-analyzer.llvm.org/
PHP	Phantm	https://github.com/colder/phantm

Cheat Sheet

Not a big fan of cheat sheets, but they can still be useful for reference purposes:

Technique	Effective with null types	Notes
Null Object Pattern	4, 5	Only when default behavior makes sense
Tester-Doer Pattern	2, 3	Thread safety issues, lack of cohesion
Try Methods	2, 3, 4	
Result Object Pattern	1	When you need extra metadata on the result of an invocation
Maybe Monad	2, 3, 4, 5	
Safe Navigation Operators	2, 4	Good for navigation properties
Empty Collections	2, 3, 4, 5	Collections only
Null Guard Clauses	5	When nulls cannot be replaced
Static Analyzers	5	Identify places where guard clauses are needed

References

Null Object Pattern
http://en.wikipedia.org/wiki/Null_Object_pattern
Tester Doer Pattern
http://msdn.microsoft.com/en-us/library/ms229009%28v=vs.110%29.aspx
Tester Doer and Try-X Pattern
http://marchoeijmans.blogspot.com/2012/12/test-doer-and-try-parse-pattern.html
Result Object Pattern
http://c2.com/cgi/wiki?ResultObjectPattern
Maybe Monad
http://en.wikipedia.org/wiki/Monad_%28functional_programming%29
C# 6.0 – Null Propagation Operator
http://davefancher.com/2014/08/14/c-6-0-null-propagation-operator/
Groovy's Safe Navigation Operator Not as Safe as I Thought
https://www.altamiracorp.com/blog/employee-posts/groovys-safe-navigation-operat
Guard Clauses
http://c2.com/cgi/wiki?GuardClause
Design By Contract
http://en.wikipedia.org/wiki/Design_by_contract
Contracts for Java (cofoja)
https://github.com/nhatminhle/cofoja
Code Contracts
http://msdn.microsoft.com/en-us/library/dd264808%28v=vs.110%29.aspx