Generic Algorithms on Runtime Types in C++ Through Type Erasure

Satprem Pamudurthy

4.87/5 (23 votes)

Oct 3, 2017

CPOL

11 min read

27097

This article describes a C++ technique called type erasure and shows that it can be used to write generic algorithms on runtime types. It then examines the relationship between type erasure and other forms of polymorphism through the notion of type compatibility.

Abstract

C++ supports different kinds of polymorphism. While inheritance is the foundation of object-oriented programming, templates are the foundation of generic programming. Templates in C++ are instantiated at compile-time and thus cannot be used when the type arguments are known only at runtime. There are, however, situations where we need to apply an algorithm identically to values whose types are known only at runtime. In this article, we will look at a technique called type erasure and see that it allows us to write generic algorithms on runtime types. We will then examine the relationship between type erasure and other forms of polymorphism through the notion of type compatibility, and see how it can help us choose between inheritance and type erasure for runtime polymorphism.

Runtime Polymorphism Using Inheritance

Runtime polymorphism involves selecting an implementation based on the runtime types of one or more arguments. In C++, this is implemented through the use of inheritance and virtual functions. Derived classes can extend or modify the base class’s behavior by overriding virtual functions. Imagine we wanted to define classes for representing various kinds of animals. While each animal is different, there are certain behaviors that apply to all animals. For example, all animals make a sound but each animal makes a different sound. Inheritance allows us to define a common interface for animals while letting the derived classes define the sound each animal makes. Consider the following example.

class IAnimal {
    
public:
    IAnimal() {}
    virtual ~IAnimal() {}
    
    virtual void makeSound() const = 0;
};

class Dog : public IAnimal {
    
public:
    Dog() {}
    virtual ~Dog() override {}
    
    virtual void makeSound() const override
    {
        std::cout << "woof" << std::endl;
    }
        
    std::string toString() const
    {
        return "Dog";
    }
};

class Cat : public IAnimal {
    
public:
    Cat() {}
    virtual ~Cat() override {}
    
    virtual void makeSound() const override
    {
        std::cout << "meow" << std::endl;
    }
};

std::unique_ptr<IAnimal> getAnimal(int which)
{
    switch (which)
    {
        case 0:
            return std::make_unique<Dog>();
            
        case 1:
            return std::make_unique<Cat>();
            
        default:
            throw std::runtime_error("Unknown animal type");
    }
}
        
int main()
{
    int which = -1;
    std::cin >> which;
    std::unique_ptr<IAnimal> animal = getAnimal(which);
    animal->makeSound();
    return 0;
}

The type of animal is not known at compile-time – it depends on user input at runtime. The call to makeSound() is dispatched to the appropriate concrete implementation based on the runtime type of the object. With inheritance, defining a new derived class requires no changes to the base class. Thus, inheritance allows us to create abstractions that are open for extension and closed for modification (Open/Closed Principle). However, it requires explicit declaration of base class – derived class relationships. What if we need to apply an algorithm to types that are not part of an inheritance hierarchy? Let’s make things interesting by adding another class into the mix.

class Robot {
    
public:
    std::string toString() const
    {
        return "Robot";
    }
};

Let’s say we wanted to write an algorithm that calls toString() on its argument and prints its string representation. The Dog and the Robot classes both have a toString() method but they don’t share the same base class. We could slap on an IHasToString interface on them but some of these types might be in a third-party library and thus cannot be modified. Also, forcing a set of unrelated types to derive from a common base class does not necessarily express an is-a relation and possibly violates the Liskov Substitution Principle.

So, given that there are situations where inheritance might not be the right choice for runtime polymorphism, is there an alternative to shoehorning our components into an inheritance hierarchy?

Does the Adapter Pattern Help?

The adapter pattern is used to create a bridge between an object’s interface and the interface needed by some algorithm. Let’s apply the adapter pattern to our problem and see what we get.

class IHasToString {
  
public:
    virtual ~IHasToString() {}
    
    virtual std::string toString() const = 0;
};
        
class DogAdapter : public IHasToString {
    
public:
    DogAdapter(const Dog& obj)
    : obj_(obj)
    { }
    
    std::string toString() const override
    {
        return obj_.toString();
    }
    
private:
    const Dog& obj_;
};

class RobotAdapter : public IHasToString {
    
public:
    RobotAdapter(const Robot& obj)
    : obj_(obj)
    { }
    
    std::string toString() const override
    {
        return obj_.toString();
    }
    
private:
    const Robot& obj_;
};
        
std::unique_ptr<IHasToString> getObject(int which)
{
    switch (which)
    {
        case 0:
            return std::make_unique<DogAdapter>((Dog()));
            
        case 1:
            return std::make_unique<RobotAdapter>((Robot()));
            
        default:
            throw std::runtime_error("Unknown object type");
    }
}

int main()
{
    int which = -1;
    std::cin >> which;
    std::unique_ptr<IHasToString> object = getObject(which);
    std::cout << object->toString() << std::endl;
    return 0;
}

The good part is we didn’t have to modify the Dog and the Robot classes. However, because they are not part of an inheritance hierarchy, we ended up having to define a separate adapter for each class. It is repetitive, error-prone and does not scale well.

But notice how every one of the adapters has identical operations, only on objects of different types. This is exactly the kind of problem that generic programming is meant to solve. In C++, generic functions are defined using function templates and generic data types are defined using class templates. However, templates are instantiated at compile-time and thus cannot be instantiated with types that are only known at runtime. This is where the type erasure pattern comes in.

Type Erasure

The term type erasure is commonly used in C++ to describe techniques for creating non-intrusive adapters that are implemented in terms of the adapted object's properties instead of its actual type. The method I am going to describe here is value-semantic type erasure and it is implemented by exploiting a feature of C++ templates that allows a non-template class to have a template constructor. It is best understood through an example.

class ThingWithToString {

public:
    template<typename T>
    ThingWithToString(const T& obj)
    : inner_(std::make_unique<Holder<T> >(obj))
    {
    }

    ThingWithToString(const ThingWithToString& that)
    : inner_(that.inner_->clone())
    {
    }

    ThingWithToString& operator=(const ThingWithToString& that)
    {
        if (this != &that) {
            inner_ = that.inner_->clone();
        }
        return *this;
    }

    std::string toString() const
    {
        return inner_->toString();
    }

private:
    struct HolderBase {
        virtual ~HolderBase() { }
        virtual std::string toString() const = 0;
        virtual std::unique_ptr<HolderBase> clone() const = 0;
    };

    template<typename T>
    struct Holder : public HolderBase {
        Holder(const T& obj)
        : obj_(obj)
        {
        }

        std::string toString() const override
        {
            return obj_.toString();
        }

        std::unique_ptr<HolderBase> clone() const override
        {
            return std::make_unique<Holder<T> >(obj_);
        }

        T obj_;
    };

    std::unique_ptr<HolderBase> inner_;
};
        
ThingWithToString getThingWithToString(int which)
{
    switch (which)
    {
        case 0:
            return ThingWithToString(Dog());
            
        case 1:
            return ThingWithToString(Robot());
            
        default:
            throw std::runtime_error("Unknown object type");
    }
}
              
int main()
{
    int which = -1;
    std::cin >> which;
    ThingWithToString object = getThingWithToString(which);
    std::cout << object.toString() << std::endl;
    return 0;
}

The ThingWithToString class itself is not a template but its constructor is. The constructor takes the argument, which is the object to be adapted, and stores it in an instance of Holder<T>. However, because ThingWithToString is not a template, it cannot directly store an instance of Holder<T>. It gets around this by deriving Holder<T> from a non-template base class (HolderBase) and storing a reference to the base class. Once constructed, ThingWithToString knows nothing about the type of the adapted object. In other words, the type of the adapted object is "erased" from its view at compile-time, but it can be accessed at runtime through HolderBase's virtual methods. Because the type-erased interface (ThingWithToString) is not parameterized by the type of the adapted object, we can choose the type to instantiate it with at runtime.

A question some might have at this point is why we didn't use the HolderBase class directly instead of wrapping it inside ThingWithToString. Assuming HolderBase and Holder<T> aren't inner classes, let's see what such an implementation might look like.

std::unique_ptr<HolderBase> getObject(int which)
{
    switch (which)
    {
        case 0:
            return std::make_unique<Holder<Dog> >((Dog()));

        case 1:
            return std::make_unique<Holder<Robot> >((Robot()));

        default:
            throw std::runtime_error("Unknown object type");
    }
}   

int main()
{
    int which = -1;
    std::cin >> which;
    std::unique_ptr<HolderBase> object = getObject(which);
    std::cout << object->toString() << std::endl;
    return 0;
}

We get most of the functionality except for one thing - because HolderBase is an abstract class, it does not have value semantics. ThingWithToString, on the other hand, has value semantics - copying an instance of ThingWithToString also copies the adapted object. Of course, we can still pass those instances by reference or by pointer if necessary. Value-semantic types are desirable because not sharing objects makes it easier to reason about programs (referential transparency). In addition to providing value semantics, using a wrapper around HolderBase allows us to implement small-object optimizations if necessary. For example, Adobe's poly library uses a local buffer to store small objects instead of creating them on the heap.

Now that we've seen the mechanics of value-semantic type erasure, let's think about the kind of objects ThingWithToString can be instantiated within. The object must certainly have a toString() method i.e. model a StringConvertible concept. It must also be copy-constructible because we want value semantics. Beyond that, it does not matter what other methods it has or what its actual type is. This is exactly what templates require of their type arguments – the only difference is that type-erased interfaces can be instantiated at runtime. We can now write generic algorithms on runtime types!

You don't have to look too far to find examples of type-erased interfaces. Both std::any and std::function (and their boost counterparts) are implemented using type erasure. In his article On the Tension Between Object-Oriented and Generic Programming in C++, Thomas Becker describes how type erasure is used to implement the any_iterator.

At first glance, inheritance and type erasure couldn’t be more different. So how do we decide when to use type erasure instead of inheritance? Let’s first make a quick detour and see how type erasure fits into the pantheon of polymorphism. We'll do that by looking at the relationship between the notion of type compatibility and polymorphism (I promise there is a point to this).

Type Compatibility and Polymorphism

Before we begin, consider the following definition of polymorphism from On Understanding Types, Data Abstraction and Polymorphism by Luca Cardelli and Peter Wegner.

Similarity relations among type expressions that permit a type expression to denote more than one type, or to be compatible with many types, are referred to as polymorphism

So what is type compatibility? Type compatibility refers to whether the type of an expression is consistent with the type expected in the context that the expression appears in [Wikipedia-TypeSystem]. Two types that are equivalent (under some rules for equivalence) are compatible with each other. In languages that support subtyping, a subtype is compatible with its super type. Broadly speaking, there are two forms of type compatibility – nominative and structural.

In a nominative type system, two variables are type-compatible if and only if their declarations name the same type (including the definition scope) [Wikipedia-Nominal]. Type aliases, such as those defined using typedef in C++, are compatible with their original types. Super type – subtype relationships are explicitly declared by name. Inheritance is nominative subtyping because the name of the base class is part of the definitions of the derived classes. In a structural type system, type equivalence is determined by the type’s actual structure or definition, and not by other characteristics such as its name or place of declaration [Wikipedia-Structural]. One type is a subtype of another if and only if it contains all the properties of that type. Here, I use the term property to mean a type’s public method or data. While the C++ type system is primarily nominative, it exhibits structural typing with respect to template type parameters.

With this in mind, let’s take a look again at the type erasure example. ThingWithToString, as we saw, can be instantiated with objects of any type that has a toString() method, i.e., any type that is a structural subtype of ThingWithToString. In other words, type erasure allows us to simulate runtime structural subtype polymorphism in C++. In contrast, polymorphism through inheritance is runtime nominative subtype polymorphism. This is a powerful way of thinking about type erasure because it allows us to identify equivalents between type erasure- and inheritance-based designs. For example, std::any can be thought of as the structural equivalent of an empty base class. The std::function template is a generator for structural super types of callable types - its inheritance-based counterpart is an abstract base class with a pure virtual operator()().

The dichotomy between nominative and structural type compatibility also exists in compile-time polymorphism. Function overloading (ad-hoc polymorphism), requires nominative compatibility or implicit convertibility between the types of arguments and parameters. Templates (parametric polymorphism) require structural compatibility between type arguments and type parameters. Just as inheritance is the runtime counterpart of overloading (virtual function overrides are essentially overloads on the type of the implicit this argument), type erasure is the runtime counterpart of templates.

When to Use Type Erasure?

It’s time to bring it all together. Understanding the relationship between type compatibility and polymorphism allows us to phrase the choice between inheritance and type erasure (and between overloading and templates) in terms of the pros and cons of nominative and structural typing. It gives us a framework for making these design choices at both compile-time and runtime.

Structural typing is clearly the more flexible of the two. It frees us from having to anticipate all the algorithms we want to apply to a type, and having to anticipate all the types we want to apply an algorithm to. In contrast to nominative subtyping, we can define a structural super type of an existing type without modifying the definition of that type. Thus, structural typing thus allows us to treat concrete types that are otherwise unrelated by inheritance in a polymorphic manner. However, just because two types are structurally equivalent does not mean that they are semantically equivalent. The names of types convey contracts and invariants that are not apparent from their structures themselves. For example, a well-designed inheritance hierarchy communicates certain guarantees about pre- and post-conditions. Type erasure, on the other hand, offers no such guarantees. Nominative typing also allows the programmer to explicitly express her design intent with respect to how the various parts of the program are intended to work together, thus offering stronger type-safety than structural typing.

Libraries for Creating Type-Erased Interfaces

Creating a type-erased interface can be a cumbersome process. Fortunately, there are libraries that ease some of the pain and make this technique accessible. Boost.TypeErasure is one such library that provides a number of predefined concepts. Adobe poly is another excellent library for creating type-erased interfaces.

A Note about Terminology

The usage of the term type erasure can be a bit confusing. For example, what is known as type erasure in Java is quite different from the technique described here. Fundamentally, type erasure in C++ is a mechanism by which type information is "hidden" from parts of the program at compile-time. In a strict sense of the term, inheritance is also a form of type erasure - clients only know about the base class at compile-time, while the derived types are "hidden" from them. However, in popular usage, the term type erasure refers to the technique presented in this article. A less confusing, and perhaps more descriptive, name might be external polymorphism.

Points of Interest

Inheritance is not always the right choice for runtime polymorphism. We need to be able to write generic algorithms on runtime types.
Type erasure can be used to create value-semantic adapters that are implemented in terms of the adapted object's properties instead of its type. It allows us to write generic algorithms on runtime types.
Understanding the relationship between type compatibility and polymorphism gives us a framework to reason about design choices as they relate to polymorphism.