
Visions of Generics and Templates: How Parametrics are Compiled

6 May 2015 · CC BY-ND 3.0 · 8 min read

In the world of parametric programming, our languages often seem quite varied and unique. At the core however are two primary approaches, let’s call them templates and generics. In this article, I’ll look briefly at how these are implemented by a compiler and touch on the features and limitations of each.

I should note right up front that each compiler and language has its own way of doing things and might not conform strictly to these approaches. Mostly, they tend to behave as if this were how they are implemented. Of course there's variation; if there weren't, that wouldn't be any fun.

Templates

With C++-style templates, it is perhaps easiest to understand what the compiler is doing. It treats the template much like a macro and substitutes the type. In this code:

C++
template<typename T> struct sum {
    T value;
    void add( T incr ) {
        value += incr;
    }
};

typedef sum<int> int_sum;

int_sum is equivalent to this code:

C++
struct int_sum {
    int value;
    void add( int incr ) {
        value += incr;
    }
};

It’s not treated exactly like a macro: the substitution follows the language’s syntax and type rules rather than blind text replacement. Still, this view is sufficient to understand the basics of what the compiler is doing. It takes the template code and makes a specific version of it for each instantiation. Post-compilation, the template is gone and we’re left with only the individual instances.

Pros

The primary advantage to this technique is the expressiveness of the templates. Virtually any code we can write without a template can be trivially transformed into a template version. We can treat T exactly as the instance type. We know what it is, we can new it, use operator overloads, and call any functions on it. Expressiveness is perhaps the biggest advantage of this approach.
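For example, here is a small sketch (not from the original article) of a template that treats T as a concrete type: it default-constructs it, compares it with operator<, and allocates a new copy. The only requirement is that each instantiating type actually supports those operations.

C++
#include <iostream>
#include <memory>

template<typename T> struct tracker {
    // T is used exactly like a concrete type: default-constructed,
    // compared with operator<, and allocated on the heap.
    T best{};

    void offer( T candidate ) {
        if( best < candidate ) {    // relies on T providing operator<
            best = candidate;
        }
    }

    std::unique_ptr<T> clone_best() const {
        return std::make_unique<T>( best );   // effectively "new T(best)"
    }
};

int main() {
    tracker<int> t;
    t.offer( 3 );
    t.offer( 7 );
    std::cout << *t.clone_best() << "\n";   // prints 7
}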

Templates also allow partial specialization, at least in C++. This is really helpful when you have two classes that behave nearly the same, but one aspect differs for a certain type. The shared part need not even be valid for all types so long as the proper specializations exist. Everything is resolved at compile-time, so there is no fear of runtime type incompatibility.
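As a rough illustration of what partial specialization looks like in C++ (the storage class here is purely my example, not from the article):

C++
#include <cstddef>

// General case: hold the value directly.
template<typename T> struct storage {
    T value;
    std::size_t payload_size() const { return sizeof(T); }
};

// Partial specialization for pointer types: same interface, different
// behaviour, all chosen at compile time.
template<typename T> struct storage<T*> {
    T* value;
    std::size_t payload_size() const { return value ? sizeof(T) : 0; }
};

// storage<int> uses the general version; storage<int*> uses the
// specialization.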

Another advantage of this approach is speed. Each instance of the template has code generated just for it. That gives the optimizer full knowledge to do its work. Fundamental types can be used directly without any kind of boxing, giving good space efficiency. This makes templates suitable for computationally expensive, or performance-critical, code.

Cons

Distribution of such templates is perhaps the biggest problem. The full template code for a library must reside in header files and is compiled directly into each object file that uses it. While it is possible to get some code into a shared library, it’s very hard to swap that library out for a new version: it’s basically just not done. Changes in the template code require a recompile of all projects that use it. I’d say this is the key drawback to this approach.

Another big downside to this approach is code size. Template classes can be big, and duplicating that code for every instance type can bloat the object code. In some cases, this can lead to more instruction-cache misses, but for most uses the size itself isn’t the problem. All these copies have to be individually created by the compiler, and then compiled. Anybody who has worked on a large C++ project with lots of templates can attest to the impact this has on compilation speed.

It is certainly possible for an optimizer and linker to merge a lot of this code. While the instantiations may differ at the language level, similar types quite often produce identical assembly-level code. LLVM, for example, contains a nice MergeFunctions optimization pass for exactly this purpose.

We also can’t forget the error messages. Perhaps this is just a symptom of C++ and other languages have fared better. In C++, it’s not uncommon to get pages and pages of cryptic template messages when making even trivial errors. This arises from the compiler being unable to conceptualize what logical error was made. All it can do is show the lowest-level error, deep within the nested templates, along with a path back to the instantiation, hoping you can identify the problem.

Concepts were proposed to help with this issue: originally slated for C++11, they were dropped and later evolved into the Concepts TS and, eventually, the constraints and concepts of C++20. The goal is to detect problems at a much higher level in the template code and map them to meaningful error messages.
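A rough sketch of the idea, using the syntax that eventually landed in C++20 (well after this article was first written):

C++
#include <concepts>

// State up front what the template actually requires of T.
template<typename T>
concept Summable = requires( T a, T b ) {
    { a + b } -> std::convertible_to<T>;
};

template<Summable T>
T total( T a, T b, T c ) {
    return a + b + c;
}

// total(1, 2, 3) compiles; calling total with a type lacking operator+
// is rejected with a short message naming the unsatisfied Summable
// constraint, rather than a wall of errors from deep inside the body.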

Generics

Java- or C#-style generics produce only one version of the parametric code. The generic parameter is substituted with an object, or with a constrained base-level type. Using a constrained type makes it easier to show what happens:

C#
class Adder<T> where T : IAddable {
    T Add( T a, T b ) {
        return a.Add(b);
    }
}

That code becomes something similar to:

C#
class Adder {
    IAddable Add( IAddable a, IAddable b ) {
        return a.Add(b);
    }
}

The resulting class is entirely generic, and well defined. The added value of generics is that the compiler takes care of the casting and type checking for us. If we have a variable of type Adder<Complex>, the Add function will only accept Complex values and will return a Complex result.

While C# logically processes generics this way, for value types the .NET runtime takes the C++ template approach and does type substitution: the JIT produces a copy of the code for each value type used. For reference types, only one copy of the code is shared. This is hidden from the user though, so the feature set remains as though only the singular object form were used.

Pros

Generics can be provided in shared libraries. They can be used without access to the source code and can be updated without having to recompile all the programs that use them. This allows generics to be safely used as part of a public API. Though often underrated, this is a really nice advantage of generics.

Only one version of the code is needed to support the various instances. This keeps code size down and also significantly improves compilation time: the compiler just compiles the generic once. C# is the exception with value types, which do produce copies. However, since this happens when the IL is JIT-compiled, the compilation burden is far smaller and not likely a speed issue.

As the generic code must be semantically valid on its own, the compiler doesn’t need to know the parameter types in order to compile it. This usually results in clear, locally relevant error messages. It may also lead to more readable code, though I’m not certain on this point.

Cons

Generics are quite inflexible. They work well for collection-like types, but have a hard time adapting to other uses. The generic class itself must be semantically valid for the constraint type used. This means any deviation from very basic behaviour requires defining and implementing interfaces on the objects. The lack of expressiveness is perhaps the key drawback to generics.

There is an efficiency issue when working with fundamental types. The generic only works with object types, so all fundamentals need to be boxed and unboxed. This has a direct speed impact as well as creating many more objects to be managed. This is perhaps the main motivation for C# handling value-type generics the way C++ does: by producing copies of the code, the generic need not box fundamentals.

A problem that exists in Java, but not in C#, is that the generic does not know what type it is working with at runtime, since Java erases the type parameter. This is a major drawback, since a collection is unable to create the objects it holds. There’s no fundamental reason why this approach can’t know its type at runtime.

In the case of C# reference types, I was unable to figure out how the .NET VM actually exposes this type to generic functions. I suspect it’s a hidden parameter, but if somebody knows for sure please let me know. I’m curious.

There is no type specialization. Since there is only one version of the code, it isn’t possible to alter behaviour based on the actual type used. In C#, you can of course compare the type at runtime to provide specialized conditions. It is a bit clumsy, and inefficient, compared to specialization, but it does work. Often this lack of specialization can prevent two nearly identical code blocks from being merged.

More Options

There are more options, and more combined ways to look at the issue. Consider that dynamic languages are basically parametric by default, employing a duck typing approach based on the runtime type. This makes them rather expressive using only one copy of the code, but at the loss of efficiency and specialization.

In Leaf, I’m considering a hybrid method. The core of the template will follow the generics approach, using a generalized object type. To keep it expressive though, an automatic interface will be generated: all the operations needed in the parametric class or function will be bundled into that interface. An implementation of this interface is created for each type and passed to the generic code. It’s just a rough idea at this point.
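To make the idea concrete, here is one way such a hybrid could be pictured, sketched in C++ as a hand-built table of operations passed alongside opaque values. This is only my illustration of the general technique (sometimes called dictionary passing), not Leaf’s actual design:

C++
#include <cstdio>

// The "automatic interface": the operations the generic body needs,
// produced once per concrete type.
struct ops {
    void* (*add)( void* a, void* b );
    void  (*print)( void* a );
};

// One shared, non-templated copy of the generic logic.
void* sum_and_print( void* a, void* b, const ops& o ) {
    void* r = o.add( a, b );
    o.print( r );
    return r;
}

// A per-type implementation of the interface, here for int.
static int int_result;
void* int_add( void* a, void* b ) {
    int_result = *static_cast<int*>( a ) + *static_cast<int*>( b );
    return &int_result;
}
void int_print( void* a ) { std::printf( "%d\n", *static_cast<int*>( a ) ); }

int main() {
    int x = 2, y = 3;
    ops int_ops{ int_add, int_print };
    sum_and_print( &x, &y, int_ops );   // prints 5
}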

If you know of a different parametric approach in other compilers, for any language, then please leave a comment.

If you’d like to learn more about compilers, then follow me on Twitter. I have many more things to write on this topic. If there’s something special you’d like to hear about, or want to arrange a presentation, feel free to contact me.

License

This article, along with any associated source code and files, is licensed under the Creative Commons Attribution-NoDerivatives 3.0 Unported License.


