How To Safely Pass Parameters By Reference in C++ - The Unsettled Question

Noah L


Apr 17, 2016

MIT


Use-after-free bugs, new smart pointers and the new state of safe C++ programming


Note that this article is kind of a sequel to (and may have a few redundancies with) a previous article that introduced "registered pointers".

Quick Summary

While the traditional way of passing parameters by "raw" pointer or reference is usually safe, in order to be certain that the parameter reference won't be used to access invalid memory, you must use a smart pointer (or smart reference). In particular, we recommend using mse::TRefCountingFixedPointer, mse::TScopeFixedPointer and/or mse::TRegisteredFixedPointer. If you are writing a function for more general use and don't want to restrict callers to using a specific type of smart pointer, you can "templatize" your function so that it will accept any type of pointer the caller wishes to use.

#include "mseregistered.h"
#include "mserefcounting.h"

class H {
public:
    /* Just an example of a templated member function. You might consider templating pointer parameter
    types to give the caller some flexibility as to which kind of (smart/safe) pointer they want to
    use. */
    template<typename _Tpointer>
    static int foo1(_Tpointer A_ptr) { return A_ptr->b; }

protected:
    ~H() {}
};

int main(int argc, char* argv[]) {
    class A {
    public:
        A() {}
        int b = 3;
    };

    A a_obj;
    A* a_ptr = &a_obj;
    int res1 = H::foo1(a_ptr);

    mse::TRegisteredObj<A> a_robj;
    mse::TRegisteredFixedPointer<A> a_rfptr = &a_robj; // safe smart pointer
    int res2 = H::foo1(a_rfptr);
    
    mse::TRefCountingFixedPointer<A> a_refcfptr = mse::make_refcounting<A>(); // another safe smart pointer
    int res3 = H::foo1(a_refcfptr);
}

See? Easy as pie.
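Because foo1() is an ordinary function template, it is not tied to the mse library. The sketch below uses only the standard library (Widget and read_b are hypothetical stand-ins for A and H::foo1) to show the same pattern accepting a raw pointer, a std::shared_ptr and a std::vector iterator, since each of them supports "->".

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Widget {
    int b = 3;
};

// Accepts anything that supports operator->: raw pointers, std::shared_ptr,
// container iterators, or safe smart pointers such as the mse types above.
// The parameter is taken by value, as in foo1(), so a reference-counting
// argument holds its own share of ownership for the duration of the call.
template <typename TPointer>
int read_b(TPointer ptr) {
    return ptr->b;
}
```

For example, read_b(&w), read_b(std::make_shared<Widget>()) and read_b(v.begin()) all compile and return 3. Taking the parameter by value rather than by reference matters for the safety argument made later in the article: a reference to a smart pointer would not keep the target alive on its own.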

If for some reason you can't or don't want to templatize the function, but still want to give the caller some flexibility in the choice of pointer type for reference parameters, then you might consider using a "poly pointer". Poly pointers can act as either a strong/owning pointer or a weak/non-owning pointer, as needed. When constructed from a strong/owning pointer (such as a reference counting pointer or an std::shared_ptr), the poly pointer will obtain and hold (shared) ownership of the target object.

Here we'll demonstrate the use of three different poly pointers - mse::TRefCountingOrXScopeFixedPointer, mse::TRefCountingOrXScopeOrRawFixedPointer, and mse::TSharedOrRawFixedPointer.

#include "msepoly.h"
#include <memory> // std::shared_ptr, std::make_shared

class A {
public:
    A() {}
    int b = 3;
};

class H {
public:
    static int foo2(mse::TRefCountingOrXScopeFixedPointer<A> A_ptr) { return A_ptr->b; }
    static int foo3(mse::TRefCountingOrXScopeOrRawFixedPointer<A> A_ptr) { return A_ptr->b; }
    static int foo4(mse::TSharedOrRawFixedPointer<A> A_ptr) { return A_ptr->b; }
protected:
    ~H() {}
};

int main(int argc, char* argv[]) {
    A a_obj;
    A* a_ptr = &a_obj;

    mse::TXScopeObj<A> a_xscpobj;
    mse::TXScopeFixedPointer<A> a_xscpptr = &a_xscpobj;//a "smart" pointer for stack allocated objects

    mse::TRefCountingFixedPointer<A> a_refcfptr = mse::make_refcounting<A>();

    int res1 = H::foo2(a_xscpptr);
    int res2 = H::foo2(a_refcfptr);

    int res3 = H::foo3(a_ptr);
    int res4 = H::foo3(a_xscpptr);
    int res5 = H::foo3(a_refcfptr);

    std::shared_ptr<A> a_shptr = std::make_shared<A>();
    int res6 = H::foo4(a_ptr);
    int res7 = H::foo4(a_shptr);
}

Poly pointers do, of course, have a small run-time cost, so "templatizing" your function remains the preferred option.
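To give a rough idea of where that run-time cost comes from, here is a minimal sketch of the poly pointer idea using std::variant (C++17). PolyPtr and foo4_sketch are hypothetical names, and this is not how msepoly.h implements its poly pointers. Every access has to check which kind of pointer is held, and the owning alternative also carries std::shared_ptr's reference-counting overhead.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <variant>

// Hypothetical PolyPtr: holds either a non-owning raw pointer or an owning
// std::shared_ptr. A minimal sketch, not the msepoly.h implementation.
template <typename T>
class PolyPtr {
public:
    PolyPtr(T* raw) : ptr_(raw) {}                                  // weak/non-owning
    PolyPtr(std::shared_ptr<T> owner) : ptr_(std::move(owner)) {}   // strong/owning

    T* get() const {
        // The run-time cost: every access dispatches on which alternative is held.
        if (auto raw = std::get_if<T*>(&ptr_)) return *raw;
        return std::get<std::shared_ptr<T>>(ptr_).get();
    }
    T& operator*() const { return *get(); }
    T* operator->() const { return get(); }
    bool is_owning() const { return std::holds_alternative<std::shared_ptr<T>>(ptr_); }

private:
    std::variant<T*, std::shared_ptr<T>> ptr_;
};

struct A {
    int b = 3;
};

// A non-template function that still lets the caller choose the pointer kind.
int foo4_sketch(PolyPtr<A> a_ptr) { return a_ptr->b; }
```

Both foo4_sketch(&a_obj) and foo4_sketch(a_shptr) compile; in the second case the parameter shares ownership of the target until the call returns.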

Discussion

So what do we mean by "safely passing parameters by reference"? Well, consider the following example:

#include <string>
#include <vector>

class CProgram {
public:
    void add_instruction(const std::string& instruction_cref) {
        if ("!clear instructions" == instruction_cref) {
            instructions.clear();
            instructions.shrink_to_fit();
        } else {
            instructions.push_back(instruction_cref);
        }
    }
    void add_two_instructions(const std::string& instruction1_cref,
        const std::string& instruction2_cref) {
        add_instruction(instruction1_cref);
        add_instruction(instruction2_cref);
    }

    std::vector<std::string> instructions;
};

int main(int argc, char* argv[]) {
    CProgram program1;
    program1.add_two_instructions(std::string("add 1"),
        std::string("multiply by 2"));
    program1.add_two_instructions(program1.instructions.front(),
        std::string("multiply by 3"));
    program1.add_two_instructions(std::string("!clear instructions"),
        program1.instructions.front());
}

Clearing the vector destroys the string that program1.instructions.front() refers to (and, if shrink_to_fit() honors its non-binding request, also frees the vector's buffer), so the last statement in main() is ultimately going to cause an invalid memory access. The add_two_instructions() function receives two perfectly valid reference parameters, but it inadvertently causes its second parameter to become invalid before it has finished using it. It then passes that invalid parameter to the add_instruction() function, which attempts to use it. Which is bad.
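The root of the problem is aliasing: in the last call, the second argument is a reference into the very vector that the first instruction clears. The hypothetical helper below (standard library only) makes that relationship explicit by comparing addresses. It uses == rather than < because ordering comparisons between pointers to unrelated objects are not guaranteed to be meaningful. A check like this could catch this one case, but it does not generalize to other containers or other ways of destroying an object.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: does `ref` refer to one of the vector's elements?
// If it does, anything that destroys or relocates the elements (clear(),
// shrink_to_fit(), a push_back() that reallocates) leaves `ref` dangling.
bool refers_into(const std::vector<std::string>& vec, const std::string& ref) {
    for (const std::string& element : vec) {
        if (&element == &ref) return true;
    }
    return false;
}
```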

So how big a problem is this kind of bug in practice? It's hard to say. Some don't seem too concerned about the danger, while others are a bit dismayed by this lack of concern.

The type of bug in our example is called a "use-after-free" bug. If we take a look at the recent history of "critical" security bugs in Chromium, the popular open source C++ browser project, we note that at the time of this writing, 8 of the 20 most recent ones indicated in their description that they were of the "use-after-free" variety. And if we check out the results of the first day of this year's Pwn2Own (2016), we note that all five successful exploits made use of at least one use-after-free vulnerability.

Among modern languages, memory use-after-free bugs are primarily a C++ phenomenon. Languages like C# and Java use mechanisms like mandatory garbage collection to avoid this type of bug. In C++ we have the option of using garbage collection to address the issue, but from a C++ perspective garbage collection has some unappealing characteristics: non-deterministic destruction, higher memory use, "GC pauses", etc. Fortunately, there are alternative solutions in the form of safe smart pointers. Let's quickly look at three safe smart pointers designed to address the use-after-free issue as efficiently as possible: mse::TRefCountingPointer, mse::TScopeFixedPointer and mse::TRegisteredPointer.

mse::TRefCountingPointer is a reference counting smart pointer like std::shared_ptr. But mse::TRefCountingPointer is in some ways less flexible than std::shared_ptr, which allows it to be a bit more efficient (smaller and faster) and perhaps more appropriate for general use. For example, mse::TRefCountingPointer forgoes std::shared_ptr's (costly) thread safety mechanism. And mse::TRefCountingPointer comes in "not null" and "fixed" (non-retargetable) versions that can be safely assumed to always point to a validly allocated object.
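To illustrate the trade-off being described, here is a deliberately minimal single-threaded reference-counting pointer. RcPtr is a hypothetical name, and this is not mse::TRefCountingPointer's implementation. It only shows the two properties mentioned above: the count is a plain (non-atomic) integer, and with no default constructor, no move and no reset(), the pointer can never be null.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical RcPtr: a minimal, single-threaded reference-counting pointer.
// Sketch only: no custom deleters, no weak pointers, not safe to share across threads.
template <typename T>
class RcPtr {
    struct Block {
        T value;
        std::size_t count;  // plain integer: cheaper than std::shared_ptr's atomic count
    };
    Block* block_;

    explicit RcPtr(Block* b) : block_(b) {}

public:
    template <typename... Args>
    static RcPtr make(Args&&... args) {
        return RcPtr(new Block{T(std::forward<Args>(args)...), 1});
    }

    RcPtr(const RcPtr& other) : block_(other.block_) { ++block_->count; }
    RcPtr& operator=(const RcPtr& other) {
        RcPtr copy(other);              // copy-and-swap keeps self-assignment safe
        std::swap(block_, copy.block_);
        return *this;                   // `copy` releases the old target on destruction
    }
    ~RcPtr() {
        if (--block_->count == 0) delete block_;
    }

    // "Fixed"/not null: every RcPtr always points to a live object.
    T& operator*() const { return block_->value; }
    T* operator->() const { return &block_->value; }
    std::size_t use_count() const { return block_->count; }
};
```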

mse::TScopeFixedPointer points to objects that are allocated on the stack, or whose "owning" pointer is allocated on the stack. The point of "scope" pointers is essentially to identify a set of circumstances that are simple and deterministic enough that no (runtime) safety mechanisms are necessary.
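As one hypothetical illustration of a restriction that costs nothing at run time (ScopeOnly and is_new_allocatable are made-up names, and this is not how msescope.h works), a wrapper can delete its class-specific operator new so that creating it with "new" fails to compile. The static_asserts check this at compile time. The sketch is much weaker than a real scope-pointer scheme; it does not, for example, stop a ScopeOnly object from being a member of a heap-allocated object.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical ScopeOnly<T>: behaves like a T, but `new ScopeOnly<T>` does not compile.
template <typename T>
class ScopeOnly : public T {
public:
    using T::T;  // inherit T's constructors
    static void* operator new(std::size_t) = delete;
    static void* operator new[](std::size_t) = delete;
};

// Detects (C++17) whether the expression `new U` would compile.
template <typename U, typename = void>
struct is_new_allocatable : std::false_type {};
template <typename U>
struct is_new_allocatable<U, std::void_t<decltype(new U)>> : std::true_type {};

static_assert(is_new_allocatable<std::string>::value, "std::string can go on the heap");
static_assert(!is_new_allocatable<ScopeOnly<std::string>>::value, "ScopeOnly<std::string> cannot");
```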

mse::TRegisteredPointer is essentially a safe direct substitute for raw pointers. By default, it will throw an exception on any attempt to access invalid memory.
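The registration idea can be sketched in a few dozen lines. In this hypothetical sketch (Registered and RegPtr are made-up names, not the mseregistered.h implementation), each target object keeps a set of the pointers currently aimed at it. When the object is destroyed it nulls those pointers, and a later dereference throws instead of reading freed memory. The std::unordered_set is an assumption made for brevity; it costs a hash-table insertion for every pointer copy.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_set>

template <typename T> class RegPtr;

// Hypothetical Registered<T>: a T that knows which RegPtrs point at it.
template <typename T>
class Registered : public T {
public:
    using T::T;
    Registered() = default;
    // A copy is a new object: it starts with no registered pointers.
    Registered(const Registered& other) : T(static_cast<const T&>(other)) {}
    Registered& operator=(const Registered& other) {
        static_cast<T&>(*this) = static_cast<const T&>(other);
        return *this;  // existing registrations stay attached to *this
    }
    ~Registered() {
        // Invalidate every pointer still aimed at this object.
        for (RegPtr<T>* p : pointers_) p->target_ = nullptr;
    }

private:
    friend class RegPtr<T>;
    std::unordered_set<RegPtr<T>*> pointers_;
};

template <typename T>
class RegPtr {
public:
    explicit RegPtr(Registered<T>* target) : target_(target) { attach(); }
    RegPtr(const RegPtr& other) : target_(other.target_) { attach(); }
    RegPtr& operator=(const RegPtr& other) {
        if (this != &other) { detach(); target_ = other.target_; attach(); }
        return *this;
    }
    ~RegPtr() { detach(); }

    // Throws instead of touching freed memory.
    T& operator*() const {
        if (target_ == nullptr) throw std::logic_error("RegPtr: target no longer exists");
        return *target_;
    }
    T* operator->() const { return std::addressof(**this); }

private:
    friend class Registered<T>;
    void attach() { if (target_) target_->pointers_.insert(this); }
    void detach() { if (target_) target_->pointers_.erase(this); }
    Registered<T>* target_;
};
```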

Armed with these safe smart pointers, let's make our earlier example safe:

#include "mserefcounting.h"
#include "msescope.h"
#include <string>
#include <vector>

class CProgramV2 {
public:
    template<typename _TStringPointer>
    void add_instruction(_TStringPointer instruction_ptr) { // now a template function
        if ("!clear instructions" == *instruction_ptr) {
            instruction_ptrs.clear();
            instruction_ptrs.shrink_to_fit();
        } else {
            instruction_ptrs.push_back(mse::make_refcounting<std::string>(*instruction_ptr));
        }
    }
    template<typename _TStringPointer1, typename _TStringPointer2>
    void add_two_instructions(_TStringPointer1 instruction1_ptr, _TStringPointer2 instruction2_ptr) {
        add_instruction(instruction1_ptr);
        add_instruction(instruction2_ptr);
    }

    /* We need to make references to stored items safe. One way to do this is to use reference counting
    pointers. */
    std::vector<mse::TRefCountingFixedPointer<std::string>> instruction_ptrs;
};

int main(int argc, char* argv[]) {
    CProgramV2 program1;
    mse::TScopeObj<std::string> add_one_scpobj("add 1");
    /* We explicitly declare an mse::TScopeFixedPointer here just to show what's going on. */
    mse::TScopeFixedPointer<std::string> add_one_scpfptr = &add_one_scpobj;
    program1.add_two_instructions(add_one_scpfptr,
        &mse::TScopeObj<std::string>("multiply by 2"));
    program1.add_two_instructions(program1.instruction_ptrs.front(),
        &mse::TScopeObj<std::string>("multiply by 3"));
    program1.add_two_instructions(&mse::TScopeObj<std::string>("!clear instructions"),
        program1.instruction_ptrs.front());
}

The first thing to note is that the add_instruction() and add_two_instructions() member functions have been turned into template functions, allowing them to accept as parameters any type of pointer, smart or otherwise.

The way we've chosen to solve the use-after-free issue here is to roughly emulate the memory management of garbage collected languages by using reference counting pointers for heap allocated objects. Stack allocated objects did not contribute to the use-after-free bug in the example, but for demonstration purposes here we've declared them as "scope" objects for enhanced (compile-time) safety.

In fact, a general, straightforward way to achieve pointer safety is simply to restrict the types of allowable pointers to these reference counting and scope pointers only.
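The same lifetime-extension idea can be expressed with the standard library alone. In this sketch (CProgramShared is a hypothetical name), std::shared_ptr stands in for mse::TRefCountingFixedPointer, at the price of the atomic reference counting mentioned earlier, and plain std::string pointers stand in for the scope pointers.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

class CProgramShared {
public:
    template <typename TStringPointer>
    void add_instruction(TStringPointer instruction_ptr) {
        if ("!clear instructions" == *instruction_ptr) {
            instruction_ptrs.clear();
            instruction_ptrs.shrink_to_fit();
        } else {
            instruction_ptrs.push_back(std::make_shared<std::string>(*instruction_ptr));
        }
    }
    template <typename TStringPointer1, typename TStringPointer2>
    void add_two_instructions(TStringPointer1 instruction1_ptr, TStringPointer2 instruction2_ptr) {
        add_instruction(instruction1_ptr);
        // If instruction2_ptr is a shared_ptr, it was copied by value, so it
        // still owns its string even after the vector's copy has been released.
        add_instruction(instruction2_ptr);
    }

    std::vector<std::shared_ptr<std::string>> instruction_ptrs;
};
```

Replaying the problematic call, add_two_instructions(&clear, program1.instruction_ptrs.front()) now leaves exactly one instruction, "add 1", in the vector instead of reading a destroyed string.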

That said, let's consider another solution. mse::mstd::vector is a safe implementation of std::vector with safe iterators that can be used where we used reference counting pointers in the previous solution.

#include "msescope.h"
#include <string>
#include "msemstdvector.h"

class CProgramV3 {
public:
    template<typename _TStringPointer>
    void add_instruction(_TStringPointer instruction_ptr) {
        if ("!clear instructions" == *instruction_ptr) {
            instructions.clear();
            instructions.shrink_to_fit();
        }
        else {
            instructions.push_back(*instruction_ptr);
        }
    }
    template<typename _TStringPointer1, typename _TStringPointer2>
    void add_two_instructions(_TStringPointer1 instruction1_ptr, _TStringPointer2 instruction2_ptr) {
        add_instruction(instruction1_ptr);
        add_instruction(instruction2_ptr);
    }

    /* mse::mstd::vector is just a safe implementation of std::vector. */
    mse::mstd::vector<std::string> instructions;
};

int main(int argc, char* argv[]) {
    CProgramV3 program1;
    program1.add_two_instructions(&mse::TScopeObj<std::string>("add 1"),
        &mse::TScopeObj<std::string>("multiply by 2"));
    program1.add_two_instructions(program1.instructions.cbegin(),
        &mse::TScopeObj<std::string>("multiply by 3"));
    bool expected_exception = false;
    try {
        program1.add_two_instructions(&mse::TScopeObj<std::string>("!clear instructions"),
            program1.instructions.cbegin());
    }
    catch (...) {
        expected_exception = true;
        /* The iterator returned by program1.instructions.cbegin() is going to become invalid when
        the vector is cleared, and will throw an exception when the program tries to dereference it. */
    }
}

First let's note that the templated functions have no problem accepting an iterator as a pointer parameter. Second, note that while reference counting pointers accomplish their safety by extending the target object's lifespan so that all references to it remain valid, mse::mstd::vector iterators achieve their safety by throwing an exception when used to try to access an item that is no longer valid. It's arguable which behavior is preferable, but in C++ you have the choice. In languages with mandatory garbage collection, you don't.
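Here is one hypothetical way such checked iterators could work (CheckedVector is a made-up name, and this is not mse::mstd::vector's design). The elements live in a shared state that iterators observe through std::weak_ptr, together with a version counter. Any operation that might reallocate or destroy elements bumps the version, and dereferencing a stale iterator throws. This is far more conservative than necessary (here even a push_back() that doesn't reallocate invalidates iterators), and the reference returned by a dereference is itself unchecked.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

template <typename T>
class CheckedVector {
    struct State {
        std::vector<T> items;
        std::size_t version = 0;  // bumped whenever outstanding iterators become invalid
    };
    std::shared_ptr<State> state_ = std::make_shared<State>();

public:
    CheckedVector() = default;
    CheckedVector(const CheckedVector&) = delete;             // copying omitted for brevity
    CheckedVector& operator=(const CheckedVector&) = delete;

    class const_iterator {
    public:
        const T& operator*() const {
            std::shared_ptr<State> s = state_.lock();  // empty if the vector itself is gone
            if (!s || s->version != version_ || index_ >= s->items.size())
                throw std::logic_error("invalid iterator dereference");
            return s->items[index_];
        }
        const T* operator->() const { return &**this; }

    private:
        friend class CheckedVector;
        const_iterator(std::weak_ptr<State> s, std::size_t index, std::size_t version)
            : state_(std::move(s)), index_(index), version_(version) {}
        std::weak_ptr<State> state_;
        std::size_t index_;
        std::size_t version_;
    };

    const_iterator cbegin() const { return const_iterator(state_, 0, state_->version); }
    std::size_t size() const { return state_->items.size(); }

    void push_back(const T& value) { state_->items.push_back(value); ++state_->version; }
    void clear() { state_->items.clear(); ++state_->version; }
    void shrink_to_fit() { state_->items.shrink_to_fit(); ++state_->version; }
};
```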

We should also note that std::vector, in its standard implementation, is an unsafe container in that it allows (unchecked) access to invalid memory through both its iterators and its "[]" operator. Safer programming practices would have us use a safer implementation like mse::mstd::vector.
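For comparison, the standard library does offer one checked accessor: std::vector::at() throws std::out_of_range for an invalid index, where operator[] has undefined behavior. The small sketch below (checked_read is a hypothetical helper) shows it. Note that at() checks only the index; it does nothing to detect invalidated iterators or references, which is the problem the examples above address.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Bounds-checked read: throws std::out_of_range if i >= v.size(),
// whereas v[i] with the same index would be undefined behavior.
int checked_read(const std::vector<int>& v, std::size_t i) {
    return v.at(i);
}
```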

Finally, let's quickly look at one more solution, this time using registered pointers.

#include "mseregistered.h"
#include <string>
#include "msemstdvector.h"

class CProgramV4 {
public:
    template<typename _TStringPointer>
    void add_instruction(_TStringPointer instruction_ptr) {
        if ("!clear instructions" == *instruction_ptr) {
            instructions.clear();
            instructions.shrink_to_fit();
        }
        else {
            instructions.push_back(*instruction_ptr);
        }
    }
    template<typename _TStringPointer1, typename _TStringPointer2>
    void add_two_instructions(_TStringPointer1 instruction1_ptr, _TStringPointer2 instruction2_ptr) {
        add_instruction(instruction1_ptr);
        add_instruction(instruction2_ptr);
    }

    /* mse::TRegisteredObj<std::string> is meant to behave just like an std::string. */
    mse::mstd::vector<mse::TRegisteredObj<std::string>> instructions;
};

int main(int argc, char* argv[]) {
    CProgramV4 program1;
    program1.add_two_instructions(&mse::TRegisteredObj<std::string>("add 1"),
        &mse::TRegisteredObj<std::string>("multiply by 2"));
    /* We explicitly declare an mse::TRegisteredFixedPointer here just to show what's going on. */
    mse::TRegisteredFixedPointer<std::string> first_instruction_rfptr = &(program1.instructions.front());
    program1.add_two_instructions(first_instruction_rfptr,
        &mse::TRegisteredObj<std::string>("multiply by 3"));
    bool expected_exception = false;
    try {
        program1.add_two_instructions(&mse::TRegisteredObj<std::string>("!clear instructions"),
            &(program1.instructions.front()));
    }
    catch (...) {
        expected_exception = true;
        /* The registered pointer returned by &(program1.instructions.front()) is going to become
        invalid when the vector is cleared, and will throw an exception when the program tries to
        dereference it. */
    }
}

So another general, straightforward way of achieving pointer safety is to replace all classes and pointers with their "registered" counterparts. Registered pointers have a little more overhead than reference counting pointers for heap allocated objects, and certainly more overhead than scope pointers (which by default have no runtime overhead) for stack allocated objects. A possible reason you might prefer registered pointers, besides their simplicity, is that their behavior is the same as that of the corresponding raw pointers when pointing to valid objects. This allows them to be "disabled" (automatically replaced with their native counterparts) with a compile-time directive, enabling you to generate both safe and (high-performance) less safe versions of your application. And since the safe version will throw an exception on any attempt to access invalid memory, it can be used to help find bugs during testing.
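The compile-time switch can be illustrated with an alias template chosen by the preprocessor. In this sketch, the macro name MY_DISABLE_CHECKED_POINTERS and the CheckedPtr type are hypothetical (the library's own directive is documented in its headers), and CheckedPtr only checks for null rather than tracking its target the way a registered pointer does. Code written against SafePtr<T> compiles either way, so the same source yields a checked testing build and an unchecked release build.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical checked pointer: throws on a null dereference.
template <typename T>
class CheckedPtr {
public:
    CheckedPtr(T* p = nullptr) : p_(p) {}
    T& operator*() const {
        if (p_ == nullptr) throw std::logic_error("CheckedPtr: null dereference");
        return *p_;
    }
    T* operator->() const { return &**this; }

private:
    T* p_;
};

#ifdef MY_DISABLE_CHECKED_POINTERS
template <typename T> using SafePtr = T*;             // release build: native pointer, no overhead
#else
template <typename T> using SafePtr = CheckedPtr<T>;  // testing build: every dereference is checked
#endif
```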

Btw, if all this safe pointer code seems a little verbose, shorter aliases are available. (Look in the header files for "shorter aliases".) Or better yet, you can of course make your own.

To summarize, for safe coding we recommend, by default, making functions with reference parameters intended for general use into template functions that can accept any type of pointer reference. And when you need to use a pointer or reference, in general we recommend using mse::TRefCountingFixedPointer for heap allocated objects and mse::TScopeFixedPointer for stack allocated objects. If there are situations where those pointers for some reason aren't ideal, mse::TRegisteredFixedPointer is also a recommended safe pointer type.

Benchmarks

"So", you're asking, "how much performance are these safe pointers going to cost me?" As with anything, it depends on your specific use case, but to give you a very rough idea, here are the results of some simple micro-benchmarks:

Allocation, deallocation, pointer copy and assignment

| Pointer Type | Time (seconds) |
| --- | --- |
| `mse::TRegisteredPointer` (stack) | 0.0317188 |
| native pointer (heap) | 0.0394826 |
| `mse::TRefCountingPointer` (heap) | 0.0493629 |
| `mse::TRegisteredPointer` (heap) | 0.0573699 |
| `std::shared_ptr` (heap) | 0.0692405 |
| `mse::TRelaxedRegisteredPointer` (heap) | 0.14475 |

Dereferencing

| Pointer Type | Time (seconds) |
| --- | --- |
| native pointer | 0.0105804 |
| `mse::TRelaxedRegisteredPointer` (unchecked) | 0.0136354 |
| `mse::TRefCountingPointer` (checked) | 0.0258107 |
| `mse::TRelaxedRegisteredPointer` (checked) | 0.0308289 |
| `std::weak_ptr` | 0.179833 |

Benchmark environment: MSVC 2015 / x64 / Windows 7 / Haswell

Note that mse::TRefCountingFixedPointer always points to a validly allocated object, so its dereferences don't need to be checked. mse::TRegisteredPointer's safety mechanisms are not compatible with the techniques used by the benchmark to isolate dereferencing performance, but mse::TRegisteredPointer's dereferencing performance would be expected to be essentially identical to that of mse::TRelaxedRegisteredPointer. By default, scope pointers have identical performance to native pointers.

Now, you shouldn't make too much of these results; they may not reflect real-world performance, and they seem to be somewhat compiler dependent. That said, from these results one gets the sense that there are performance costs, but that they would probably be measured in percentages, not multiples.
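For readers who want to take measurements with their own compiler, here is a minimal, hypothetical harness in the spirit of the first table (it is not the article's benchmark code). It times copy-constructing, copy-assigning and then destroying n copies of a pointer. The reserve() allocation is included in the timing for every pointer type, and an optimizing compiler may remove work whose results are unused, so treat any numbers it produces with the same caution as those above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <vector>

// Times copy-constructing, copy-assigning and destroying n copies of `source`.
// Returns the elapsed wall-clock time in seconds.
template <typename TPointer>
double time_copy_and_assign(const TPointer& source, std::size_t n) {
    auto start = std::chrono::steady_clock::now();
    {
        std::vector<TPointer> copies;
        copies.reserve(n);                 // one allocation, timed for every pointer type
        for (std::size_t i = 0; i < n; ++i) {
            copies.push_back(source);      // copy construction
            copies.back() = source;        // copy assignment
        }
    }                                      // destroying the copies is timed as well
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling time_copy_and_assign(&x, 1000000) and time_copy_and_assign(std::make_shared<int>(1), 1000000) side by side gives a rough, machine-specific comparison of raw pointers and std::shared_ptr.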

Is the cost worth it? Of course that depends on your particular situation, but again, noting the frequency of things like use-after-free vulnerabilities in critical internet infrastructure, societally we are paying a cost for our current unsafe coding practices.

To Adopt Or Not To Adopt

Now, you may feel a little bit apprehensive about adopting this kind of safe coding. It's unfamiliar and can seem a little verbose. And since it's new, there isn't yet a credible installed base. Well, let's look at four factors you'd want to consider when deciding whether to adopt a new coding technique - utility versus cost, reasonableness/riskiness, dependencies and compatibility.

The value of code safety is going to be situation specific, and so is the cost. Not just specific to the application, but often specific within the application. Even in performance critical applications, it is often the case that only a minority of the total code is actually performance critical. So you might have parts of the code where safety is a secondary priority, and other parts where it is not.

A good way to think about the reasonableness of this kind of coding might be to compare it to what's going on under the hood in languages like C# and Java. From this perspective, the overhead and complexity of safe smart pointers like mse::TRefCountingFixedPointer and mse::TRegisteredFixedPointer don't compare unfavorably to the underlying mechanics of the garbage collection going on in those languages. In our case, the mechanics are just more exposed. And if you're feeling uneasy about all the template functions, well, the STL itself pretty well demonstrates the compiler's capacity to handle templates in quantity.

One nice thing about this safe coding style is that it introduces essentially no dependency risk. In fact, the templatization of function reference parameters frees us from the long-standing de facto dependency on the traditional unsafe parameter passing interface. As for the safe smart pointers themselves, they are each self-contained in one or two files with no other dependencies. And if at any point in the future you choose to revert to the standard pointer types, it couldn't be easier. They have a built-in compile-time feature to do just that automatically. Or you can simply alias them to their corresponding standard types and get rid of their implementation files completely. The same goes for mse::mstd::vector.

And lastly, this safe coding style is very much compatible with the traditional (unsafe) coding style, allowing them to be mixed together in the code. Turning functions into function templates doesn't prevent them from accepting standard pointer types, and conversely the safe smart pointer target objects are either no different from, or compatible with, the original object type.
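The second half of that claim rests on inheritance. Assuming, as the next snippet implies, that wrapper types like mse::TScopeObj<T> and mse::TRegisteredObj<T> derive publicly from T, a wrapped object binds directly to a const T& parameter. The hypothetical Wrapped template below demonstrates the mechanism with the standard library only.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a wrapper such as a scope or registered object:
// public inheritance plus inherited constructors.
template <typename T>
class Wrapped : public T {
public:
    using T::T;
};

// A traditional, non-template interface.
int string_size(const std::string& string1_cref) {
    return static_cast<int>(string1_cref.size());
}
```

Because Wrapped<std::string> is-a std::string, string_size(add_one) accepts it without any conversion code.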

...
    /* just demonstrating compatibility with traditional interfaces */
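    CProgramV2 program1;
    /* Standard C++ doesn't allow taking the address of a temporary std::string, so use named objects. */
    std::string add_one_str("add 1");
    std::string multiply_by_two_str("multiply by 2");
    program1.add_two_instructions(&add_one_str, &multiply_by_two_str);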

    class B {
    public:
        static int foo3(const std::string& string1_cref) { return (int)(string1_cref.size()); }
    };

    mse::TScopeObj<std::string> add_one_scpobj("add 1");
    int res1 = B::foo3(add_one_scpobj);

    mse::TRegisteredObj<std::string> add_one_regobj("add 1");
    int res2 = B::foo3(add_one_regobj);

    mse::TRefCountingFixedPointer<std::string> add_one_refcfptr =
        mse::make_refcounting<std::string>("add 1");
    int res3 = B::foo3(*add_one_refcfptr);
...

The State of Safe C++ Programming

So basically what we've done here is demonstrate practical ways of achieving, in C++, the pointer/reference safety of garbage collection without some of the associated drawbacks.

But C++ still can't match the safety of other modern languages in the sense that there is no compile-time restriction on using unsafe code, and there is no reasonably complete, up-to-date safe implementation of the standard libraries.

C++ does have some safety advantages over, say, Java and C#, including const references and deterministic, automatic resource deallocation through destructors (RAII). But probably more consequential are C++'s comprehensive overloading features and powerful preprocessor, which allow you to build efficient language elements with essentially whatever safety features you want to add. For example, the "safe numerics" library allows you to create integers with custom range limits. Or you could imagine a smart pointer that tracks and analyzes its dereferences for suspicious usage patterns.
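As a hypothetical illustration of that flexibility (RangedInt is a made-up name, not the API of any particular safe numerics library), operator overloading makes it straightforward to define an integer type whose range is part of its type and is checked on every construction and arithmetic result. The sketch assumes the range is small enough that intermediate results fit in a long long.

```cpp
#include <cassert>
#include <stdexcept>

template <long long Min, long long Max>
class RangedInt {
    static_assert(Min <= Max, "empty range");

public:
    RangedInt(long long v) : value_(check(v)) {}
    long long value() const { return value_; }

    // Every arithmetic result is re-checked against [Min, Max].
    friend RangedInt operator+(RangedInt a, RangedInt b) { return RangedInt(a.value_ + b.value_); }
    friend RangedInt operator-(RangedInt a, RangedInt b) { return RangedInt(a.value_ - b.value_); }
    friend RangedInt operator*(RangedInt a, RangedInt b) { return RangedInt(a.value_ * b.value_); }

private:
    static long long check(long long v) {
        if (v < Min || v > Max) throw std::out_of_range("RangedInt: value out of range");
        return v;
    }
    long long value_;
};
```

With using Percent = RangedInt<0, 100>;, adding 60 and 30 yields 90, while adding 60 and 60 throws std::out_of_range instead of silently producing 120.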

While today's garden of safe programming may lie with the current crop of garbage collected languages, it may be that the future of safe programming will be the domain of more powerful, efficient and deterministic languages. And who knows, C++ may be one of them.