Registered Pointers - High-Performance C++ Smart Pointers That Can Target The Stack

20 Mar 2016, MIT license, 6 min read
An introduction to new smart pointers meant to be safe replacements for raw pointers and (raw) references.

Quick summary

mse::TRegisteredPointer is a smart pointer that behaves just like a raw pointer, except that its value is automatically set to nullptr when the target object is destroyed. It can be used as a general replacement for raw pointers in most situations. Like a raw pointer, it does not have any intrinsic thread safety. But in exchange it has no problem targeting objects allocated on the stack (and obtaining the corresponding performance benefit). With default run-time checks enabled, this pointer is safe from accessing invalid memory.

mse::TRegisteredFixedPointer is a derivative of mse::TRegisteredPointer that is a functional equivalent of a C++ reference. That is, it may only be constructed to point at an existing object and cannot be retargeted after construction. While these properties make it unlikely that a C++ reference will end up being used to access invalid memory, it is, of course, not impossible. mse::TRegisteredFixedPointer, on the other hand, inherits mse::TRegisteredPointer's safety with respect to invalid memory access.
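
To make the "reference-like" restrictions concrete, here is a minimal sketch in standard C++. The FixedPtr class and the add_one_through() function are hypothetical names invented for this illustration; this is not the library's implementation, and it models only the two restrictions (must target an existing object, cannot be retargeted), not the run-time safety checks.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical illustration only: a pointer-like handle with reference-like rules.
template <typename T>
class FixedPtr {
public:
    explicit FixedPtr(T& target) : m_ptr(&target) {} // must name an existing object
    FixedPtr(const FixedPtr&) = default;              // copying the handle is fine
    FixedPtr& operator=(const FixedPtr&) = delete;    // retargeting is not allowed
    T& operator*() const { return *m_ptr; }
    T* operator->() const { return m_ptr; }
private:
    T* m_ptr;
};

// The restrictions are enforced at compile time.
static_assert(!std::is_default_constructible<FixedPtr<int>>::value, "no null state");
static_assert(!std::is_copy_assignable<FixedPtr<int>>::value, "cannot be retargeted");
static_assert(std::is_copy_constructible<FixedPtr<int>>::value, "can be passed by value");

// Usage: the handle is passed by value, like a pointer, but always has a target.
int add_one_through(FixedPtr<int> p) {
    *p += 1;
    return *p;
}
```

Calling add_one_through(FixedPtr<int>(x)) on an int x modifies x through the handle, much as a function taking int& would.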

Who should use registered pointers?

Registered pointers are appropriate for use by two groups of C++ developers - those for whom safety and security are critical, and also everybody else.
Registered pointers can help eliminate many of the opportunities for inadvertently accessing invalid memory.
Using registered pointers can incur a modest performance cost. But because registered pointers behave the same as raw pointers when pointing to valid objects, they can be "disabled" (automatically replaced with the corresponding raw pointer) with a compile-time directive. That lets them help catch bugs in debug/test/beta modes while incurring no overhead cost in release mode. So there is really no excuse for not using them.
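
One way such a compile-time switch could work is sketched below in standard C++. The macro name DEMO_CHECKED_PTR_DISABLED, the CheckedPtr class and the DemoPtr alias are all hypothetical, invented for this illustration; they are not the library's names, and the sketch only checks for null rather than tracking object lifetime.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical checked pointer: throws instead of dereferencing null.
template <typename T>
class CheckedPtr {
public:
    CheckedPtr(T* p = nullptr) : m_ptr(p) {}
    T& operator*() const { return *checked(); }
    T* operator->() const { return checked(); }
    explicit operator bool() const { return m_ptr != nullptr; }
private:
    T* checked() const {
        if (!m_ptr) { throw std::runtime_error("null dereference"); }
        return m_ptr;
    }
    T* m_ptr;
};

// Hypothetical switch: define DEMO_CHECKED_PTR_DISABLED (say, in release
// builds) to make DemoPtr<T> a plain raw pointer with no overhead.
#ifdef DEMO_CHECKED_PTR_DISABLED
template <typename T> using DemoPtr = T*;
#else
template <typename T> using DemoPtr = CheckedPtr<T>;
#endif

// Code written against DemoPtr<T> compiles unchanged in both modes.
int read_through(DemoPtr<int> p) { return *p; }

// Returns true if dereferencing a null DemoPtr was caught as an exception.
bool null_access_is_caught() {
#ifdef DEMO_CHECKED_PTR_DISABLED
    return false; // raw pointers give no such guarantee
#else
    DemoPtr<int> p; // default-constructed: null
    try { (void)*p; } catch (const std::runtime_error&) { return true; }
    return false;
#endif
}
```

The point of the design is that the alias, not the calling code, changes between builds, so the checked and unchecked versions cannot drift apart.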

Usage

Using registered pointers is easy. Just copy two files, mseprimitives.h and mseregistered.h, into your project (or "include" directory). There are no other dependencies. Registered pointer usage is very similar to raw pointer usage and they can generally be used as a "drop-in" substitute. Note that the target object does have to be declared as a "registered object". Because the registered object type is publicly derived from the original object's type, it remains compatible with it.

C++
#include "mseregistered.h"
...

    class A {
    public:
        int b = 3;
    };

    A a;
    mse::TRegisteredObj<A> registered_a;

    A* A_native_ptr1 = &a;
    mse::TRegisteredPointer<A> A_registered_ptr1 = &registered_a;

    A* A_native_ptr2 = new A();
    mse::TRegisteredPointer<A> A_registered_ptr2 = mse::registered_new<A>();

    delete A_native_ptr2;
    mse::registered_delete<A>(A_registered_ptr2);

If you prefer to do less typing, shorter aliases are available:

C++
#include "mseregistered.h"
using namespace mse;
...

    class A {
    public:
        int b = 3;
    };

    ro<A> registered_a;
    rp<A> A_registered_ptr1 = &registered_a;
    rp<A> A_registered_ptr2 = rnew<A>();
    rdelete<A>(A_registered_ptr2);

The example project included with this article contains a comprehensive set of examples of registered pointers in action.

Discussion

These days C++ stands out as a uniquely dangerous language, at least compared to the other modern languages. By "dangerous", I mean the ever-present, significant possibility of accessing invalid memory. The potential consequences of invalid memory access can be severe, ranging from exposure of sensitive data to complete compromise of the run-time environment.

Presumably this is the main reason C++ is not a popular language for (server side) web applications. Yet curiously, it is still the language used for critical parts of the web infrastructure. Web servers and web browsers, for example. Why is that? I suggest that it's simply because no other language is really up to the job. One issue in particular is that a lot of the other languages depend on garbage collection to achieve their language safety, which is arguably not appropriate for writing complex systems that need to be reliably responsive.

But C++ is still dangerous, and there have been countless security exploits that have taken advantage of that.

Since C++11, C++ has become a much more powerful language. Is there really still no practical way to avoid using C++'s dangerous elements? Well let's consider the most dangerous element of all, the pointer. Experienced (older) C++ programmers know how easy it can be to unintentionally end up with a pointer pointing to invalid memory. The situation is better now that the STL provides well-tested versions of many of the commonly used dynamic data structures so you don't have to implement your own, eliminating much of the need to use pointers at all.

And when using dynamic allocation, std::shared_ptr can often be a great substitute for raw pointers that helps ensure you don't accidentally deallocate the target object prematurely. Using std::shared_ptr essentially gets you the safety benefits of garbage collection, but, like garbage collection, there is a performance cost. In my opinion the safety benefit is worth it in pretty much all situations, but others would disagree.

The popular position in the C++ community seems to be that it is still appropriate to use raw pointers in situations where the user does not participate in the ownership (i.e. scheduling of the destruction) of the target object. More astute programmers add the condition that you must be sure that the target object will outlive the pointer. The problem is that this condition is easy to get wrong. Consider this example:

C++
#include <string>
#include <vector>

class CNames : public std::vector<std::string> {
public:
    void addName(const std::string& name) {
        (*this).push_back(name);
    }
};

class CQuarantineInfo {
public:
    void add_quarantine_patient(const std::string* p_patient_name) {
        if (p_patient_name) {
            if ((3 * supervising_doctors.size()) <= quarantined_patients.size()) {
                /* The policy is to have at least one supervising doctor for every 3 patients. */
                if (1 <= available_reserve_doctors.size()) {
                    supervising_doctors.addName(available_reserve_doctors.back());
                    supervising_doctors.shrink_to_fit(); /* Just to increase the likelihood of exposing
                        the bug. */
                    available_reserve_doctors.pop_back();
                }
            }
            quarantined_patients.addName(*p_patient_name);
        }
    }

    CNames quarantined_patients;
    CNames supervising_doctors;
    CNames available_reserve_doctors;
};

int main(int argc, char* argv[]) {
    CQuarantineInfo quarantine_info;
    quarantine_info.available_reserve_doctors.addName("Dr. Bob");
    quarantine_info.available_reserve_doctors.addName("Dr. Dan");
    quarantine_info.available_reserve_doctors.addName("Dr. Jane");
    quarantine_info.available_reserve_doctors.addName("Dr. Tim");

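    /* Named objects are used here because taking the address of a temporary is not valid standard C++. */
    const std::string amy("Amy"), carl("Carl"), earl("Earl");
    quarantine_info.add_quarantine_patient(&amy);
    quarantine_info.add_quarantine_patient(&carl);
    quarantine_info.add_quarantine_patient(&earl);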

    /* Suppose the supervising doctor contracts the infection and becomes a patient too. */
    const std::string* p_name_of_doctor_that_contracted_the_infection = &(quarantine_info.supervising_doctors.front());
    quarantine_info.add_quarantine_patient(p_name_of_doctor_that_contracted_the_infection);

    /* The problem here is that the add_quarantine_patient() function might first add another doctor to
    the set of supervising_doctors. But because supervising_doctors is ultimately implemented as an
    std::vector<>, an insert (or push_back) operation could cause a "reallocation" event which would
    invalidate any references to any member of the vector. So the add_quarantine_patient() function
    could inadvertently invalidate its parameter before it is finished using it. */
}

It may never have occurred to the author of the add_quarantine_patient() function that the reference to the new patient could also be a reference to a supervising doctor, in which case the function can inadvertently cause the target of its p_patient_name parameter to be invalidated before it's finished using it.

It's a contrived example, but this kind of thing can easily happen in more complex situations. Of course using raw pointers is perfectly safe in the vast majority of cases. The problem is that there are a minority of cases where it's easy to assume that it's safe when it really isn't. So the prudent policy is to simply not use raw pointers (unless you're going to do some very thorough testing).
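
The reallocation hazard in the example can be observed without ever touching freed memory. The sketch below uses only the standard library; the function names are made up for this illustration. The first function shows that a push_back on a full vector moves the elements to a new address. The second shows one defensive variant of the add_quarantine_patient() pattern, which copies the pointed-to value before mutating anything.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Returns true if a push_back on a full vector moved its elements to a new
// address, which is what leaves earlier pointers and references dangling.
bool push_back_moves_elements() {
    std::vector<std::string> names{"Dr. Tim"};
    // Fill to capacity so that the next push_back has to reallocate.
    while (names.size() < names.capacity()) {
        names.push_back("filler");
    }
    // Record the address as an integer; it is never dereferenced afterwards.
    const auto before = reinterpret_cast<std::uintptr_t>(&names.front());
    names.push_back("Dr. Jane");
    const auto after = reinterpret_cast<std::uintptr_t>(&names.front());
    return before != after;
}

// Defensive variant: copy the target first, then mutate the containers.
std::string add_doctor_then_patient(std::vector<std::string>& doctors,
                                    std::vector<std::string>& patients,
                                    const std::string* p_patient_name) {
    const std::string name_copy = *p_patient_name; // taken before any mutation
    doctors.push_back("Dr. Jane");                 // may reallocate 'doctors'
    patients.push_back(name_copy);                 // safe: uses the copy
    return patients.back();
}
```

The copy fixes this one function, but it relies on the author noticing the hazard in the first place, which is exactly the assumption the example calls into question.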

Again, using std::shared_ptr in place of raw pointers everywhere would be a simple way to solve the problem, but with a performance cost. A lot of that performance cost comes from the constraint that std::shared_ptr target objects cannot (or should not) be allocated on the stack. So when considering performance, registered pointers can often be a better alternative.

Here's what the above example looks like when substituting raw pointers (and references) with registered pointers:

C++
#include <string>
#include <vector>
#include "mseregistered.h"
using namespace mse;
/* Note that "ro<>" is aliased to mse::TRegisteredObj<>, "rcp<>" to mse::TRegisteredConstPointer<> and
"rfcp<>" to mse::TRegisteredFixedConstPointer<>. */

class CNames : public std::vector<ro<std::string>> {
public:
    void addName(rfcp<std::string> p_name) {
        (*this).push_back(*p_name);
    }
};

class CQuarantineInfo {
public:
    void add_quarantine_patient(rcp<std::string> p_patient_name) {
        if (p_patient_name) {
            if ((3 * supervising_doctors.size()) <= quarantined_patients.size()) {
                /* The policy is to have at least one supervising doctor for every 3 patients. */
                if (1 <= available_reserve_doctors.size()) {
                    supervising_doctors.addName(&available_reserve_doctors.back());
                    supervising_doctors.shrink_to_fit(); /* Just to increase the likelihood of exposing the bug. */
                    available_reserve_doctors.pop_back();
                }
            }
            quarantined_patients.addName(&*p_patient_name);
        }
    }

    CNames quarantined_patients;
    CNames supervising_doctors;
    CNames available_reserve_doctors;
};

int main(int argc, char* argv[]) {
    CQuarantineInfo quarantine_info;
    quarantine_info.available_reserve_doctors.addName(&ro<std::string>("Dr. Bob"));
    quarantine_info.available_reserve_doctors.addName(&ro<std::string>("Dr. Dan"));
    quarantine_info.available_reserve_doctors.addName(&ro<std::string>("Dr. Jane"));
    quarantine_info.available_reserve_doctors.addName(&ro<std::string>("Dr. Tim"));

    quarantine_info.add_quarantine_patient(&ro<std::string>("Amy"));
    quarantine_info.add_quarantine_patient(&ro<std::string>("Carl"));
    quarantine_info.add_quarantine_patient(&ro<std::string>("Earl"));

    /* Suppose the supervising doctor contracts the infection and becomes a patient too. */
    rcp<std::string> p_name_of_doctor_that_contracted_the_infection = &(quarantine_info.supervising_doctors.front());
    try {
        quarantine_info.add_quarantine_patient(p_name_of_doctor_that_contracted_the_infection);
        /* The problem here is that the add_quarantine_patient() function might first add another
        doctor to the set of supervising_doctors. But because supervising_doctors is ultimately
        implemented as an std::vector<>, an insert (or push_back) operation could cause a
        "reallocation" event which would invalidate any references to any member of the vector. So the
        add_quarantine_patient() function could inadvertently invalidate its parameter before it is
        finished using it. */
        /* By default, registered pointers will throw an exception on any attempt to access invalid
        memory. */
    }
    catch (...) {
        /* Whether the bug is exposed depends on the implementation of std::vector<>. Under msvc2015 in
        debug mode (March 2016), the bug does manifest and an exception is caught here. */
    }

    /* Just to demonstrate that registered pointers also support stack allocated objects. */
    ro<std::string> patient_fred("Fred");
    quarantine_info.add_quarantine_patient(&patient_fred);
}

By default, registered pointers will throw an exception on any attempt to access invalid memory.

So there you go, C++'s most dangerous element made safe. Without sacrificing the performance benefit of stack allocation. Used along with the rest of the "SaferCPlusPlus" library, it is now practical to write C++ code with greatly reduced risk of accessing invalid memory.

Before we finish up, every good data type plugging article needs a benchmark chart:

Allocation, deallocation, pointer copy and assignment:

Pointer Type                        Time (seconds)
mse::TRegisteredPointer (stack)     0.027
native pointer (heap)               0.049
mse::TRegisteredPointer (heap)      0.074
std::shared_ptr (heap)              0.087

So in this benchmark, mse::TRegisteredPointers targeting stack-allocated objects (0.027 seconds) easily outperform even native (aka raw) pointers targeting heap-allocated objects (0.049 seconds).

That's it. Let's code safely out there.

License

This article, along with any associated source code and files, is licensed under The MIT License


Written By
United States

Comments and Discussions

Question: missing information
Stefan_Lang, 29-Apr-16 5:25

Answer: Re: missing information
Noah L, 30-Apr-16 6:33

Stefan_Lang wrote:
The zip file contains two big pdf files that don't appear to be related. What are they for?

You mean the zip file from here: http://duneroadrunner.github.io/SaferCPlusPlus? Well, the "registered" pointers introduced in the article are part of a library of safer C++ data types. I guess the pdf started off as the original documentation for the library, but now the webpage itself is more complete and up-to-date, so you can just ignore the pdf file.

Stefan_Lang wrote:
OTOH, I failed to pinpoint the code you used for your benchmark results. Where is it? It's a bit hard to credit your results without knowing what you tested, and how.

You can find the benchmark code in the "msetl_example.cpp" file in the zip, or you can browse it online here: https://github.com/duneroadrunner/SaferCPlusPlus/blob/master/msetl_example.cpp. It has become a big file, just search for "benchmarks". These benchmarks are not intended to be taken too seriously. They are kind of simplistic, and for a number of reasons they may not reflect real world performance. That said I'd be interested to see what kind of performance others are observing in their environments. More up-to-date benchmark results can be found here: http://duneroadrunner.github.io/SaferCPlusPlus/#simple-benchmarks

Stefan_Lang wrote:
As for your smart pointer implementation, I am missing an explanation of what they do. I've looked over two header files you listed as relevant, and found that you turned the concept of smart pointers (with use counters) around to register all pointers to an object, so you can keep track of them (and set to 0 when indicated). A paragraph in your article explaining this (and maybe a little beyond that) would have saved me a lot of time.

Thanks for pointing that out. Hopefully others who are interested in the implementation will read your comment. In case it's not clear to others reading this, Stefan is pointing out that "registered" objects keep track of all the (registered) pointers targeting them so that the object can set the pointers to nullptr upon its destruction.
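
For readers who want to see the idea in code, here is a toy sketch of that registration mechanism in standard C++. TrackedObj and TrackedPtr are hypothetical names invented for this illustration. This is not the library's implementation, which may differ substantially (for example in how it stores the set of pointers); the sketch only shows the principle that the object knows its pointers and nulls them in its destructor.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <utility>

template <typename T> class TrackedPtr;

// Toy "registered object": publicly derives from T and remembers every
// TrackedPtr currently targeting it.
template <typename T>
class TrackedObj : public T {
public:
    explicit TrackedObj(T value = T()) : T(std::move(value)) {}
    TrackedObj(const TrackedObj& other) : T(other) {} // pointers are not copied
    TrackedObj& operator=(const TrackedObj& other) {
        T::operator=(other); // the pointers targeting *this stay registered
        return *this;
    }
    ~TrackedObj() {
        // On destruction, null out every pointer still targeting this object.
        for (TrackedPtr<T>* p : m_pointers) { p->m_target = nullptr; }
    }
private:
    friend class TrackedPtr<T>;
    std::unordered_set<TrackedPtr<T>*> m_pointers;
};

// Toy "registered pointer": registers itself with its target and throws on
// any attempt to dereference after the target is gone.
template <typename T>
class TrackedPtr {
public:
    TrackedPtr() = default;
    TrackedPtr(TrackedObj<T>* target) { attach(target); }
    TrackedPtr(const TrackedPtr& other) { attach(other.m_target); }
    TrackedPtr& operator=(const TrackedPtr& other) {
        if (this != &other) { detach(); attach(other.m_target); }
        return *this;
    }
    ~TrackedPtr() { detach(); }

    explicit operator bool() const { return m_target != nullptr; }
    T& operator*() const {
        if (!m_target) { throw std::runtime_error("target no longer exists"); }
        return *m_target;
    }
private:
    friend class TrackedObj<T>;
    void attach(TrackedObj<T>* target) {
        m_target = target;
        if (m_target) { m_target->m_pointers.insert(this); }
    }
    void detach() {
        if (m_target) { m_target->m_pointers.erase(this); }
        m_target = nullptr;
    }
    TrackedObj<T>* m_target = nullptr;
};

// Returns true if the pointer was nulled when its stack-allocated target
// went out of scope, and dereferencing it afterwards throws.
bool pointer_is_nulled_on_target_destruction() {
    TrackedPtr<std::string> p;
    {
        TrackedObj<std::string> fred("Fred"); // lives on the stack
        p = &fred;
        if (!p || *p != "Fred") { return false; }
    } // fred is destroyed here and sets p to null
    if (p) { return false; }
    try { (void)*p; } catch (const std::runtime_error&) { return true; }
    return false;
}
```

Note the trade-off this sketch makes visible: every pointer construction, copy and destruction updates the target's bookkeeping, which is where the run-time cost of this kind of pointer comes from.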

Unlike a lot of codeproject articles, I neglected talking about the implementation (the "how"), and instead tried to emphasize the motivation (the "why") for using a safe reference type like the registered pointer presented here, because it seemed to me to be the more important and less obvious of the two.

Stefan_Lang wrote:
Now I understand that your pointers have a different goal in mind than the typical smart pointer implementation. I am not at all sure I have a use case for them - they only appear to be useful if you wish to use them for stack items as well as heap items, and I can't think of a reason to do that. The main advantage I see is their ability to be nullable - but that is not a concern for a normal smart pointer, because, if used correctly, the object they point to by definition can never be null!

Yes, as I mentioned in the article, I think using std::shared_ptrs to pass parameters by reference is a fine solution. But like I said, others would disagree. For example, in this article - GotW #91 Solution: Smart Pointer Parameters | Sutter’s Mill - the author argues against using std::shared_ptr to pass parameters in general due to their performance cost. std::shared_ptr in particular has a costly thread safety mechanism that can hurt "asynchronous scalability". Basically, the more threads you have running, the (potentially) slower std::shared_ptrs can become. So, for example, those concerned about std::shared_ptr's performance can use the registered pointers presented as an often faster, more scalable alternative.
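
The cost being discussed comes from copying the std::shared_ptr, which has to update a shared reference count (implementations typically do this with atomic operations so that it is safe across threads). A small standard-library sketch, with function names made up for this illustration, shows when that copy happens:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Taking the shared_ptr by value copies it, which increments the reference
// count for the duration of the call.
long use_count_by_value(std::shared_ptr<const std::string> p) {
    return p.use_count();
}

// Taking it by const reference makes no copy and leaves the count unchanged.
long use_count_by_reference(const std::shared_ptr<const std::string>& p) {
    return p.use_count();
}
```

With a single owner, the by-value version observes a count of 2 inside the call and the by-reference version observes 1. Passing by const reference avoids the count update, but the callee then no longer holds its own share of the ownership.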

Stefan_Lang wrote:
You correctly point out the problem of std::vector invalidating its iterators upon modification, but that issue is well-known, and if it could be a severe problem, you can easily remedy it by storing (smart) pointers to objects rather than the objects themselves!

Yes, storing smart pointers to the objects is a good solution and is one that I present in a follow-up article: How To Safely Pass Parameters By Reference in C++ - The Unsettled Question. The problem is that even if you know that vector iterators are prone to invalidation (and you might imagine that some inexperienced programmers don't), it can still be easy to misjudge whether or not it could be an issue in any specific situation. Take the add_quarantine_patient() function in the article's example. The danger only arises when one of the doctors also becomes a patient. But you could imagine that it would be easy for the author of that function to assume that the doctors and the patients were sets of people with no overlap. So in order to be reliably safe, either you need a code analysis tool that can recognize potential dangers for you (CppCon 2015: Neil MacIntosh "Static Analysis and C++: More Than Lint" - YouTube), or you need safe reference types efficient and flexible enough to be the default for general use. Or both. And some would argue that std::shared_ptrs are not efficient or flexible enough to be the default for general use (because they generally have to "own" their target object, and can't target stack objects).
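
As a rough sketch of the "store smart pointers in the vector" remedy, using only the standard library (the NamePtr alias and the function name are made up for this illustration):

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

using NamePtr = std::shared_ptr<const std::string>;

// Returns the name read through a handle that was taken before the vector
// was forced to reallocate and then emptied. The handle shares ownership of
// the string itself, so it does not depend on where (or whether) the vector
// still keeps its elements.
std::string read_name_after_reallocation() {
    std::vector<NamePtr> doctors;
    doctors.push_back(std::make_shared<const std::string>("Dr. Tim"));
    NamePtr infected = doctors.front(); // a copy of the smart pointer

    // Fill to capacity, then push once more to force a reallocation.
    while (doctors.size() < doctors.capacity()) {
        doctors.push_back(std::make_shared<const std::string>("filler"));
    }
    doctors.push_back(std::make_shared<const std::string>("Dr. Jane"));
    doctors.clear(); // even removing the elements leaves 'infected' valid

    return *infected;
}
```

The price is that every name now lives in its own heap allocation with a reference count, which is the efficiency and flexibility concern raised above.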

Stefan_Lang wrote:
Is there any other usecase where smart pointers and standard best practices do not already provide a solution?

Well, starting at the bottom of the first page of the pdf file you referred to - https://github.com/duneroadrunner/SaferCPlusPlus/blob/master/msetl_blurb.pdf - there's a section that discusses whether or not registered pointers are relevant in modern code given the smart pointers we already have. But registered pointers are basically a safe drop-in substitute for raw pointers, and can also be used to replace raw references (which can be as unsafe as raw pointers). So registered pointers are great for replacing raw pointers in legacy code that you want to make more safe. But the diminished relevance of raw pointers in modern C++ code would apply to registered pointers as well. So I guess the question is, how big of a role do raw references still play in modern C++? Well the C++ establishment is currently of the opinion that they should be the default way to take parameters by reference (C++ Core Guidelines). In this case, substituting raw references with registered pointers would be safer.

But, as I argue in the follow-up article I mentioned, perhaps raw references should not necessarily be the default way to take parameters by reference. I generally concur with the points you've made and I recommend reading the follow-up article. I think it addresses a lot of them, and I'd be interested to find out what you think. Oh, but just so you know, I neglected talking about implementations in the follow-up article as well, but the new smart pointer types introduced there have implementations much simpler than the registered pointers in this article :)

modified 30-Apr-16 12:39pm.
