The best-laid schemes of threads and objects
Go oft awry,
And leave us nothing but grief and pain,
For promised joy!
(with apologies to R. Burns)


Background - The Promised Land

Multi-threading and object-oriented languages each come with the promise of making life simpler for the creators of complex systems. Both offer methods for cutting those complex systems into manageable pieces with well-defined interactions between them. On one side, multi-threading divides the work into small pieces and assigns each piece to a separate processor, be it physical or virtual. One can then just wait for all the pieces of work to finish and assemble the final results.

On the other side, object-oriented languages say that only the important information should be visible to the outside world, leaving implementation details hidden inside "objects", and that more complicated objects can be created from simpler ones through inheritance or composition.

Wouldn't it be grand if we could join these two concepts together and have some little thread objects that do their work and hide all unnecessary implementation details? As we will see, it is indeed possible, but it's not that easy.

As an example, we will look at how to find all the prime numbers less than a certain value, and we will stick to good old C++ because it is still considered one of the most efficient languages.

A Simple Program using std::thread

The C++ standard has included std::thread since C++11. As our first multi-threaded program, we will use this:

#include <iostream>
#include <thread>
#include <vector>

bool is_prime (int n)
{
  for (auto i = n - 1; i > 1; --i)
    if (n % i == 0)
      return false;
  return true;
}

int main ()
{
  std::vector<int> primes;
  int n = 0;

  auto worker = [&]() {
    for (auto i = 2; i < 20; ++i)
    {
      if (is_prime (i))
        primes.at (n++) = i;
    }
  };

  std::thread th (worker);
  th.join ();
  std::cout << "Primes: ";
  for (auto val : primes ) 
    std::cout << val << ' ';
}

What we have here: a very simple-minded is_prime function is called repeatedly by the worker function. It then puts the primes in a vector. The main function simply creates a thread that runs the worker function and waits until it finishes before printing the results. This is not very multi-threaded as we have only a single thread apart from the main thread, but we hope to improve.

Exception Issues

Surprisingly or not, the program doesn't work. It has a pretty obvious bug: the primes vector is empty and setting a non-existent element:

        primes.at (n++) = i;

triggers an std::out_of_range exception.

We could easily fix it by changing the code to:

        primes.push_back (i);

but let's see if we can do some exception handling. We will wrap the whole main function in a try...catch block and let it handle the out of range exception. Here is our new main function:

int main ()
{
  std::vector<int> primes;
  int n = 0;

  try {
    auto worker = [&]() {
      for (auto i = 2; i < 20; ++i)
      {
        if (is_prime (i))
          primes.at (n++) = i;
      }
    };

    std::thread th (worker);
    th.join ();
    std::cout << "Primes: ";
    for (auto val : primes)
      std::cout << val << ' ';
  }
  catch (std::exception& x) {
    std::cout << "Exception: " << x.what () << std::endl;
  }
}

The exception handler is not called and we end up with exactly the same error as before.

The explanation has to do with a very important rule about threads:

Each thread has its own stack.

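When an exception occurs, the C++ runtime begins a process called stack unwinding in which it goes through the stack frame of each called function looking for an exception handler. Our exception handler, however, is on the stack of the main thread, so it never gets called. Exceptions do not propagate between threads. An exception that escapes a thread's function without being caught makes the runtime call std::terminate, which is why the program still dies instead of reaching our handler.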

Before moving to something else, let's first fix our program. We will do it in two steps. First, we move the try...catch block into the thread function:

auto worker = [&]() {
    try {
      for (auto i = 2; i < 20; ++i)
      {
        if (is_prime (i))
          primes.at (n++) = i;
      }
    }
    catch (std::exception& x)
    {
      std::cout << "Exception: " << x.what () << std::endl;
    }
  };

This time, it does catch the exception and the program output is (the exact message text depends on the standard library implementation):

Exception: invalid vector subscript
Primes:

As a final step, we now fix our little "bug". The finished program is:

//working version
int main ()
{
  std::vector<int> primes;

  auto worker = [&]() {
    try {
      for (auto i = 2; i < 20; ++i)
      {
        if (is_prime (i))
          primes.push_back(i);
      }
    }
    catch (std::exception& x)
    {
      std::cout << "Exception: " << x.what () << std::endl;
    }
  };

  std::thread th (worker);
  th.join ();
  std::cout << "Primes: ";
  for (auto val : primes)
    std::cout << val << ' ';
}

And the output is:

Primes: 2 3 5 7 11 13 17 19
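
The working version still uses a single worker thread. As a rough sketch of the "divide the work and assemble the results" idea from the beginning of the article, the range could be split among several std::thread objects. The helper name find_primes and its parameters are made up for this illustration, and exception handling inside the workers is left out for brevity, so an exception thrown there would still terminate the program:

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// Same simple-minded primality test as in the article.
bool is_prime (int n)
{
  for (auto i = n - 1; i > 1; --i)
    if (n % i == 0)
      return false;
  return true;
}

// Hypothetical helper: find the primes in [2, limit) using nthreads workers.
// Assumes nthreads >= 1.
std::vector<int> find_primes (int limit, int nthreads)
{
  // One private result vector per worker, so no locking is needed.
  std::vector<std::vector<int>> partial (nthreads);
  std::vector<std::thread> workers;

  for (int t = 0; t < nthreads; ++t)
  {
    // Worker t checks 2+t, 2+t+nthreads, 2+t+2*nthreads, ...
    workers.emplace_back ([&partial, t, limit, nthreads] {
      for (int i = 2 + t; i < limit; i += nthreads)
        if (is_prime (i))
          partial[t].push_back (i);
    });
  }

  // Wait for all the pieces of work, then assemble the final result.
  std::vector<int> primes;
  for (int t = 0; t < nthreads; ++t)
  {
    workers[t].join ();
    primes.insert (primes.end (), partial[t].begin (), partial[t].end ());
  }
  std::sort (primes.begin (), primes.end ());
  return primes;
}

// Usage example: three workers give the same list as the single thread above.
void check_find_primes ()
{
  assert ((find_primes (20, 3) == std::vector<int>{2, 3, 5, 7, 11, 13, 17, 19}));
}
```

Giving each worker its own vector avoids sharing the results vector between threads, at the cost of a final merge and sort.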

Thread Encapsulation

So far, we've seen how to use std::thread objects to do the work but we still have to figure out how to pack together a thread and its private data in some kind of object.

Let's say that our primality checking thread also needs to keep a count of the number of primes it found. Also, we want the vector of results to be passed somehow to the thread.

A solution could be to derive a class prime_finder from std::thread. Something like this:

class prime_finder : public std::thread
{
public:
  prime_finder (std::vector<int>& v)
    : std::thread ([this] {this->worker (); })
    , count (0)
    , primes (v) {}

  int get_count () { return count; }
private:
  int count;
  inline
  void worker ()
  {
    try {
      for (auto i = 2; i < 20; ++i)
      {
        if (is_prime (i))
        {
          primes.push_back (i);
          count++;
        }
      }
    }
    catch (std::exception& x)
    {
      std::cout << "Exception: " << x.what () << std::endl;
    }
  };

  std::vector<int>& primes;
};

int main ()
{
  std::vector<int> results;
  prime_finder th (results);
  th.join ();
  std::cout << "Found " << th.get_count() << " primes: ";
  for (auto val : results)
    std::cout << val << ' ';
}

And guess what? It even works:

Found 8 primes: 2 3 5 7 11 13 17 19

But if you value a good night's sleep, please don't use code like that! Not unless you want to be woken up at any hour by irate coworkers or customers complaining that your code just crashed, and driven mad because you cannot reproduce those errors.

To find out what's wrong with this code, let's see what happens when you instantiate the prime_finder object in the main function. Once storage for the object has been set aside, the prime_finder constructor first invokes the constructors of any base classes, in this case the std::thread constructor. The reference documentation for the std::thread constructor says:

Creates new std::thread object and associates it with a thread of execution. The new thread of execution starts executing /*INVOKE*/(std::move(f_copy), std::move(args_copy)...)

The key here is that the new thread starts executing, potentially before the prime_finder constructor has finished setting up the object. It is now up to the OS scheduler to let the main thread finish the initialization of the prime_finder object (initialize count to 0 and set the address of primes vector) or switch immediately to the newly created thread. Things can run smoothly for a long time until the OS scheduler wakes up on the wrong side of the bed and our thread starts running too early and the whole program crashes.

To exemplify this problem, we can introduce an artificial delay in the prime_finder constructor:

class prime_finder : public std::thread
{
public:
  prime_finder (std::vector<int>& v)
    : std::thread ([this] {this->worker (); })
    , primes (v)
  {
    std::this_thread::sleep_for (std::chrono::milliseconds (10));
    count = 0;
  }
//...

Now the result is:

Found 0 primes: 2 3 5 7 11 13 17 19

The count variable was initialized to 0 long after the worker function had finished.

The important lesson here is:

DO NOT inherit from std::thread.
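
If you want to stay with the standard library, one possible way around the problem is composition instead of inheritance. The following is only a sketch, not the approach developed in the rest of this article: the std::thread is a member declared last, and it is started by an explicit start function (a name chosen for this illustration) after the constructor has finished.

```cpp
#include <cassert>
#include <exception>
#include <iostream>
#include <thread>
#include <vector>

// Same simple-minded primality test as in the article.
bool is_prime (int n)
{
  for (auto i = n - 1; i > 1; --i)
    if (n % i == 0)
      return false;
  return true;
}

// Sketch: the thread is a member, not a base class. The class is final so
// nobody can derive from it and reintroduce the construction-order problem.
class prime_finder final
{
public:
  explicit prime_finder (std::vector<int>& v) : count (0), primes (v) {}

  // A running thread must not outlive the data it works on.
  ~prime_finder () { if (th.joinable ()) th.join (); }

  prime_finder (const prime_finder&) = delete;
  prime_finder& operator= (const prime_finder&) = delete;

  // Call once, after construction is complete.
  void start () { th = std::thread ([this] { worker (); }); }
  void join () { th.join (); }

  // Safe to call after join() has returned.
  int get_count () const { return count; }

private:
  void worker ()
  {
    try {
      for (auto i = 2; i < 20; ++i)
        if (is_prime (i))
        {
          primes.push_back (i);
          count++;
        }
    }
    catch (std::exception& x) {
      std::cout << "Exception: " << x.what () << std::endl;
    }
  }

  int count;
  std::vector<int>& primes;
  std::thread th;   // declared last: every other member exists before it
};

// Usage example.
void check_prime_finder ()
{
  std::vector<int> results;
  prime_finder finder (results);
  finder.start ();
  finder.join ();
  assert (finder.get_count () == 8);
  assert (results.size () == 8);
}
```

Because the thread is only created inside start, the worker cannot observe a half-built object no matter how the scheduler behaves.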

A Better thread Class

I have to admit, I wasn't particularly impressed with the design of the std::thread class. While the issues related to exception handling are somewhat unavoidable, the idea of running the new thread at construction time seems more like a blunder. Luckily, I didn't have to endure this problem, having designed my own thread class as part of the mlib library long before C++11.

Here are the relevant parts:

class thread : public syncbase
{
public:
  /// Thread state
  thread (std::function<int ()> func);
  virtual ~thread   ();

  virtual void start ();
//...
protected:
  thread (const char *name=0, bool inherit=false, 
          DWORD stack_size=0, PSECURITY_DESCRIPTOR sd=NULL);
  virtual void init ();
  virtual void run ();
//...
private:
  static unsigned int _stdcall entryProc (thread *ts);
//...
};

The base class, syncbase, is just a wrapper for handles of Windows synchronization objects like semaphores, mutexes or events. The public constructor is very similar to the std::thread constructor. It creates a thread object that will run the function. However, the new thread is not started yet. To start it, users have to call the start function. There is also a protected constructor that can be used by derived objects that need finer control over aspects like thread stack size and security attributes.

On the inside, starting up a new thread is a relatively complicated process that is done in phases:

1. The constructor(s) call the Windows _beginthreadex function to create a new thread having entryProc as body. The new thread is created in a suspended state so it is guaranteed not to start running.
2. After the _beginthreadex function returns, the constructor resumes the newly created thread and waits for a created semaphore to become signaled.
3. The entryProc function can now run. It signals the created semaphore and waits for the started semaphore.
4. Because the created semaphore has been signaled, the constructor can now proceed and it returns. If the thread constructor was invoked as part of the constructor for a derived object, the rest of the construction process can continue.

As I said before, to really start the new thread, users have to call the start function. This will signal the started semaphore and the entryProc function will invoke first a virtual init function that can do any initialization work and then the run function which is the actual run loop of the thread.
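
To make the idea of a delayed start more concrete, here is a simplified, portable sketch built on std::mutex and std::condition_variable. It is not the mlib implementation: it drops the created handshake and replaces the started semaphore with a state flag, and the names basic_thread and counter_thread are invented for this example.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Sketch of a thread base class whose body runs only after start().
class basic_thread
{
public:
  // The OS thread is created here, but entry() parks it until start().
  basic_thread () : th ([this] { entry (); }) {}

  // NOTE: a started thread must be joined BEFORE the derived object is
  // destroyed; by the time this destructor runs, derived members are gone.
  virtual ~basic_thread ()
  {
    release (false);              // never started: tell entry() to give up
    if (th.joinable ())
      th.join ();
  }

  basic_thread (const basic_thread&) = delete;
  basic_thread& operator= (const basic_thread&) = delete;

  void start () { release (true); }
  void join () { th.join (); }

protected:
  virtual void init () {}         // optional one-time setup, runs in the new thread
  virtual void run () = 0;        // the actual thread body

private:
  enum class state { created, started, abandoned };

  void release (bool go)
  {
    {
      std::lock_guard<std::mutex> lock (mtx);
      if (st == state::created)
        st = go ? state::started : state::abandoned;
    }
    cv.notify_one ();
  }

  void entry ()
  {
    {
      std::unique_lock<std::mutex> lock (mtx);
      cv.wait (lock, [this] { return st != state::created; });
      if (st == state::abandoned)
        return;
    }
    // No virtual function is called before start(), so the derived
    // constructor is guaranteed to have finished by now.
    init ();
    run ();
  }

  std::mutex mtx;
  std::condition_variable cv;
  state st = state::created;
  std::thread th;                 // declared last: mtx, cv and st exist first
};

// Usage example: the derived constructor completes before init() and run().
class counter_thread : public basic_thread
{
public:
  counter_thread () { value = 1; }
  int value;
private:
  void init () override { value *= 10; }
  void run () override { value += 5; }
};

void check_counter_thread ()
{
  counter_thread t;
  t.start ();
  t.join ();
  assert (t.value == 15);
}
```

Like the original design, the sketch pays for its safety: every object carries a mutex and a condition variable, and the new thread is woken up once just to begin its real work.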

Note that these thread objects are not light-weight. Each object comes with two semaphores attached and there are two context switches to create them. They are safe and powerful but there is a price to pay for that.

Here is our program reworked to use the mlib::thread objects:

#include "mlib/thread.h"

class prime_finder : public mlib::thread
{
public:
  prime_finder (std::vector<int>& v)
    : primes (v) {
    std::this_thread::sleep_for (std::chrono::milliseconds (10));
    count = 0;
  }

  int get_count () { return count; }
private:
  int count;
  inline void run ()
  {
    for (auto i = 2; i < 20; ++i)
    {
      if (is_prime (i))
      {
        primes.push_back (i);
        count++;
      }
    }
  }

  std::vector<int>& primes;
};

int main ()
{
  std::vector<int> results;
  prime_finder th (results);
  try {
    th.start ();
    th.join ();
  }
  catch (std::exception& x)
  {
    std::cout << "Exception: " << x.what () << std::endl;
  }

  std::cout << "Found " << th.get_count () << " primes: ";
  for (auto val : results)
    std::cout << val << ' ';
}

Throwing Exceptions Across Thread Borders

A sharp-eyed reader will notice that I moved the exception handling code from the worker thread back to the main thread. This is possible because the thread::entryProc function has a try...catch block that catches all exceptions. The exceptions are stored in a std::exception_ptr object inside the thread. When the main thread calls the thread::wait function, the exception, if there was one, is re-thrown in the context of the main thread. To verify, we modify the run function to throw an exception:

// ...
  inline void run ()
  {
    int t = std::vector<int> ().at (1); //this triggers an out of range exception
    for (auto i = 2; i < 20; ++i)
    {
      if (is_prime (i))
      {
        primes.push_back (i);
        count++;
      }
    }
  }
//...

The output is:

Exception: invalid vector subscript
Found 0 primes:

You don't have to move the exception handling code into the main thread. You can still place try...catch blocks in the run function if that's more appropriate to the program's logic but, if you need one centralized error handler, mlib::thread can transfer the errors across thread boundaries. This transfer, however, is "delayed": the exception will be re-thrown when the join function is invoked.
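
The same delayed transfer can be approximated with nothing but the standard library. The sketch below is an illustration of the technique (std::current_exception plus std::rethrow_exception), not the mlib code; the class name catching_thread and the helper join_rethrows are invented for the example.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <thread>
#include <utility>
#include <vector>

// Sketch: a thread wrapper that stores an escaping exception and
// re-throws it in the thread that calls join().
class catching_thread final
{
public:
  explicit catching_thread (std::function<void ()> func)
    : body (std::move (func)) {}

  ~catching_thread () { if (th.joinable ()) th.join (); }

  catching_thread (const catching_thread&) = delete;
  catching_thread& operator= (const catching_thread&) = delete;

  // Call once, after construction is complete.
  void start ()
  {
    th = std::thread ([this] {
      try {
        body ();
      }
      catch (...) {
        // Runs in the worker thread: remember whatever was thrown.
        error = std::current_exception ();
      }
    });
  }

  void join ()
  {
    th.join ();                   // after this, reading error is safe
    if (error)
    {
      std::exception_ptr e = error;
      error = nullptr;
      std::rethrow_exception (e); // now on the caller's stack
    }
  }

private:
  std::function<void ()> body;
  std::exception_ptr error;
  std::thread th;                 // declared last
};

// Usage example: the out of range exception thrown in the worker
// is caught by a handler in the calling thread.
bool join_rethrows ()
{
  catching_thread t ([] { (void) std::vector<int> ().at (1); });
  t.start ();
  try {
    t.join ();
  }
  catch (const std::out_of_range&) {
    return true;
  }
  return false;
}

void check_join_rethrows ()
{
  assert (join_rethrows ());
}
```

As with mlib::thread, nothing is reported until join is called, so a caller that never joins never sees the error.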

Parting Thoughts

Encapsulating threads in objects is not so simple but offers definite advantages. It allows you to differentiate between code and data that need to be accessed from other threads, which I call foreign, and the internal data and functions, which I call own. As a general rule, own data and functions should be kept as private or protected members while foreign functions form the public interface. Constructors and destructors are inherently foreign and that's why they require special care. For other foreign functions, I favor a pattern where the caller transmits the request through some command semaphore or event and then waits for results:

class cool_thread : public mlib::thread
{
public:
//....
  stuff do_something_cool () //foreign function
  {
    //send command to thread
    thread_critical_section.enter ();
    command = WHAT_TO_DO;
    command_semaphore.signal ();
    thread_critical_section.leave ();

    //wait for results
    results_semaphore.wait ();
    thread_critical_section.enter ();
    stuff s = get_results ();
    thread_critical_section.leave ();
    return s;
  }
//...
private:
  stuff& get_results () { /*...*/ } //own function
};
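
For readers who want something they can compile, here is a small sketch of the same request and response pattern using only standard C++ primitives. A single std::mutex and std::condition_variable stand in for the critical section and the two semaphores, and the class squaring_thread with its trivial "command" (squaring an integer) is invented purely for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

class squaring_thread final
{
public:
  // Starting the thread here is safe only because the class is final
  // and th is the last member to be initialized.
  squaring_thread () : th ([this] { run (); }) {}

  // Ask the thread to stop and wait for it. The object must not be
  // destroyed while another thread is still inside square().
  ~squaring_thread ()
  {
    {
      std::lock_guard<std::mutex> lock (mtx);
      stop = true;
    }
    cv.notify_all ();
    th.join ();
  }

  squaring_thread (const squaring_thread&) = delete;
  squaring_thread& operator= (const squaring_thread&) = delete;

  // Foreign function: may be called from any other thread.
  int square (int x)
  {
    std::unique_lock<std::mutex> lock (mtx);
    // Only one request may be in flight at a time.
    cv.wait (lock, [this] { return !has_request && !has_result; });

    // Send the command to the thread.
    argument = x;
    has_request = true;
    cv.notify_all ();

    // Wait for the results.
    cv.wait (lock, [this] { return has_result; });
    int r = result;
    has_result = false;
    cv.notify_all ();             // let the next caller in
    return r;
  }

private:
  // Own function: the thread's run loop.
  void run ()
  {
    std::unique_lock<std::mutex> lock (mtx);
    for (;;)
    {
      cv.wait (lock, [this] { return has_request || stop; });
      if (stop)
        return;
      result = argument * argument;
      has_request = false;
      has_result = true;
      cv.notify_all ();
    }
  }

  std::mutex mtx;
  std::condition_variable cv;
  bool has_request = false, has_result = false, stop = false;
  int argument = 0, result = 0;
  std::thread th;                 // declared last
};

// Usage example.
void check_squaring_thread ()
{
  squaring_thread worker;
  assert (worker.square (7) == 49);
  assert (worker.square (-3) == 9);
}
```

All shared state is touched only while the mutex is held, which keeps the foreign function and the own run loop from stepping on each other.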

Aside from the two issues I discussed, exception handling and construction dangers, there is a third one I'd like to mention without providing any code to demonstrate it. Thread destruction can also be a dangerous time. As a rule, it should never be done by simply invoking the object's destructor because you cannot control the state the thread is in when it gets destructed. In the sample above, if the thread gets destructed while the caller waits for results, the caller would deadlock.

History

17th June, 2022 - Initial version