Understanding Windows Asynchronous Procedure Calls (APCs)

Bruno van Dooren

2 Apr 2023 | MIT | 13 min read

In this article, I will explain Asynchronous Procedure Calls (APCs), their uses and their pitfalls.

Windows Asynchronous Procedure Calls (APCs) are an execution mechanism that is not widely understood, and not widely used. However, they can be a very useful tool when used correctly, and solve a number of problems you might encounter.

What is an APC?

As the name implies, an Asynchronous Procedure Call (APC) is a procedure call - in the form of a function pointer - which is scheduled to be executed on a specific thread. The scheduling itself can be done in any thread, so a thread can schedule work on another thread or even itself. It's basically a way of saying to a thread 'Hey, when you have a minute, I have some work I'd like you to do'.

There are two kinds of APC: User APCs and Special APCs. The big difference is control over when the APC is executed.

User APC

With a User APC, the target thread is in control because it gets to decide for itself when it allows the scheduled work to happen. It cannot be preempted by a User APC. A thread will start processing scheduled APCs when it enters an ‘alertable’ state. Specifically, this is when it is executing one of the following:

SleepEx
SignalObjectAndWait
WaitForSingleObjectEx
WaitForMultipleObjectsEx
MsgWaitForMultipleObjectsEx

Each of those APIs indicates a moment when the calling thread decides it’s going to do ‘nothing’ for a while, and now’s a good time to process a backlog of work if there is any. The first four are self-explanatory. The fifth, MsgWaitForMultipleObjectsEx, was created specifically for window-based applications, which typically spend most of their time waiting for messages. It gives developers an easy way to use that idle time for processing scheduled work without having to implement additional threading complexity.

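Note that even when using the above functions, the calling thread is always in control because it specifies whether the wait or sleep is alertable: through the bAlertable parameter, or through the MWMO_ALERTABLE flag in the case of MsgWaitForMultipleObjectsEx.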

Special APC

Special APCs on the other hand execute in the opposite manner: the target thread has no say in when they are executed. The only certainty is that they are NOT executed when

a system call is in progress or
a non-alertable wait is being performed.

In all other cases, a Special APC is executed as soon as the system decides the conditions for executing it are met, regardless of what the target thread happens to be doing at the time. Up to and including Windows 10, special APCs were available only in kernel mode. There, they are commonly used for IO completion in device drivers that don't really depend on whatever is going on in the thread they just happen to interrupt.

I am not sure why Microsoft allowed the use of special APCs in user mode from Windows 11 onwards, but if I had to make an educated guess, I'd say it is related to User Mode Device drivers which, as the name implies, do not run in kernel mode yet use similar design principles. In terms of application-level development, their added value is marginal at best, and they are a great way to shoot yourself in the foot in impossible-to-reproduce ways. More on this later.

Why use APCs?

There are already several mechanisms available to provide parallel execution and scheduling. Why would you want to use an APC?

There are many reasons why one could want to schedule APCs. The most common seem to be:

There is a long delay between starting something and getting a result. Using an APC to handle the result is a convenient design pattern.
Due to historical design reasons and the way window messaging works, a user interface element must only be updated from within the user interface thread for that window. Any asynchronous process that was performing an operation and wants to report back to the user interface can schedule an APC on that thread, making sure that the user interface update is done inside the correct thread.
Using the APC mechanism as a scheduling tool. While we can argue whether it’s appropriate or not, the fact is that the APC mechanism uses a queue to schedule the work that needs to be done. Any sequence of separate actions that has to be performed can be broken down into a series of procedure calls that are executed in order.

Scheduling an APC

Normal User APCs are scheduled using the QueueUserAPC function.

C++

DWORD QueueUserAPC(
    PAPCFUNC pfnAPC,
    HANDLE hThread,
    ULONG_PTR dwData );

As you can see, this function has many similarities to functions like CreateThread which take a function pointer to execute, and a data pointer to pass to the function. Only instead of creating a new thread to execute that function, you specify an existing thread where that function is executed.

APC Console App Example

In this section, I explain the two ways in which you can use an APC in a console application. We are going to use APCs to send work items to a worker thread, and the worker thread will post results back to the main thread via APC.

Preliminaries

The following pieces of code form the basis of our test application.

We have different kinds of tasks that we want to dispatch, so it makes sense that different tasks have different datasets to work with. Since we can only pass one parameter to an APC, we put all task related data in a struct. If it is important that the task reports back, the caller needs to supply their own thread handle as well.

C++

struct Task1Data
{
    HANDLE hCaller;
    DWORD Value;
};

struct Task2Data
{
    HANDLE hCaller;
    FLOAT Value;
};

struct ResponseData
{
    HANDLE hTaskThread;
    string Task;
};

Obviously, each task needs its own function body, where it receives its task data and does something useful with it.

As you can see, PerformTask1 just does its job and quits without reporting back. PerformTask2, on the other hand, does its job and then queues a response APC to the thread from which it was dispatched.

C++

// APC routines must match PAPCFUNC: void CALLBACK f(ULONG_PTR).
void CALLBACK ReportBack(ULONG_PTR context)
{
    cout << "    Reporting back" << endl;
}

void CALLBACK PerformTask1(ULONG_PTR context)
{
    // The task owns the data that the caller allocated for it.
    Task1Data* data = (Task1Data*)context;
    cout << "Thread " << GetCurrentThreadId() <<
        " performing Task 1 with value " << data->Value <<
        " for thread " << GetThreadId(data->hCaller) << endl;
    delete data;
}

void CALLBACK PerformTask2(ULONG_PTR context)
{
    Task2Data* data = (Task2Data*)context;
    cout << "Thread " << GetCurrentThreadId() <<
        " performing Task 2 with value " << fixed << data->Value <<
        " for thread " << GetThreadId(data->hCaller) << endl;
    // Queue the response on the thread that dispatched this task.
    if (!QueueUserAPC(ReportBack, data->hCaller, 0)) {
        cout << "Error " << GetLastError() << " cannot queue response." << endl;
    }
    delete data;
}

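Incidentally, as I explained elsewhere, if a thread wants to pass its own thread handle to another thread, it needs to duplicate that handle first. GetCurrentThread returns a pseudo-handle that always refers to whichever thread uses it, so a worker that received it unchanged would end up queueing the response to itself.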

C++

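HANDLE hMainThread = NULL;
if (!DuplicateHandle(
    GetCurrentProcess(),    // source process
    GetCurrentThread(),     // pseudo-handle to duplicate
    GetCurrentProcess(),    // target process
    &hMainThread,           // receives the real handle
    0,                      // ignored with DUPLICATE_SAME_ACCESS
    FALSE,                  // not inheritable
    DUPLICATE_SAME_ACCESS)) {
    DWORD error = GetLastError();   // capture before any other call can change it
    cout << "Error " << error << " cannot duplicate main handle." << endl;
    return error;
}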

Getting User Input

Console applications typically get input from the user via the keyboard, and there are two ways to do this: blocking and non-blocking. With blocking input, the application is stuck in a system call until the user has entered something. With non-blocking input, the application checks whether there is fresh input and returns either way.

This difference is important because a thread that is stuck in a blocking input call is not alertable, so it cannot process APCs. On the other hand, if we use non-blocking input, APCs can be handled while the application is waiting for user input.

Below is an example of what such an input function may look like. Note that this is not the only way to implement such a loop, or even the best way. But for the purpose of this example, it does what it needs to do: wait for input, which is terminated by a return or enter. If the non-blocking method is chosen, the input loop polls every 100 ms, using an alertable sleep so that queued APCs can be processed.

C++

string GetChoice(bool blocking)
{
    string buffer = "";
    if (blocking) {
        cin >> buffer;
        return buffer;
    }
    else {
        while (1) {
            SleepEx(100, TRUE);
            if (_kbhit()) {
                char c = _getche();
                if (c == '\n' || c == '\r') {
                    cout << "\n";
                    return buffer;
                }
                else {
                    buffer += c;
                }
            }
        }
    }
    return "";
}

As I mentioned already, for a live environment, this input method is not robust enough. In a live environment, you'd need to capture the 'escape' key, as well as allow for interruption via Ctrl + Break, etc. A full exploration of input paradigms is beyond the scope of this article.

Implementing the APCs

Now we finally get to the point where we can do something interesting. Before doing anything else, we create the worker thread to which we dispatch tasks, and the Win32 event which we use to signal when it's time for the worker thread to shut down. The event is set when the user enters 'q' for quitting.

C++

HANDLE shutdown = CreateEvent(NULL, TRUE, FALSE, NULL);
HANDLE workerThread = CreateThread(NULL, 0, WorkerThread, shutdown, 0, NULL);

cout << "Make a choice:" << endl;
cout << "==============" << endl;
cout << "q: quit" << endl;
cout << "1: Initiate task 1" << endl;
cout << "2: Initiate task 2" << endl;
if(blockingInput)
    cout << "p: Process queued main thread APCs" << endl;
cout << "Choice: ";

DWORD dwValue = 0;
FLOAT fValue = 0;
do
{
    string  choice = GetChoice(blockingInput);

    if (choice == "q") {
        SetEvent(shutdown);
        WaitForSingleObject(workerThread, INFINITE);
        break;
    }
    else if (choice == "1") {
        auto data = new Task1Data;
        data->hCaller = hMainThread;
        data->Value = dwValue++;
        QueueUserAPC((PAPCFUNC) & PerformTask1, workerThread, (ULONG_PTR)data);
    }
    else if (choice == "2") {
        auto data = new Task2Data;
        data->hCaller = hMainThread;
        data->Value = fValue++;
        QueueUserAPC((PAPCFUNC) &PerformTask2, workerThread, (ULONG_PTR)data);
    }
    else if (choice == "p" && blockingInput) {
        SleepEx(0, TRUE);
    }
} while (1);

There are only two task related commands. Each command results in the creation of a task data structure, which is dispatched to the worker thread APC queue together with the task function pointer. As you can see, that's easy enough.

But why the 'p' for processing APCs that were scheduled as a response? Well, remember that we had to choose between blocking and non-blocking input? If we use blocking input, there is never going to be an opportunity to process those APCs unless we explicitly create one by becoming alertable at some point in time.

So how are the worker APCs handled? That part is simple:

C++

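DWORD WINAPI WorkerThread(LPVOID context)
{
    HANDLE shutdown = (HANDLE) context;
    DWORD retVal = 0;

    // Keep waiting for as long as the wait was only interrupted to run APCs.
    // The assignment is parenthesized because == binds tighter than =.
    while ((retVal = WaitForSingleObjectEx(shutdown, INFINITE, TRUE))
           == WAIT_IO_COMPLETION)
        ;
    return retVal;
}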

Like any other thread, it needs to make itself alertable. Since its only purpose is to handle APCs, it can be alertable indefinitely while waiting for the shutdown event. It's important to keep in mind that WaitForSingleObjectEx will return when it is alerted and APCs were processed.

It may seem annoying that we need to re-enter the wait without anything having changed. However, this is the right thing to do because the execution of an APC may have altered the situation in the worker thread to the point where it needs to do something else before continuing to wait. For example, instead of setting the shutdown event in the main thread, we could have implemented the shutdown functionality as a shutdown APC. And if the worker thread processed that APC, it could shut itself down.

There is no '1 correct way' to work with APCs. APCs are technology. 'How' you use them is strategy, policy and design.

Now looking at the worker thread, it looks suspiciously empty. Where is the actual work done? The answer is simple: invisibly, in the background. More to the point: when the thread is in an alertable state and an APC is queued, Windows will suspend the normal thread function (in this case, the 'WorkerThread' function). It will then pull the first APC from the queue, and execute the APC function pointer in the context of the thread just as if it was the regular thread function, with the data parameter as function argument.

After completion, it will pull the next one from the queue and process that. It will continue to do so until all APCs are processed. When that point is reached, Windows will reinstate the original thread function and let it deal with the fact that it was alerted, in whatever manner it deems fit. In our case, this is simply to resume the wait.

Of course, you can put breakpoints in the APC functions. However, you cannot deduce what happens in the worker thread by looking at the code of the worker thread. Whatever is executed is the result of what other threads tell it to do. This means that if your application dispatches APCs into a worker thread from different locations, it is up to you to ensure that the internal logic of your code can deal with APCs being executed in whatever order they happen to arrive in.

Another thing that deserves emphasis is that if APCs are scheduled before the thread itself starts executing, the APCs are executed before the thread function itself starts.

Multithreading Considerations

We've already mentioned that APCs are just another tool in the multithreading arsenal of technologies. However, they're a tool that requires significantly more care than regular threading primitives.

The key here is that an APC does not execute in parallel with its target thread. It interrupts the target thread. This is important because locks such as mutexes are owned by a thread and can be acquired recursively by the thread that already holds them. An APC runs on that same thread, so a mutex held by the thread will not keep the APC out, which means that a thread cannot protect its resources from its own APCs with the normal means. Which is a good thing, in case you were wondering. Otherwise, an APC trying to acquire a mutex that was already held by its target thread would deadlock the thread forever.

In fact, virtually the only thing a thread can really do to make sure its data isn't corrupted by improper parallel access, is to control when an APC can execute. As we saw earlier, an APC can only execute if a thread declares itself alertable. The thread is fully in control, and can determine when its data is safe for the APC to touch. Note that if the APC tries to touch data that is shared with other threads, it may still need to use a mutex to protect access from other threads.

Special APCs

This is also why I mentioned earlier that for application development, the Special APC is all but useless, and extremely dangerous to use. They will interrupt the thread without regard for what the thread is doing or whether it is in the middle of something or not. This means the normal protections don't work and it's very easy for the thread state to be corrupted.

It gets worse. Whereas a thread executes normal User APCs one by one, a Special APC will execute whenever it can, even if another APC was already executing at that moment. Special APCs can interrupt a Special APC that itself interrupted a User APC. See how messy that gets? How utterly unpredictable and hard to analyse?

But wait, it can still get even worse than that. With User APCs, the programmer is firmly in control of WHEN an APC can execute, and Windows will only process 1 APC at a time in that particular thread. However, when you use a locking primitive like a mutex, your application or your User APC can still get preempted by a Special APC while it is holding the lock. In a complex application, that Special APC may then wait on something that another thread can only deliver after acquiring that same lock, leading to deadlocks or problems that are insanely hard to reproduce.

The only time they are safe is if, by design, you arrange things so that when a Special APC is executed, there is nothing the thread could be doing that could conflict with the APC. By definition, such things are extremely limited in scope and in virtually all cases that concern application development, can be handled in a much safer manner by User APCs. Outside of things like a User Mode device driver framework, which handles very limited, very specific IO packets that have nothing to do with the thread they happen to execute in, I don't see any use case where the added complexity and maintenance nightmare justify their use. If you know of one, please post in the comments below.

Running the Test Application

The test application demonstrates the principles I've explained. The first thing it does is ask whether you want to use blocking or non blocking input.

If we choose blocking input, then Task 2 will queue a response APC with every execution. However, those APCs don't get processed until we enter 'p', which makes the main thread alertable once, at which point all queued APCs will execute.

If we choose non-blocking input, then the response APC will be executed while the input loop is waiting for / collecting data.

In both cases, something must be done to ensure that the APCs get executed. One option is to place the user in charge explicitly. Another is to put the application in charge, which in this case means inside the input loop.

Points of Interest

APCs are a convenient mechanism for parallelization and task offloading. I mentioned earlier that there is no '1 right approach'. It all boils down to how you design your application.

The code is released under the MIT license so have fun with it.

History

30th March, 2023: 1st version

License

This article, along with any associated source code and files, is licensed under The MIT License