
C++ WinAPI Wrapper Object using thunks (x32 and x64)

6 Dec 2016
Using the thunk technique to add the this pointer as a fifth parameter to the WndProc call on x32 and x64

Introduction

This article presents an overview of a technique known as "thunking" as a means of wrapping a WinAPI window in a C++ object. While there are various ways to implement such a wrapper, this article describes a method that intercepts the WndProc call and appends the this pointer as a fifth parameter. It uses a single thunk and only one function call, and it includes code for both x32 and x64.

Background

The WinAPI was designed before C++ and OOP became popular. Attempts such as ATL have been made to make it more class- and object-oriented. The main stumbling block is that the message handling procedure (typically known as WndProc) is not called by the application itself but from outside, by Windows. This requires the function to be a free (non-member) function or, in the case of a C++ member function, a static one. As a result, when execution enters WndProc, it has no pointer to the particular object instance on which to call any additional handler functions.

This means that any C++ OOP approach must determine, from within a static member function, which object the message should be passed to.

Some options include:

  1. Only design for a single window instance. The message processor can be global or namespace-scoped.
  2. Use the extra window memory reserved through cbWndExtra to store a pointer to the correct object (cbClsExtra memory is shared by every window of the class, so it cannot hold a per-object pointer).
  3. Attach a property to the window (SetProp) holding a pointer to the object and retrieve it with GetProp.
  4. Maintain a lookup table that maps each window handle to its object pointer.
  5. Use a method known as a “thunk”.

Each has upsides and downsides.

  1. You are limited to a single window and the code cannot be reused. This may be fine for simple applications, but if you are going to the effort of encapsulating the window in an object, a single-instance design defeats the purpose; you might as well forgo the object and stick with the standard template.
  2. The method is “slow” and adds the overhead of a call to retrieve the pointer from the extra bytes every time a message comes through. It also reduces the reusability of the code, as it hinges on these values not being overwritten or used for other purposes during the life of the window. On the other hand, it is straightforward and easy to implement.
  3. Slower than number 2 and introduces similar overhead, but you do eliminate the potential of data being overwritten (though you need to ensure the property has a unique name so it will not conflict with any other added properties).
  4. Here, we run into performance and overhead issues as the look-up table grows, and this lookup needs to happen each time the message processor function is called. It does allow for the function to be a private static member.
  5. This is somewhat tricky to implement, but it has low overhead, performs better than the other methods, and offers the flexibility to suit any OOP design style.
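To make the per-message cost of option 4 concrete, here is a minimal, portable sketch of a lookup-table dispatcher. FakeHwnd, FakeMsg, Window and Dispatch are hypothetical stand-ins (a real version would key on HWND and be called from the registered WndProc); the point is only that every message pays for a table lookup before it reaches the object.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-ins for HWND and the message id.
using FakeHwnd = int;
using FakeMsg  = unsigned;

class Window {
public:
    explicit Window(FakeHwnd h) : hwnd_(h) { Registry()[h] = this; }
    ~Window() { Registry().erase(hwnd_); }
    Window(const Window&) = delete;             // one registry entry per object
    Window& operator=(const Window&) = delete;

    // Static entry point with a fixed signature, as the OS would call it.
    static long Dispatch(FakeHwnd h, FakeMsg msg) {
        auto it = Registry().find(h);            // lookup paid on every message
        if (it == Registry().end()) return -1;   // no object owns this handle
        return it->second->OnMessage(msg);
    }

    unsigned lastMessage = 0;

private:
    long OnMessage(FakeMsg msg) { lastMessage = msg; return 0; }

    static std::unordered_map<FakeHwnd, Window*>& Registry() {
        static std::unordered_map<FakeHwnd, Window*> table;
        return table;
    }

    FakeHwnd hwnd_;
};
```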

The truth is that a good many applications don't need anything fancy and can get away with a more conventional approach. However, if you want to build an extensible framework with low overhead, method 5 is the best option, and this article presents an overview of how to approach such a design.

Using the Code

A thunk is a small piece of executable code placed in memory at run time. The idea is to write a few machine instructions into memory and have them run in place of the function that would normally be called, adjusting the call on the way through. For our purposes, we register the thunk's address instead of the message processing function; when Windows calls it, the thunk adds our object's address to the call and then jumps to the real message processing function, which can then dispatch to the correct non-static member functions.

First, let’s create our template for this project. We’ll need a main file that will contain the wWinMain function.

// appmain.cpp : Defines the entry point for the application.
//
 
#include "stdafx.h"
#include "AppWin.h"
 
 
int APIENTRY wWinMain(_In_ HINSTANCE hInstance, _In_opt_ HINSTANCE hPrevInstance, 
_In_ LPWSTR    lpCmdLine, _In_ int nCmdShow)
{
    UNREFERENCED_PARAMETER(hPrevInstance);
    UNREFERENCED_PARAMETER(lpCmdLine);
    return 0; //wWinMain must return an int exit code
}

Next, create the AppWin.h and AppWin.cpp files with an empty class structure.

// AppWin.h : header file for the AppWinClass
//
 
#pragma once
 
#include "resource.h"
 
class AppWinClass {
public:
    AppWinClass();  //defined in AppWin.cpp
    ~AppWinClass();
 
private:
};

// AppWin.cpp : implementation of AppWinClass
//
 
#include "stdafx.h"
#include "AppWin.h"
 
AppWinClass::AppWinClass() {}
AppWinClass::~AppWinClass() {}

We’ll need to set up our object with all of the elements required to create the window. The first step is registering the WNDCLASSEX structure. Some fields of WNDCLASSEX should be changeable by the code instantiating the object, while others we want to reserve for the object to control.

One option is to define a “struct” with the elements we allow users to set themselves and pass it to a function that copies them into the WNDCLASSEX structure to be registered; another is to pass the elements directly as function parameters. A struct has the advantage that its data could be reused elsewhere. Of course, a struct takes up memory, and if the elements are used only once, that is not very efficient. Passing the elements as parameters limits their scope to that one function and is more efficient, but we would need to pass at least 20 parameters and then check the value of each.

Here, we declare default values within our creation function and also declare a “struct” outside of our class. Users who want to change the defaults can fill in that struct and manage its lifetime themselves; the caller simply tells the function whether it is passing the struct with updated values or going with the defaults. So, we declare the following function:
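The defaults-plus-optional-struct pattern described above can be sketched in portable C++ as follows. WindowOptions, CreatedWindow and CreateWindowSketch are hypothetical names, and the function only reports which settings it would use instead of calling RegisterClassEx and CreateWindowEx; passing nullptr selects the defaults, while a caller-owned struct overrides them.

```cpp
#include <cassert>
#include <string>

// Hypothetical, trimmed-down analogue of AppWinStruct: the defaults live in
// the member initializers, so callers override only what they care about.
struct WindowOptions {
    std::wstring className = L"Window";
    int width  = 640;
    int height = 480;
    bool visible = true;
};

// Stand-in for what the real Create() would pass to RegisterClassEx and
// CreateWindowEx; here it simply records the options it would have used.
struct CreatedWindow {
    WindowOptions used;
};

// nullptr means "use the defaults". The caller owns *opts and manages its
// lifetime; this function only reads it.
CreatedWindow CreateWindowSketch(const WindowOptions* opts = nullptr) {
    const WindowOptions defaults{};  // automatic storage: nothing to free later
    const WindowOptions& chosen = opts ? *opts : defaults;
    return CreatedWindow{chosen};
}
```

Keeping the defaults in the struct's member initializers gives a single place to change them, and a local default object means nothing has to be allocated with new and freed afterwards.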

int AppWinClass::Create(HINSTANCE hInstance, int nCmdShow, AppWinStruct* varStruct)

hInstance is used throughout the creation process and nCmdShow is passed as part of the ShowWindow call.

So we begin the function by checking whether we received an AppWinStruct. If not, we load our WNDCLASSEX structure with defaults; otherwise, we use the values AppWinStruct provides.

int AppWinClass::Create(HINSTANCE hInstance, int nCmdShow, AppWinStruct* varStruct)
{   //any default arguments belong on the declaration in AppWin.h, not here
    WNDCLASSEX wcex = {}; //zero-initialize our WNDCLASSEX
    wcex.cbSize      = sizeof(WNDCLASSEX);
    wcex.hInstance      = hInstance;
    //wcex.lpfnWndProc = ...; still to be assigned (see below)
    if (!varStruct) //default values
    {
        wcex.style            = CS_HREDRAW | CS_VREDRAW;
        wcex.cbClsExtra    = 0;
        wcex.cbWndExtra    = 0;
        wcex.hIcon            = LoadIcon(nullptr, IDI_APPLICATION);
        wcex.hCursor        = LoadCursor(nullptr, IDC_ARROW);
        wcex.hbrBackground = (HBRUSH)(COLOR_WINDOW + 1);
        wcex.lpszMenuName  = nullptr;
        wcex.lpszClassName = L"Window";
        wcex.hIconSm        = NULL;        
    }
    else
    { //user defined
        wcex.style            = varStruct->style;
        wcex.cbClsExtra    = varStruct->cbClsExtra;
        wcex.cbWndExtra    = varStruct->cbWndExtra;
        wcex.hIcon            = varStruct->hIcon;
        wcex.hCursor        = varStruct->hCursor;
        wcex.hbrBackground = varStruct->hbrBackground;
        wcex.lpszMenuName  = varStruct->lpszMenuName;
        wcex.lpszClassName = varStruct->lpszClassName;
        wcex.hIconSm        = varStruct->hIconSm;
    }

Note that we have not yet assigned wcex.lpfnWndProc. This field registers our message processing function. Because of how Windows calls it, this function must be static, and hence it cannot call the member functions of a specific object to handle particular messages. A typical WNDPROC function header looks like this:

LRESULT CALLBACK WndProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)

Eventually, we will use a thunk to extend the function call with a fifth parameter that holds a pointer to our object (its this). Before we do that, we'll declare our WndProc function. This is just a standard WndProc that handles the WM_PAINT and WM_DESTROY messages, just enough to get a window up.

// AppWin.h
class AppWinClass {
.
.
.
private:

    static LRESULT CALLBACK WndProc(HWND hWnd, UINT message, 
                    WPARAM wParam, LPARAM lParam, DWORD_PTR pThis);

//AppWin.cpp
LRESULT CALLBACK AppWinClass::WndProc(HWND hWnd, UINT message, 
                    WPARAM wParam, LPARAM lParam, DWORD_PTR pThis)
{
    switch (message)
    {
    case WM_PAINT:
    {
        PAINTSTRUCT ps;
        HDC hdc = BeginPaint(hWnd, &ps);
        // TODO: Add any drawing code that uses hdc here...
        EndPaint(hWnd, &ps);
    }
    break;
    case WM_DESTROY:
        PostQuitMessage(0);
        break;
    default:
        return DefWindowProc(hWnd, message, wParam, lParam);
    }
    return 0;
}

Here, we have declared a fifth parameter that will carry our this pointer. Windows, however, calls the function with only the four standard parameters, so we need to intercept the call and supply a fifth parameter holding a pointer to our class object. This is where the thunk comes in.

Again, a thunk is a bit of executable code on the heap. Instead of the window message procedure, Windows calls the thunk as if it were that function. The arguments are already in place (on the stack, or in registers on x64) when the thunk is entered, so all the thunk needs to do is add one more argument and then jump to the originally intended function.
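In C++ terms, the thunk behaves like the adapter below: it stores the object pointer and the target address, accepts the four arguments the caller supplies, and forwards them with the pointer appended. This is only an analogy built on hypothetical stand-in types (Hwnd, Uint, Wparam, Lparam, Lresult, ThunkAnalogue, Counter); a C++ object with an operator() cannot be registered where a plain WNDPROC function pointer is required, which is why the article generates machine code instead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for HWND, UINT, WPARAM, LPARAM and LRESULT.
using Hwnd    = void*;
using Uint    = unsigned;
using Wparam  = std::uintptr_t;
using Lparam  = std::intptr_t;
using Lresult = std::intptr_t;

// Five-parameter handler, shaped like the x64 AppWinClass::WndProc.
using Handler5 = Lresult (*)(Hwnd, Uint, Wparam, Lparam, std::uintptr_t);

// What the generated thunk does, written as ordinary C++: remember the object
// pointer and the target, accept the caller's four arguments, and forward
// them with the pointer appended.
struct ThunkAnalogue {
    std::uintptr_t pThis;
    Handler5 target;
    Lresult operator()(Hwnd h, Uint msg, Wparam w, Lparam l) const {
        return target(h, msg, w, l, pThis);
    }
};

// Example object and handler: the handler recovers the object from pThis.
struct Counter { int messages = 0; };

Lresult CountingHandler(Hwnd, Uint, Wparam, Lparam, std::uintptr_t pThis) {
    Counter* self = reinterpret_cast<Counter*>(pThis);
    return ++self->messages;
}
```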

A couple of notes. Because of DEP (Data Execution Prevention), we must allocate heap memory that is marked executable for this process; otherwise DEP will block the code from executing and raise an access violation. We use HeapCreate with the HEAP_CREATE_ENABLE_EXECUTE flag set. HeapCreate commits at least one page of memory (4 KB on x86 and x64), and our thunk is very small, so rather than creating a new heap for every thunk of every object, we keep the heap handle in a static variable so the heap can be shared.

// AppWin.h
class AppWinClass {
.
.
.
private:
    static HANDLE eheapaddr;
    static int objInstances;
    winThunk* thunk; //this object's thunk, allocated on the executable heap

We'll use our class constructor to create the heap.

//AppWin.cpp
HANDLE AppWinClass::eheapaddr = NULL;
int    AppWinClass::objInstances = 0;

AppWinClass::AppWinClass()
{
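    ++objInstances;
    if (!eheapaddr)
    {
        try {
            eheapaddr = HeapCreate(HEAP_CREATE_ENABLE_EXECUTE | HEAP_GENERATE_EXCEPTIONS, 0, 0);
            if (!eheapaddr)             //HeapCreate returns NULL on failure; it does not raise
                throw std::bad_alloc(); //declared in <new>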
        }
        catch (...) {
            throw;
        }
    }
    try {
        thunk = new(eheapaddr) winThunk;
    }
    catch (...) {
        throw;
    }
}

We initialize the static eheapaddr (executable heap address) and objInstances (a count of the live instances of our object) to NULL and 0. In the constructor, we first increment objInstances, because we do not want to destroy the heap until all other instances of our object are gone. Next, we check whether eheapaddr has already been initialized and, if not, assign it the handle returned by HeapCreate. We pass HEAP_CREATE_ENABLE_EXECUTE to allow code execution on this heap and HEAP_GENERATE_EXCEPTIONS so that allocations from it raise an exception instead of returning NULL. Note that HeapCreate itself still reports failure by returning NULL, and that these are structured (SEH) exceptions, which a C++ catch (...) block only intercepts when the code is compiled with /EHa. The try/catch simply rethrows, leaving it to the caller of the object to deal with the failure.

We also allocate our thunk on this heap. To do so, we overload operator new for our thunk class so that it takes the handle returned by HeapCreate and allocates from that heap. This allocation is also wrapped in a try/catch in case it fails (because we set HEAP_GENERATE_EXCEPTIONS in HeapCreate, HeapAlloc on this heap raises an exception on failure rather than returning NULL).

We destroy the heap when the last object is deleted, so we update the destructor as follows:

//AppWin.cpp
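AppWinClass::~AppWinClass() {
    delete thunk;             //return this object's thunk to the executable heap
    if (--objInstances == 0)  //last instance: the shared heap is no longer needed
    {
        HeapDestroy(eheapaddr);
        eheapaddr = NULL;     //static: reset so a later object creates a new heap
    }
}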

We decrement objInstances and, if this was the last remaining instance, destroy the heap and reset eheapaddr to NULL. Because eheapaddr and objInstances are static and shared by every instance, they must be reset at that point; otherwise an object created later would try to allocate from a heap that no longer exists. We also call delete on our thunk, which frees it from our heap.

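A note here: for a better multithreaded approach, InterlockedIncrement() and InterlockedDecrement() could be used instead of plain increments and decrements of a static counter, although creating the heap on the first instance and destroying it on the last would still need to be synchronized.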
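A minimal sketch of one thread-safe variant, assuming standard C++ rather than the Interlocked* API: a mutex makes "update the count and create or destroy the shared resource" a single step, which an atomic counter alone does not. SharedResource and its counters are hypothetical; the comments mark where HeapCreate and HeapDestroy would go.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the shared executable heap: created on the first
// Acquire, destroyed on the last Release. One mutex covers both the count
// update and the create/destroy step so they cannot interleave.
class SharedResource {
public:
    struct Counters {
        int users = 0;
        int creations = 0;
        int destructions = 0;
    };

    static void Acquire() {
        std::lock_guard<std::mutex> lock(Mutex());
        if (State().users++ == 0)
            ++State().creations;      // HeapCreate would go here
    }

    static void Release() {
        std::lock_guard<std::mutex> lock(Mutex());
        if (--State().users == 0)
            ++State().destructions;   // HeapDestroy would go here
    }

    static Counters Snapshot() {
        std::lock_guard<std::mutex> lock(Mutex());
        return State();
    }

private:
    static std::mutex& Mutex() { static std::mutex m; return m; }
    static Counters& State() { static Counters c; return c; }
};
```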

Now we can declare our thunk class. Because x32 and x64 differ in how they handle the stack and function calls, we need to wrap the declaration in #if defined directives. We use _M_IX86 for 32-bit apps and _M_AMD64 for 64-bit apps.

The idea is we create a structure and place variables in a specific order at the top. When we make a call to this “function”, we are instead calling into the top of the memory of the structure and will begin to execute the code stored in the variables at the top.

We use the #pragma pack(push, n) directive to control field alignment so that the instruction bytes are laid out contiguously; otherwise the compiler may insert padding between the fields (the x64 version still needs a manual workaround, described below).
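A quick portable way to see what packing does is to compare the layout of the same two fields under different #pragma pack settings. The struct names are hypothetical; fixed-width integer types are used because unsigned long is 4 bytes on Windows but 8 bytes on 64-bit Linux and macOS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Default alignment: on x86-64 the compiler pads 6 bytes after `op`
// so that `imm` starts on an 8-byte boundary.
struct Natural { std::uint16_t op; std::uint64_t imm; };

// Byte packing: no padding at all.
#pragma pack(push, 1)
struct Packed1 { std::uint16_t op; std::uint64_t imm; };
#pragma pack(pop)

// 2-byte packing (as in the x64 thunk): fields start on even offsets.
#pragma pack(push, 2)
struct Packed2 { std::uint16_t op; std::uint64_t imm; };
#pragma pack(pop)
```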

For x32, we require 7 variables. We then assign them the hexadecimal equivalent of our x86 assembly code. The assembly looks like the following:

push dword ptr [esp] ;push return address 
mov dword ptr [esp+0x4], pThis ;esp+0x4 is the location of the first element in the function header
                               ;and pThis is the value of the pointer to our object’s “this”
jmp WndProc ;where WndProc is the message processing function

Because we do not know the value of pThis or WndProc before the program runs, we need to collect these at runtime. So we create a function in the structure to initialize these variables and we will pass both the location of the message processing function and pThis.

We also need to flush the instruction cache so that the CPU executes our new code rather than stale instructions. FlushInstructionCache returns a nonzero value on success, in which case Init returns true; otherwise Init returns false to let the program know we had a problem.

A few notes on what is going on in our 32-bit code. Following the calling convention, we need to preserve the stack frame of the calling function (remember that it thinks it is calling a function with 4 parameters). On entry, the caller's return address is at the top of the stack, at [esp]. So we dereference esp and push that value (push dword ptr [esp]), which decrements esp and adds a new slot holding the return address, making room for our extra parameter. Next, we move our object pointer into [esp+0x4], overwriting the original copy of the return address, where it becomes the first parameter of the function call (conceptually, we shifted the original parameters one slot to the right). In Init, m_mov1 and m_mov2 are given the hexadecimal encoding of mov dword ptr [esp+0x4], and we assign the value of pThis to m_this to complete the mov instruction. m_jmp gets the opcode of the relative jmp instruction (E9). Finally, we calculate the offset of the target procedure relative to the end of the jmp instruction and assign it to m_relproc.
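The stack manipulation described above can be checked with a small portable model. ToyStack32, SimulateCall, RunThunk and EspAfterStdcallReturn are hypothetical helpers that model 4-byte stack slots rather than real memory; they show that after the thunk runs, pThis is the first parameter, hWnd has moved to the second, and a five-parameter stdcall return leaves esp exactly where the four-parameter caller expects it.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model of a 32-bit stack: slots[esp] is the dword at [esp], and a push
// moves esp one 4-byte slot toward lower addresses (a lower index).
struct ToyStack32 {
    std::array<std::uint32_t, 16> slots{};
    int esp = 16;  // empty stack

    void push(std::uint32_t v) { slots[--esp] = v; }
    std::uint32_t at(int byteOffset) const { return slots[esp + byteOffset / 4]; }
    void store(int byteOffset, std::uint32_t v) { slots[esp + byteOffset / 4] = v; }
};

// Caller side of a 4-argument stdcall: push arguments right to left, then
// `call` pushes the return address.
void SimulateCall(ToyStack32& s, const std::uint32_t (&args)[4], std::uint32_t ret) {
    for (int i = 3; i >= 0; --i) s.push(args[i]);
    s.push(ret);
}

// The 32-bit thunk:  push dword ptr [esp]          ; duplicate the return address
//                    mov  dword ptr [esp+4], pThis ; overwrite the old copy
void RunThunk(ToyStack32& s, std::uint32_t pThis) {
    s.push(s.at(0));
    s.store(4, pThis);
}

// A stdcall callee's `ret 4*argCount` pops the return address and its
// arguments; the result is the esp slot index after returning.
int EspAfterStdcallReturn(const ToyStack32& s, int argCount) {
    return s.esp + 1 + argCount;
}
```

The last check in the usage is why this form of the thunk keeps the stack balanced: the five-parameter callee pops one extra 4-byte slot, which is exactly the slot the thunk added with its push.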

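We also need to override new and delete for our struct so the object is allocated on our executable heap. Because winThunk::eheapaddr is a static data member, it also needs a definition in AppWin.cpp (HANDLE winThunk::eheapaddr = NULL;).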

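Also note that Intel processors are little-endian: the low-order byte of a multi-byte value is stored first. So when instruction bytes are packed into a multi-byte field, they must be written in reverse order (the first byte of the instruction becomes the low-order byte of the constant) [this applies to x64 as well].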

// AppWin.h
#if defined(_M_IX86)
#pragma pack(push,1)
struct winThunk
{
    unsigned short m_push1;    //push dword ptr [esp] ;push return address
    unsigned short m_push2;
    unsigned short m_mov1;     //mov dword ptr [esp+0x4], pThis ;set our new parameter by replacing old return address
    unsigned char  m_mov2;     //(esp+0x4 is first parameter)
    unsigned long  m_this;     //ptr to our object
    unsigned char  m_jmp;      //jmp WndProc
    unsigned long  m_relproc;  //relative jmp
    static HANDLE  eheapaddr;  //heap address this thunk will be initialized to
    bool Init(void* pThis, DWORD_PTR proc)
    {
        m_push1 = 0x34ff; //ff 34 24 push DWORD PTR [esp]
        m_push2 = 0xc724;
        m_mov1  = 0x2444; // c7 44 24 04 mov dword ptr [esp+0x4],
        m_mov2  = 0x04;
        m_this  = PtrToUlong(pThis);
        m_jmp   = 0xe9;  //jmp
        //calculate relative address of proc to jump to
        m_relproc = static_cast<unsigned long>((INT_PTR)proc - ((INT_PTR)this + sizeof(winThunk)));
        // write block from data cache and flush from instruction cache
        if (FlushInstructionCache(GetCurrentProcess(), this, sizeof(winThunk)))
        { //succeeded
            return true;
        }
        else
        {//error
            return false;
        }
    }
    //some thunks will dynamically allocate the memory for the code
    WNDPROC GetThunkAddress()
    {
        return (WNDPROC)this;
    }
    void* operator new(size_t, HANDLE heapaddr)
    {
        eheapaddr = heapaddr; //since we can't pass a value with delete operator, we need to store
                              //our heap address so we can use it later when we need to free this thunk
        return HeapAlloc(heapaddr, 0, sizeof(winThunk));
    }
    void operator delete(void* pThunk)
    {
        HeapFree(eheapaddr, 0, pThunk);
    }
};
#pragma pack(pop)
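To see the byte sequence the constants in Init produce, here is a portable replica of the 32-bit thunk that builds the image in an ordinary struct without executing it. X86ThunkImage and BuildX86Thunk are hypothetical; fixed-width types replace unsigned long (which is 8 bytes on 64-bit Linux), and the checks assume a little-endian host such as x86 or x64.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Portable replica of the 32-bit winThunk fields (hypothetical names).
#pragma pack(push, 1)
struct X86ThunkImage {
    std::uint16_t m_push1;
    std::uint16_t m_push2;
    std::uint16_t m_mov1;
    std::uint8_t  m_mov2;
    std::uint32_t m_this;
    std::uint8_t  m_jmp;
    std::uint32_t m_relproc;
};
#pragma pack(pop)
static_assert(sizeof(X86ThunkImage) == 16, "3 + 8 + 5 instruction bytes");

// Fills the image the way winThunk::Init does. thunkAddr is the 32-bit
// address the thunk would occupy; a jmp rel32 is measured from the end of
// the instruction, which is also the end of the struct.
X86ThunkImage BuildX86Thunk(std::uint32_t thunkAddr, std::uint32_t pThis,
                            std::uint32_t proc) {
    X86ThunkImage t{};
    t.m_push1 = 0x34ff;  // FF 34 ...   push dword ptr [esp] (FF 34 24)
    t.m_push2 = 0xc724;  // ... 24 | C7 ...
    t.m_mov1  = 0x2444;  // ... 44 24 ...
    t.m_mov2  = 0x04;    // ... 04      mov dword ptr [esp+4], imm32
    t.m_this  = pThis;   // imm32
    t.m_jmp   = 0xe9;    // E9          jmp rel32
    t.m_relproc = proc - (thunkAddr + static_cast<std::uint32_t>(sizeof(X86ThunkImage)));
    return t;
}
```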

The x64 version follows the same principles, but we need to account for differences in how x64 handles the stack and compensate for some alignment issues. The Windows x64 ABI passes arguments much like a fastcall convention and does not push them individually: the first parameter goes in rcx, the second in rdx, the third in r8, and the fourth in r9. Any further parameters are stored on the stack, but with a twist: the caller also reserves 32 bytes of stack for the four register parameters (the so-called shadow space), one 8-byte slot each, directly above the return address. The return address is at the top of the stack, so on entry to the callee the fifth parameter sits at [rsp+28h].

--- Bottom of stack ---    RSP + size     (higher addresses)
arg N
arg N - 1
arg N - 2
...
arg 6
arg 5                      [rsp+28h]
(shadow space for arg 4)   [rsp+20h]
(shadow space for arg 3)   [rsp+18h]
(shadow space for arg 2)   [rsp+10h]
(shadow space for arg 1)   [rsp+8h]
(return address)           [rsp]
---- Top of stack -----    RSP            (lower addresses)

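For non-static member functions, the first five parameters are placed as follows: this goes in rcx, the 1st parameter in rdx, the 2nd in r8, the 3rd in r9, the 4th at [rsp+28h], and the 5th at [rsp+30h]. For static (and non-member) functions, the 1st parameter goes in rcx, the 2nd in rdx, the 3rd in r8, the 4th in r9, and the 5th at [rsp+28h]. Our WndProc is static, so we need to place our value at [rsp+28h]. Keep in mind that the caller of a four-parameter function is only required to reserve the 32 bytes of shadow space, so [rsp+28h] lies in the caller's own stack frame; the thunk relies on that slot not holding anything the caller still needs.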
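The offsets above follow a simple rule, sketched here as hypothetical constexpr helpers: at callee entry, integer or pointer argument n (counting from 1) owns the 8-byte slot at [rsp + 8n], whether it arrives in a register (n from 1 to 4, where the slot is only shadow space) or on the stack. For a member function, this takes position 1, shifting every declared parameter up by one.

```cpp
#include <cassert>
#include <cstddef>

// Offset from RSP at callee entry of the slot for integer/pointer argument n
// (1-based): [rsp] holds the return address, then one 8-byte slot per argument.
constexpr std::size_t ArgSlotOffset(std::size_t n) { return 8 * n; }

// Arguments 1..4 travel in rcx, rdx, r8, r9; their slots are shadow space.
constexpr bool PassedInRegister(std::size_t n) { return n >= 1 && n <= 4; }

// For a non-static member function, `this` is argument 1, so declared
// parameter k is argument k + 1.
constexpr std::size_t MemberParamOffset(std::size_t k) { return ArgSlotOffset(k + 1); }
```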

We run into a problem in that one of the instructions (mov qword ptr [rsp+28h], rax) is 5 bytes long, while our struct is built from 2-, 4- and 8-byte fields packed on 2-byte boundaries. So we need to do some manual alignment by appending a no-operation instruction (nop, 90) to pad it to 6 bytes. Otherwise, the same principles apply. Because pThis and proc are 64-bit addresses, we use the movabs form of mov, which loads a 64-bit immediate into rax.

#elif defined(_M_AMD64)
#pragma pack(push,2)
struct winThunk
{
    unsigned short     RaxMov;  //movabs rax, pThis
    unsigned long long RaxImm;
    unsigned long      RspMov;  //mov [rsp+28], rax
    unsigned short     RspMov1;
    unsigned short     Rax2Mov; //movabs rax, proc
    unsigned long long ProcImm;
    unsigned short     RaxJmp;  //jmp rax
    static HANDLE      eheapaddr; //heap address this thunk will be initialized to
    bool Init(void *pThis, DWORD_PTR proc)
    {
          RaxMov  = 0xb848;                    //movabs rax (48 B8), pThis
          RaxImm  = (unsigned long long)pThis; //
          RspMov  = 0x24448948;                //mov qword ptr [rsp+28h], rax (48 89 44 24 28)
          RspMov1 = 0x9028;                    //to properly byte align the instruction we add a nop (no operation) (90)
          Rax2Mov = 0xb848;                    //movabs rax (48 B8), proc
          ProcImm = (unsigned long long)proc;
          RaxJmp = 0xe0ff;                     //jmp rax (FF E0)
        //FlushInstructionCache returns nonzero on success
        if (FlushInstructionCache(GetCurrentProcess(), this, sizeof(winThunk)))
        { //succeeded
            return true;
        }
        else
        { //error
            return false;
        }
    }
    //some thunks will dynamically allocate the memory for the code
    WNDPROC GetThunkAddress()
    {
        return (WNDPROC)this;
    }
    void* operator new(size_t, HANDLE heapaddr)
    {
        eheapaddr = heapaddr; //since we can't pass a value with delete operator we need to store
                              //our heap address so we can use it later when we need to free this thunk
        return HeapAlloc(heapaddr, 0, sizeof(winThunk));
    }
    void operator delete(void* pThunk)
    {
        HeapFree(eheapaddr, 0, pThunk);
    }
};
#pragma pack(pop)
#endif
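The same kind of portable check works for the x64 layout. X64ThunkImage and BuildX64Thunk are hypothetical replicas of the fields above, with std::uint32_t standing in for unsigned long so the sizes match Windows on every platform; the checks assume a little-endian host and only inspect the bytes, never execute them.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#pragma pack(push, 2)
struct X64ThunkImage {
    std::uint16_t RaxMov;   // 48 B8        movabs rax, imm64
    std::uint64_t RaxImm;   //              pThis
    std::uint32_t RspMov;   // 48 89 44 24  mov qword ptr [rsp+28h], rax ...
    std::uint16_t RspMov1;  // 28 90        ... disp8 28h, then nop
    std::uint16_t Rax2Mov;  // 48 B8        movabs rax, imm64
    std::uint64_t ProcImm;  //              proc
    std::uint16_t RaxJmp;   // FF E0        jmp rax
};
#pragma pack(pop)
static_assert(sizeof(X64ThunkImage) == 28, "10 + 5 + 1 + 10 + 2 bytes");

// Fills the image with the same constants winThunk::Init uses on x64.
X64ThunkImage BuildX64Thunk(std::uint64_t pThis, std::uint64_t proc) {
    X64ThunkImage t{};
    t.RaxMov  = 0xb848;
    t.RaxImm  = pThis;
    t.RspMov  = 0x24448948;
    t.RspMov1 = 0x9028;
    t.Rax2Mov = 0xb848;
    t.ProcImm = proc;
    t.RaxJmp  = 0xe0ff;
    return t;
}
```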

We now have our message handler and our thunk. We can now assign the value to lpfnWndProc.

Caution: the two versions pass our pointer in different positions. In the 32-bit code, our pointer is the first parameter; in the 64-bit code, it is the fifth. We need to account for this by wrapping the declarations in preprocessor conditionals.

AppWin.h
#if defined(_M_IX86)
    static LRESULT CALLBACK WndProc(DWORD_PTR, HWND, UINT, WPARAM, LPARAM);
#elif defined(_M_AMD64)
    static LRESULT CALLBACK WndProc(HWND, UINT, WPARAM, LPARAM, DWORD_PTR);
#endif
AppWin.cpp
#if defined(_M_IX86)
LRESULT CALLBACK AppWinClass::WndProc(DWORD_PTR This, HWND hWnd, 
                                      UINT message, WPARAM wParam, LPARAM lParam)
#elif defined(_M_AMD64)
LRESULT CALLBACK AppWinClass::WndProc(HWND hWnd, UINT message, 
                                      WPARAM wParam, LPARAM lParam, DWORD_PTR This)
#endif

But lpfnWndProc will refer to our thunk and not our message processor function. So we initialize our thunk with the proper values.

//AppWin.cpp
int AppWinClass::Create(HINSTANCE hInstance, int nCmdShow, AppWinStruct* varStruct)
{
    thunk->Init(this, (DWORD_PTR)WndProc); //init our thunk

Some thunks may allocate their code dynamically, so we use the GetThunkAddress function, which here simply returns the thunk's this pointer. We cast the result to WNDPROC, since that is what our window class expects.

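Now we register our WNDCLASSEX structure. We declare a public variable, classatom, to hold the return value of RegisterClassEx for future use, and call RegisterClassEx. Keep in mind that lpfnWndProc belongs to the window class, not to an individual window: every window created from the class is routed to the thunk registered with it, so each object needs its own class name (RegisterClassEx also fails if a class with the same name is already registered).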

Next, we call CreateWindowEx, passing along the variables. If the WS_VISIBLE bit was set, we do not need to call ShowWindow, so we check for that. We call UpdateWindow, and the message loop (GetMsg) takes over from there. And we are done.

*One additional note: I use DWORD_PTR This in my WndProc declaration, which in my opinion helps demonstrate the principle. However, to avoid a needless conversion, you can declare it as AppWinClass* This.

AppWin.h
// AppWin.h : header file for the AppWinClass
//

#pragma once

#include "resource.h"

#if defined(_M_IX86)
#pragma pack(push,1)
struct winThunk
{
    unsigned short m_push1;    //push dword ptr [esp] ;push return address
    unsigned short m_push2;
    unsigned short m_mov1;     //mov dword ptr [esp+0x4], pThis ;set our new parameter by replacing old return address
    unsigned char  m_mov2;     //(esp+0x4 is first parameter)
    unsigned long  m_this;     //ptr to our object
    unsigned char  m_jmp;      //jmp WndProc
    unsigned long  m_relproc;  //relative jmp
    static HANDLE  eheapaddr;  //heap address this thunk will be initialized to
    bool Init(void* pThis, DWORD_PTR proc)
    {
        m_push1 = 0x34ff; //ff 34 24 push DWORD PTR [esp]
        m_push2 = 0xc724;
        m_mov1  = 0x2444; // c7 44 24 04 mov dword ptr [esp+0x4],
        m_mov2  = 0x04;
        m_this  = PtrToUlong(pThis);
        m_jmp   = 0xe9;  //jmp
        //calculate relative address of proc to jump to
        m_relproc = static_cast<unsigned long>((INT_PTR)proc - ((INT_PTR)this + sizeof(winThunk)));
        // write block from data cache and flush from instruction cache
        if (FlushInstructionCache(GetCurrentProcess(), this, sizeof(winThunk)))
        { //succeeded
            return TRUE; 
        }
        else { //error
             return FALSE;
        }
    }
    //some thunks will dynamically allocate the memory for the code
    WNDPROC GetThunkAddress()
    {
        return (WNDPROC)this;
    }
    void* operator new(size_t, HANDLE heapaddr)
    {
        eheapaddr = heapaddr;
        //since we can't pass a value with delete operator we need to store
        //our heap address so we can use it later when we need to free this thunk
        return HeapAlloc(heapaddr, 0, sizeof(winThunk));
    }
    void operator delete(void* pThunk)
    {
        HeapFree(eheapaddr, 0, pThunk);
    }
};
#pragma pack(pop)
#elif defined(_M_AMD64)
#pragma pack(push,2)
struct winThunk
{
    unsigned short     RaxMov;  //movabs rax, pThis
    unsigned long long RaxImm;
    unsigned long      RspMov;  //mov [rsp+28], rax
    unsigned short     RspMov1;
    unsigned short     Rax2Mov; //movabs rax, proc
    unsigned long long ProcImm;
    unsigned short     RaxJmp;  //jmp rax
    static HANDLE      eheapaddr; //heap address this thunk will be initialized to
    bool Init(void *pThis, DWORD_PTR proc)
    {
          RaxMov  = 0xb848;                    //movabs rax (48 B8), pThis
          RaxImm  = (unsigned long long)pThis; //
          RspMov  = 0x24448948;                //mov qword ptr [rsp+28h], rax (48 89 44 24 28)
          RspMov1 = 0x9028;                    //to properly byte align the instruction we add a nop (no operation) (90)
          Rax2Mov = 0xb848;                    //movabs rax (48 B8), proc
          ProcImm = (unsigned long long)proc;
          RaxJmp = 0xe0ff;                     //jmp rax (FF E0)
        //FlushInstructionCache returns nonzero on success
        if (FlushInstructionCache(GetCurrentProcess(), this, sizeof(winThunk)))
        { //succeeded
            return true;
        }
        else
        { //error
            return false;
        }
    }
    //some thunks will dynamically allocate the memory for the code
    WNDPROC GetThunkAddress()
    {
        return (WNDPROC)this;
    }
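    void* operator new(size_t, HANDLE heapaddr)
    {
        eheapaddr = heapaddr; //since we can't pass a value with delete operator we need to store
                              //our heap address so we can use it later when we need to free this thunk
        return HeapAlloc(heapaddr, 0, sizeof(winThunk));
    }
    void operator delete(void* pThunk)
    {
        HeapFree(eheapaddr, 0, pThunk);
    }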
};
#pragma pack(pop)
#endif

struct AppWinStruct {
    //structure to hold variables used to instantiate the window
    LPCTSTR lpszClassName = L"Window";
    LPCTSTR lpClassName   = L"Window";
    LPCTSTR lpWindowName  = L"Window";
    DWORD   dwExStyle     = WS_EX_OVERLAPPEDWINDOW;
    DWORD   dwStyle       = WS_OVERLAPPEDWINDOW | WS_VISIBLE;
    UINT    style         = CS_HREDRAW | CS_VREDRAW;
    int     cbClsExtra    = 0;
    int     cbWndExtra    = 0;
    HICON   hIcon         = LoadIcon(nullptr, IDI_APPLICATION);
    HCURSOR hCursor       = LoadCursor(nullptr, IDC_ARROW);
    HBRUSH  hbrBackground = (HBRUSH)(COLOR_WINDOW + 1);
    LPCTSTR lpszMenuName  = nullptr;
    HICON   hIconSm       = NULL;
    int     xpos          = CW_USEDEFAULT;
    int     ypos          = CW_USEDEFAULT;
    int     nWidth        = CW_USEDEFAULT;
    int     nHeight       = CW_USEDEFAULT;
    HWND    hWndParent    = NULL;
    HMENU   hMenu         = NULL;
    LPVOID  lpParam       = NULL;
};

class AppWinClass {
public:
    
    ATOM classatom = NULL;

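    AppWinClass();  //constructor
    ~AppWinClass(); //destructor
    AppWinClass(const AppWinClass&) = delete;            //each object owns exactly one thunk
    AppWinClass& operator=(const AppWinClass&) = delete;

    int Create(HINSTANCE hInstance, int nCmdShow = SW_SHOWDEFAULT, AppWinStruct* varStruct = nullptr);
    int GetMsg(HINSTANCE);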

private:
    static HANDLE eheapaddr;
    static int objInstances;
    winThunk* thunk;
#if defined(_M_IX86)
    static LRESULT CALLBACK WndProc(DWORD_PTR, HWND, UINT, WPARAM, LPARAM);
#elif defined(_M_AMD64)
    static LRESULT CALLBACK WndProc(HWND, UINT, WPARAM, LPARAM, DWORD_PTR);
#endif
};
AppWin.cpp
// AppWin.cpp : implementation of AppWinClass
//

#include "stdafx.h"
#include "AppWin.h"
#include <new> //std::bad_alloc

HANDLE AppWinClass::eheapaddr = NULL;
int    AppWinClass::objInstances = 0;
HANDLE winThunk::eheapaddr = NULL; //definition of the thunk's static member

AppWinClass::AppWinClass()
{
    ++objInstances;
    if (!eheapaddr)
    {
        try {
            eheapaddr = HeapCreate(HEAP_CREATE_ENABLE_EXECUTE | HEAP_GENERATE_EXCEPTIONS, 0, 0);
            if (!eheapaddr)             //HeapCreate returns NULL on failure
                throw std::bad_alloc();
        }
        catch (...) {
            throw;
        }
    }
    try {
        thunk = new(eheapaddr) winThunk;
    }
    catch (...) {
        throw;
    }
}

AppWinClass::~AppWinClass() {
    HeapFree(eheapaddr, 0, thunk); //release this object's thunk (winThunk has no destructor to run)
    if (--objInstances == 0)       //last instance: release the shared heap
    {
        HeapDestroy(eheapaddr);
        eheapaddr = NULL;
    }
}

int AppWinClass::Create(HINSTANCE hInstance, int nCmdShow, AppWinStruct* varStruct)
{
    HWND hWnd = NULL;
    DWORD showwin = NULL;
    thunk->Init(this, (DWORD_PTR)WndProc); //init our thunk
    WNDPROC pProc = thunk->GetThunkAddress(); //get our thunk's address 
                                                 //and assign it pProc (pointer to process)
    WNDCLASSEX wcex; //initialize our WNDCLASSEX
    wcex.cbSize      = sizeof(WNDCLASSEX);
    wcex.hInstance      = hInstance;
    wcex.lpfnWndProc = pProc; //our thunk
    if (!varStruct) //default values
    {
        wcex.style            = CS_HREDRAW | CS_VREDRAW;
        wcex.cbClsExtra    = 0;
        wcex.cbWndExtra    = 0;
        wcex.hIcon            = LoadIcon(nullptr, IDI_APPLICATION);
        wcex.hCursor        = LoadCursor(nullptr, IDC_ARROW);
        wcex.hbrBackground = (HBRUSH)(COLOR_WINDOW + 1);
        wcex.lpszMenuName  = nullptr;
        wcex.lpszClassName = L"Window";
        wcex.hIconSm        = NULL;
        //register wcex
        classatom = RegisterClassEx(&wcex);
        if (!classatom) //fails if, e.g., a class with this name is already registered
            return FALSE;
        //create our window
        hWnd = CreateWindowEx(WS_EX_OVERLAPPEDWINDOW, L"Window", 
               L"Window", WS_OVERLAPPEDWINDOW | WS_VISIBLE,
               CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT, 
               nullptr, nullptr, hInstance, nullptr);
        showwin = WS_VISIBLE; //we set WS_VISIBLE so we do not need to call ShowWindow
    }
    else
    { //user defined
        wcex.style            = varStruct->style;
        wcex.cbClsExtra    = varStruct->cbClsExtra;
        wcex.cbWndExtra    = varStruct->cbWndExtra;
        wcex.hIcon            = varStruct->hIcon;
        wcex.hCursor        = varStruct->hCursor;
        wcex.hbrBackground = varStruct->hbrBackground;
        wcex.lpszMenuName  = varStruct->lpszMenuName;
        wcex.lpszClassName = varStruct->lpszClassName;
        wcex.hIconSm        = varStruct->hIconSm;
        //register wcex
        classatom = RegisterClassEx(&wcex);
        if (!classatom) //fails if, e.g., a class with this name is already registered
            return FALSE;
        //create our window
        hWnd = CreateWindowEx(varStruct->dwExStyle, varStruct->lpClassName, 
               varStruct->lpWindowName, varStruct->dwStyle,
               varStruct->xpos, varStruct->ypos, varStruct->nWidth,  
               varStruct->nHeight, varStruct->hWndParent, varStruct->hMenu,
               hInstance, varStruct->lpParam);
        showwin = (varStruct->dwStyle & (WS_VISIBLE)); //check if the WS_VISIBLE bit was set
    }
    if (!hWnd)
    {
        return FALSE;
    }
    //check if the WS_VISIBLE style bit was set and if so we don't need to call ShowWindow
    if (showwin != WS_VISIBLE)
    {
        ShowWindow(hWnd, nCmdShow);
    }
    UpdateWindow(hWnd);
    return TRUE; //success; FALSE is returned on failure

}
#if defined(_M_IX86)
LRESULT CALLBACK AppWinClass::WndProc
(DWORD_PTR This, HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)
#elif defined(_M_AMD64)
LRESULT CALLBACK AppWinClass::WndProc
(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam, DWORD_PTR This)
#endif
{
    AppWinClass* pThis = (AppWinClass*)This;
    switch (message)
    {
    case WM_PAINT:
    {
        PAINTSTRUCT ps;
        HDC hdc = BeginPaint(hWnd, &ps);
        // TODO: Add any drawing code that uses hdc here...
        EndPaint(hWnd, &ps);
    }
    break;
    case WM_DESTROY:
        PostQuitMessage(0);
        break;
    default:
        return DefWindowProc(hWnd, message, wParam, lParam);
    }
    return 0;
}

int AppWinClass::GetMsg(HINSTANCE hInstance)
{
    HACCEL hAccelTable = LoadAccelerators(hInstance, MAKEINTRESOURCE(IDC_APPWIN));

    MSG msg;

    // Main message loop:
    while (GetMessage(&msg, nullptr, 0, 0))
    {
        if (!TranslateAccelerator(msg.hwnd, hAccelTable, &msg))
        {
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
    }

    return (int)msg.wParam;
}

History

  • Version 1.1
    • Corrected the lack of setting objInstances back to zero on the last instance of an object. Added two notes.
  • Version 1.5
    • Changed the 32-bit convention for the thunk to properly preserve the stack, as per pVerer's suggestion in the comments. This precipitated a need to wrap the WndProc function in #if defined conditionals, as the 32-bit and 64-bit code now call this function differently.
  • Version 1.6
    • Made some minor modifications on the thunk's delete operator
  • Version 1.8
    • Updated the x64 ABI section to discuss the convention more accurately and clearly; there was a mistake in its description. Also simplified the x64 thunk code using the movabs form of mov, which works with 64-bit immediates.
  • Version 1.8.1
    • Updated the thunk portions with a minor edit to support VS 2017: replaced the Microsoft typedefs USHORT, ULONG, etc. with the corresponding C++ types (e.g., USHORT became unsigned short); otherwise the code would not execute properly after being compiled by VS 2017 RC.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
