HotPatching: (VERY) Deep Inside

Michael Chourdakis

Oct 28, 2015

CPOL

A ready-to-use hotpatching library with five methods!

Project @ GitHub: https://github.com/WindowsNT/hotpatch

Introduction

I have created a game that is so addictive that I cannot afford to let the user quit it in order to update it.
I am already facing several lawsuits regarding this issue and, unless you vote for me now, I won't be able to pay my lawyer.

Not that I care much, of course. He is not interested in any money because he will not quit playing my game. But in order to satisfy all of them, I must implement HotPatching techniques in Windows.

And I also have a customer that demands hotpatching, but does not like DLLs. Hey, is hotpatching from yourself even possible? Well, I guess yes.

Now with a cleaned-up library (a single .h file!) and no administrator rights needed!

What is Hotpatching?

It's the technique by which you update a process without first quitting it. To accomplish that, you patch the function calls at runtime. In the most common scenario, you have an executable and some DLLs which contain functions.

Patching occurs when your executable downloads another DLL from your server and loads it. This DLL then replaces the old functions with the patched ones at runtime.

If you are still here, I will present even more crazy scenarios later on. Keep reading.

The Five Methods

There are five methods of hotpatching:

1. Loading a DLL into your address space that patches your EXE's functions. This is the safest method.
   - Advantages: Safest, easiest. Work is done in a single address space.
   - Disadvantages: You need an extra DLL.
2. Patching your EXE from an updated EXE via COM Interop and some proxy functions.
   - Advantages: You don't need an additional DLL.
   - Disadvantages: Work is done by proxying calls across different address spaces.
3. Patching your EXE from an updated EXE via Shared Memory and some proxy functions.
   - Advantages: You don't need an additional DLL.
   - Disadvantages: Work is done by proxying calls across different address spaces.
4. Patching your EXE by loading an updated EXE in the same address space (!!!) as your EXE.
   - Advantages: All work is done within a single EXE and within a single address space.
   - Disadvantages: Hackish and often unreliable.
5. Having your app as a DLL from the beginning.
   - Advantages: Single-file app + patch.
   - Disadvantages: You need a bootstrapper (or a .bat file, a .lnk, or whatever) for your user to start your app.

The Five Projects

The Visual Studio 2017 solution included contains these items:

The HotPatching library in the form of hotpatch.h, along with my updated USM and XML libraries
Five projects that demonstrate hotpatching
An app that can modify a module to include hotpatching information (pig.exe + sources). The projects in the solution do not need it, because a custom build step is specified to do self-preparation. In your projects, you might need it.

The projects demonstrate all hotpatching methods in both x86 and x64.

Preparing the Compiler and the Linker

In order for a function to be eligible for hotpatching, four conditions must hold:

The function must not be inline.
The size of the function must be at least two bytes.
The function's parameter list must not be optimized out by the compiler (an example follows).
There must be enough padding BEFORE the function entry to store the jump to the patch.

You can force the VC++ compiler to never inline a function by using __declspec(noinline).

To meet the remaining requirements, you need to compile with /HOTPATCH (only necessary in x86) and link with /FUNCTIONPADMIN:5 in x86 and /FUNCTIONPADMIN:12 in x64. I will explain later what these bytes are used for. In older Visual Studio editions, the linker ignores the /FUNCTIONPADMIN parameter in x64 builds, so it does not put the required padding before the function.

Compiling with /HOTPATCH makes the compiler start every function with an instruction of at least two bytes: in x86 this is a dummy MOV EDI,EDI, while in x64 the compiler simply guarantees that the first instruction is at least two bytes long. This allows you to replace these two bytes with your short JMP instruction.

What to Patch?

In order to patch, one must know where the function to be patched is. One might suggest exporting a set of patchable functions from the executable, so that the patcher can look them up in the export table and find their addresses. This would work, but a) you have to export all your patchable functions manually, and b) what if you forget to export a function that needs patching?

My solution is to take advantage of the PDB debugging information and the DIA SDK in order to enumerate ALL the functions that your application has, save their addresses, and then embed that information in the original executable as a resource. For this to work, you need to compile with /Zi and link with /DEBUG, even in Release mode, so that the PDB is generated.

With the current VS2017, this only works in release builds. Debug builds produce non-standard PDBs that cannot be parsed with the DIA SDK.

Post-Build Configuration

Once the test executable has been built, a post-build event is called. HOTPATCH::AutoPrepareExecutable() moves the executable itself to another file, copies it back to the original file name, and then calls HOTPATCH::PrepareExecutable(), which:

Loads the DIA SDK and the PDB file
Loads the executable in order to be able to get the functions' virtual addresses. The addresses of the functions are then saved, along with the load address of the executable, to an XML string.

HOTPATCH::PrepareExecutable also accepts a list of compilands. If you pass an empty initializer list, all compilands are included. To include, say, only main.obj, pass {L"main.obj"}.
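The compiland rule just described ("empty list means everything") can be sketched as a small predicate. This is a minimal illustration, not code from hotpatch.h: ShouldIncludeCompiland is a hypothetical helper, it assumes the compiland name may arrive as a full object-file path, and it compares names case-sensitively for simplicity.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: should this compiland's functions be recorded?
// An empty 'wanted' list means "include all compilands".
bool ShouldIncludeCompiland(const std::wstring& compiland,
                            const std::vector<std::wstring>& wanted)
{
    if (wanted.empty())
        return true;
    // Compare only the file-name part, in case a full path is reported
    size_t slash = compiland.find_last_of(L"\\/");
    std::wstring file = (slash == std::wstring::npos)
                            ? compiland
                            : compiland.substr(slash + 1);
    return std::find(wanted.begin(), wanted.end(), file) != wanted.end();
}
```

With {L"main.obj"}, a compiland reported as C:\build\main.obj is kept and any other object file is skipped.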

Then, PostBuildPatch calls HOTPATCH::PatchExecutable() which simply uses BeginUpdateResource(), UpdateResource() and EndUpdateResource() to save the XML string inside the Test Executable.

If you have more than one patchable module, repeat this process for each of them.

x86 Function Patching

The patch is implemented by:

Writing to the 5 bytes before the function's entry point (where the linker has created the space with /FUNCTIONPADMIN) a relative 32-bit JMP instruction which jumps to the new function's entry point.
Atomically replacing the first two bytes of the function's entry point with the bytes 0xEB 0xF9 (the 16-bit value 0xF9EB), which is the encoding of the short jump "JMP $ - 5" (that's why the function needs at least 2 bytes available).

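The JMP displacement is relative to the address of the next instruction. Since the 5-byte JMP starts at the old entry point - 5, the next instruction is the old entry point itself, so the displacement is simply the new function's entry point minus the old function's entry point.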
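To make the arithmetic concrete, here is a sketch that only computes the x86 patch bytes; it writes nothing to process memory. X86Patch and BuildX86Patch are hypothetical names for illustration, not the library's API.

```cpp
#include <array>
#include <cstdint>

struct X86Patch
{
    std::array<std::uint8_t, 5> padding; // to be written at oldEntry - 5
    std::array<std::uint8_t, 2> entry;   // to be written at oldEntry
};

X86Patch BuildX86Patch(std::uint32_t oldEntry, std::uint32_t newEntry)
{
    X86Patch p{};
    // rel32 is measured from the next instruction, (oldEntry - 5) + 5 == oldEntry,
    // so it is newEntry - oldEntry (unsigned wrap-around yields two's complement)
    std::uint32_t rel = newEntry - oldEntry;
    p.padding[0] = 0xE9; // JMP rel32
    for (int i = 0; i < 4; i++)
        p.padding[1 + i] = static_cast<std::uint8_t>(rel >> (8 * i)); // little-endian
    // Short JMP into the padding: rel8 = (oldEntry - 5) - (oldEntry + 2) = -7 = 0xF9
    p.entry = {0xEB, 0xF9};
    return p;
}
```

In a real patch, the 5 padding bytes would be written first (after making the page writable with VirtualProtect()) and the 2 entry bytes last, as a single 16-bit store, so a thread entering the function never sees a half-written jump.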

To find the old function's entry point, we assume that its distance from the module loading address is always the same. When the post-build patching tool saved the function addresses, it also saved the module loading address at that time. If the module now loads at a different address, we recover the function's RVA by subtracting the saved load address from the saved function address, and then add that RVA to the current load address.
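The relocation is one subtraction and one addition. A sketch with hypothetical parameter names: savedAddress and savedBase stand for the values stored in the embedded XML, currentBase for the module's base address at runtime (e.g. the value of GetModuleHandle() as an integer).

```cpp
#include <cstdint>

// Hypothetical helper: recompute a function's address after the module moved
std::uint64_t RelocateFunction(std::uint64_t savedAddress,
                               std::uint64_t savedBase,
                               std::uint64_t currentBase)
{
    std::uint64_t rva = savedAddress - savedBase; // offset inside the image is fixed
    return currentBase + rva;
}
```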

x64 Function Patching

The patch is implemented by:

Writing to the 12 bytes before the function's entry point (where the linker has created the space with /FUNCTIONPADMIN) the following:
- MOV RAX,XXXXXXXXXXXXXXXX, where XXXXXXXXXXXXXXXX is the absolute address of the new entry point (10 bytes).
- JMP RAX (2 bytes).
Atomically replacing the first two bytes of the function's entry point with the bytes 0xEB 0xF2 (the 16-bit value 0xF2EB), which is the encoding of "JMP $ - 12".
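The same kind of byte-level sketch for x64, again only computing the bytes; X64Patch and BuildX64Patch are hypothetical names, not the library's API.

```cpp
#include <array>
#include <cstdint>

struct X64Patch
{
    std::array<std::uint8_t, 12> padding; // to be written at oldEntry - 12
    std::array<std::uint8_t, 2> entry;    // to be written at oldEntry
};

X64Patch BuildX64Patch(std::uint64_t newEntry)
{
    X64Patch p{};
    p.padding[0] = 0x48; // REX.W
    p.padding[1] = 0xB8; // MOV RAX, imm64
    for (int i = 0; i < 8; i++)
        p.padding[2 + i] = static_cast<std::uint8_t>(newEntry >> (8 * i)); // little-endian
    p.padding[10] = 0xFF; // JMP RAX
    p.padding[11] = 0xE0;
    // Short JMP into the padding: rel8 = (oldEntry - 12) - (oldEntry + 2) = -14 = 0xF2
    p.entry = {0xEB, 0xF2};
    return p;
}
```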

Microsoft says that only 6 bytes are needed for x64 hotpatching, but I don't understand how. A relative JMP like the x86 one can only take a 32-bit displacement, which is sign-extended, so if the new entry point is farther from the old one than a signed 32-bit value can express (about ±2 GB), you can't jump there. Therefore, I've decided to patch through JMP RAX, which jumps unconditionally to a 64-bit absolute address.
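The reachability problem is easy to test directly. Rel32Reachable is a hypothetical helper that checks whether a 5-byte relative JMP placed at a given address can reach a target.

```cpp
#include <cstdint>

// True if "JMP rel32" located at jmpAddress can reach target.
// The displacement is measured from the end of the 5-byte instruction
// and must fit in a signed 32-bit integer.
bool Rel32Reachable(std::uint64_t jmpAddress, std::uint64_t target)
{
    std::int64_t disp = static_cast<std::int64_t>(target - (jmpAddress + 5));
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

When this returns false, only an absolute jump such as JMP RAX can reach the new entry point.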

The RAX register is considered volatile and it is not used for passing function parameters. Therefore, it's a safe bet to use it for hotpatching. For additional safety, you can use an indirect JMP [address] that reads the target address from memory, at the expense of more padding bytes.
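One possible encoding of the JMP [address] variant is a RIP-relative indirect jump whose 8-byte target is stored right after the instruction. This sketch is an assumption, not what the library does: it needs 14 padding bytes instead of 12, and the entry-point short jump would then be "JMP $ - 14" (bytes 0xEB 0xF0). BuildX64IndirectJump is a hypothetical name.

```cpp
#include <array>
#include <cstdint>

// JMP QWORD PTR [RIP+0] followed by the absolute target: no register is touched
std::array<std::uint8_t, 14> BuildX64IndirectJump(std::uint64_t newEntry)
{
    std::array<std::uint8_t, 14> b{}; // zero-initialized, so disp32 (b[2..5]) = 0
    b[0] = 0xFF; // JMP r/m64 ...
    b[1] = 0x25; // ... with ModRM selecting [RIP + disp32]
    // disp32 = 0: the target is read from the 8 bytes right after the 6-byte instruction
    for (int i = 0; i < 8; i++)
        b[6 + i] = static_cast<std::uint8_t>(newEntry >> (8 * i)); // little-endian
    return b;
}
```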

Visual Studio Optimizations / Behaviours

If a function in your program is only called once, there is a chance that it won't be patchable. Consider this example:

// Actual prototype
void foo(int x);

If this function is only called once, say, foo(5), then the compiler could optimize the function and, instead of passing 5 on the stack or in a register, hardcode the value inside the function, effectively turning the actual function into this:

// Compiled prototype
void foo()
{
int x = 5;
}

This means that the patch function you provide won't have the same signature as the patched one: it will look for the x parameter on the stack or in a register, and boom.

One more thing: when a proxy is used to call a function, enough stack space must be available for the callee, because it seems that VS-generated code needs some extra bytes of stack to work with (in x64, for instance, the calling convention requires the caller to reserve 32 bytes of shadow space).

Method 1: With a DLL

Nothing needs to be done in the executable: everything is done by the patch DLL, so the executable can be completely patch-unaware.
You only need to load the DLL and call an exported function that patches your executable:

HINSTANCE hL = LoadLibraryW(L"..\\dllpatch\\dllpatch.dll");
if (hL)
    {
    // "Patch" is the function exported by the patch DLL
    typedef HRESULT(__stdcall *PATCHFUNC)();
    PATCHFUNC patch = (PATCHFUNC)GetProcAddress(hL,"Patch");
    if (patch)
        patch();
    }

Using the Library from the Patch DLL

HOTPATCH hp;
hp.ApplyPatchFor(GetModuleHandle(0),L"FOO::PatchableFunction1",PatchableFunction1);

And that's all. You just specify the name of the function (the same name saved via the DIA SDK) and the replacement, which must have the same function signature (or boom).

Method 2: With a COM Server

OK, now it's going to be a crazy bet. Is it even possible to patch yourself from your own executable?
Let's say that you have an APP.EXE that downloads a new APP.EXE from your server. But you don't have a patch DLL and you are too bored to write one. Could you patch yourself with yourself?

You have to work with interprocess mechanisms, i.e., this crazy sequence of operations:

Register the downloaded app as a COM server
Query the COM server for the patching names
Patch the functions yourself

But because the patching functions are no longer in your own address space, you must patch with a "proxy" that takes the arguments of the current call, saves them all, and then passes them to the COM server via Interop. Because the parameters must also be passed, this proxy must be generated dynamically in memory (a new one for each patch) rather than being a static function in your code. Therefore, some assembly code has to be created at runtime: you allocate memory with VirtualAlloc(), make it readable, writable and executable with VirtualProtect(), and construct a COMCALL object in it with placement new. This code then saves all registers (8 in x86, 16 in x64) in a memory area unique to each patch, stores the IDispatch* interface of the COM server, and finally passes everything to the remote server via IDispatch::Invoke().

Registering/Unregistering Yourself as COM Server

This is easy: you call HOTPATCH::PrepareExecutableForCOMPatching with a temporary CLSID and PROGID, and then HOTPATCH::RegisterCOMServer and HOTPATCH::Unregister do the job by writing or deleting some keys under HKEY_CURRENT_USER. You no longer need administrator rights, because the hotpatching interface is not a new interface (and therefore needs no registration); it is merely an IDispatch.

Registering the Patch Names

The COM Server implements an IClassFactory and an IDispatch which does the work with 2 additional member functions.

The program calls HOTPATCH::AckGetPatchNames to get the names of the functions to be patched. Internally, this function uses COM Interop to call IHotPatch::GetNames on the COM server, which returns a BSTR with the space-separated names of the functions that the server is ready to patch. Since the library is now implemented with plain IDispatch, these are not direct calls but indirect calls through IDispatch::Invoke().
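Splitting that BSTR on the caller's side could look like the following sketch. SplitPatchNames is hypothetical, and it assumes that no function name itself contains a space; a BSTR can be read as a const wchar_t*.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turn "A B C" into {"A", "B", "C"}
std::vector<std::wstring> SplitPatchNames(const wchar_t* names)
{
    std::vector<std::wstring> result;
    if (!names)
        return result; // a NULL BSTR is a valid empty string in COM
    std::wistringstream in(names);
    std::wstring name;
    while (in >> name) // operator>> skips any run of whitespace
        result.push_back(name);
    return result;
}
```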

Patching a Function by Installing a Proxy

When HOTPATCH::ApplyCOMPatchFor is called, it patches the function, replacing it not with the target function (which now resides in another address space), but with a dynamically created buffer which is also executable:

#pragma pack(push,1)

struct COMCALL
    {
    unsigned char jmp1 = 0xE9;
    unsigned char jmp2 = 0x27;
    unsigned char jmp3 = 0x01;
    unsigned char jmp4 = 0x00;
    unsigned char jmp5 = 0x00; // JMP $ + 300

    void* HPPointer = 0;
    void* HPClass = 0;
#ifdef _WIN64
    char data[179];
#else
    char data[187];
#endif
    char name[100];

    // Register savers: PUSH reg / POP (E|R)AX / MOV [addr],(E|R)AX
#ifdef _WIN64
    struct MOVER
        {
        unsigned char pushreg = 0;
        unsigned char poprax = 0x58;
        unsigned short movrax = 0xA348;
        unsigned long long addr = 0;
        };
#else
    struct MOVER
        {
        unsigned char pushreg = 0x66;
        unsigned char popeax = 0x58;
        unsigned char moveax = 0xA3;
        unsigned long addr = 0;
        };
#endif

    MOVER m1[8]; // base registers

#ifdef _WIN64
    struct MOVER2
        {
        unsigned char pushregp = 0x41;
        unsigned char pushreg = 0;
        unsigned char poprax = 0x58;
        unsigned short movrax = 0xA348;
        unsigned long long addr = 0;
        };

    MOVER2 m2[8]; // r8-r15 registers
#endif

    // Call the COMPatchGeneric
#ifdef _WIN64

#ifndef XCXCALL
    unsigned short movd1 = 0xB848;
    unsigned long long regaddr = 0;
    unsigned short movd2x = 0xA348;
    unsigned long long movd2 = 0;
#else
    unsigned char pushrbp = 0x55;
    unsigned char movrbprsp_1 = 0x48;
    unsigned char movrbprsp_2 = 0x89;
    unsigned char movrbprsp_3 = 0xe5;

    unsigned short movrcx = 0xB948;
    unsigned long long regaddr = 0;
    unsigned short subrsp100_1 = 0x8148;
    unsigned short subrsp100_2 = 0x00EC;
    unsigned short subrsp100_3 = 0x0001;
    unsigned char subrsp100_4 = 0;
#endif // XCXCALL
    unsigned short movrax = 0xB848;
    unsigned long long calladdr = 0;
    unsigned short callrax = 0xD0FF;
#ifdef XCXCALL
    // Restore the stack
    unsigned short addrsp100_1 = 0x8148;
    unsigned short addrsp100_2 = 0x00C4;
    unsigned short addrsp100_3 = 0x0001;
    unsigned char addrsp100_4 = 0;
#endif
    unsigned char poprbp = 0x5d;
    unsigned char ret = 0xC3;
#else
    // Call the COMPatchGeneric
#ifndef XCXCALL
    unsigned short movd1 = 0x05C7;
    unsigned long movd2 = 0;
    unsigned long regaddr = 0;
#else
    unsigned char pushebp = 0x55;
    unsigned char movebpesp_1 = 0x89;
    unsigned char movebpesp_2 = 0xe5;

    unsigned char movecx = 0xB9;
    unsigned long regaddr = 0;
    // Give the callee some stack to work with
    unsigned short subesp100_1 = 0xEC81;
    unsigned short subesp100_2 = 0x0100;
    unsigned short subesp100_3 = 0x0000;
#endif

    unsigned char moveax = 0xB8;
    unsigned long calladdr = 0;
    unsigned short calleax = 0xD0FF;
#ifdef XCXCALL
    // Restore the stack
    unsigned short addesp100_1 = 0xC481;
    unsigned short addesp100_2 = 0x0100;
    unsigned short addesp100_3 = 0x0000;
#endif
    unsigned char popebp = 0x5d;
    unsigned char ret = 0xC3;
#endif // WIN64

    COMCALL(IHotPatch*dispp,HOTPATCH* hpx,size_t targetcall,const wchar_t* fname)
        {
        HPPointer = dispp;
        HPClass = hpx;
        calladdr = targetcall;
        regaddr = (size_t)this;
#ifndef XCXCALL
        movd2 = (size_t)&COMCALPTR;
#endif
        for (int i = 0; i < 8; i++)
            {
            m1[i].pushreg = (unsigned char)(0x50 + i);
            m1[i].addr = (size_t)(data + i * sizeof(size_t));
            }
#ifdef _WIN64
        for (int i = 0; i < 8; i++)
            {
            m2[i].pushreg = (unsigned char)(0x50 + i);
            m2[i].addr = (unsigned long long)(data + (i + 8) * 8);
            }
#endif
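        // name[] is 100 bytes = 50 wchar_t; copy at most 49 characters
        // so that the stored name always stays zero-terminated
        size_t le = wcslen(fname);
        if (le > 49)
            le = 49;
        memset(name,0,sizeof(name));
        memcpy(name,fname,le * sizeof(wchar_t));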
        }
    };

#pragma pack(pop)

The first 5 bytes are a JMP $ + 300, which skips the data fields (the JMP, HPPointer, HPClass, data[] and name[] together occupy exactly 300 bytes) and lands on the code. The data[] array receives the current register values and name[] holds the name of the patched function. The registers are saved by executing the MOVER structure 8 times (i.e., PUSH reg, POP EAX, MOV [address],EAX), where reg and address are set by the COMCALL constructor at runtime.
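The MOVER entries are easiest to follow as raw bytes. This sketch uses hypothetical helpers that mirror the field values of the COMCALL structure above: one entry for x86, and one for x64 that includes the 0x41 prefix MOVER2 uses for r8-r15.

```cpp
#include <cstdint>
#include <vector>

// x86: PUSH reg ; POP EAX ; MOV [addr],EAX
// reg 0..7 = EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI
std::vector<std::uint8_t> X86Mover(int reg, std::uint32_t addr)
{
    std::vector<std::uint8_t> b = {
        static_cast<std::uint8_t>(0x50 + reg), // PUSH r32
        0x58,                                  // POP EAX
        0xA3 };                                // MOV moffs32, EAX
    for (int i = 0; i < 4; i++)
        b.push_back(static_cast<std::uint8_t>(addr >> (8 * i)));
    return b;
}

// x64: [0x41] PUSH reg ; POP RAX ; MOV [addr],RAX  (reg 8..15 = r8..r15)
std::vector<std::uint8_t> X64Mover(int reg, std::uint64_t addr)
{
    std::vector<std::uint8_t> b;
    if (reg >= 8)
        b.push_back(0x41);                                   // REX.B selects r8-r15
    b.push_back(static_cast<std::uint8_t>(0x50 + (reg & 7))); // PUSH r64
    b.push_back(0x58);                                       // POP RAX
    b.push_back(0x48);                                       // REX.W
    b.push_back(0xA3);                                       // MOV moffs64, RAX
    for (int i = 0; i < 8; i++)
        b.push_back(static_cast<std::uint8_t>(addr >> (8 * i)));
    return b;
}
```

The x86 entry is 7 bytes and the x64 entries are 12 and 13 bytes, matching the packed MOVER and MOVER2 structures.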

Finally, there is a sequence in which the address of the COMCALL structure (which holds the IHotPatch* pointer) is loaded into ECX, followed by the encoded MOV EAX, COMPatchGeneric; CALL EAX; RET.

Some stack space (0x100 bytes) is reserved before the call to COMPatchGeneric / USMPatchGeneric and released after it.

COMPatchGeneric is an ordinary C++ function that reads the structure address from ECX/RCX, reads all the required data from the COMCALL structure, creates the appropriate BSTR and SAFEARRAY COM variables to pass to the COM server and, finally, passes them through IHotPatch::Call.

For x64 it is similar, except that 16 registers have to be saved (r8-r15 as well), so the proxy bytes in the COMCALL structure change their size too. Adding PUSH RBP/POP RBP around the call fixed an earlier x64 crash.

Method 3: With Shared Memory

You thought I was done? Hahahah. Not so fast. This method uses my USM library to do the same interprocess work without COM: no registry and, most importantly, no buggy COM proxying. Much more direct stuff.

The mechanism for using shared memory is similar to the COM mode. You create a shared memory area, which is initially used to request the patch names. Then you run the patcher manually and make it (say, through a command-line option) open the same shared memory area and return the names to you. This is implemented in HOTPATCH::StartUSMServer.

Instead of passing stuff to COM, the patcher uses the mutual class (found in USM) to "call" the remote function. This method has the benefit that it does not need registration, but it needs extra work to send data to and receive data from the patcher.

Method 4: Self-EXE as DLL

Of course, as you know, you cannot simply LoadLibrary an EXE file: even if it is relocatable, its CRT is not initialized the way a DLL's is. Even if you load the EXE and get a pointer with GetProcAddress, calling the function is likely to crash, since no proper initialization has occurred.

Actually, you CAN LoadLibrary an EXE and you CAN prepare it for execution in the very same address space of YOUR EXE. The mechanisms for that method will be explained in my article about using an EXE as a DLL.

However, you are strongly advised not to use this method as it is very hacky and unreliable.

Method 5: Self-DLL

This method requires your final app to be a DLL itself, so the problems of Method 4 no longer exist, but you need a bootstrapper to start your app (say, by calling rundll32).

ToDo

I also have to find a way to exclude unwanted symbols from the DIA SDK output. I've put in some string searches for std::, ATL::, etc. to exclude Microsoft stuff. If anyone knows a way to include ONLY the functions found in YOUR source code, let me know.
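For reference, a filter of the kind mentioned above can be as simple as a prefix check. IsLibrarySymbol is a hypothetical helper, and the prefix list is only an example, not the library's actual list.

```cpp
#include <string>
#include <vector>

// Hypothetical filter: true if the symbol belongs to a library namespace
bool IsLibrarySymbol(const std::wstring& name)
{
    static const std::vector<std::wstring> prefixes = { L"std::", L"ATL::" };
    for (const auto& p : prefixes)
        if (name.compare(0, p.size(), p) == 0) // name starts with p
            return true;
    return false;
}
```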

A Few Last Words...

As I said, my clients are now using this library so they never quit the game they love that much. However, my girlfriend is also stuck at the PC's screen and because the game never quits since patching does not require a restart, she won't cook for me anymore.

Therefore, in case you or your company find this library useful, do tell your boss about me. He might hire me and then I could afford to buy some food. :)

GOOD LUCK!

History

23-10-2018: Code cleanup, added a fifth method, VS 2017 updates
28-08-2016: Enhanced library, no more admin for COM, cleaned interface
01-11-2015: Added another method by patching an EXE using the EXE as DLL - Article on it follows