Introduction
The memory access monitor is implemented as a DLL that is injected into the target process. I extended the command-line interface of the tool described in my previous article, https://www.codeproject.com/Articles/1266083/x64-API-Hooker-plus-Disassembler, to inject our DLL and eject it. I will include the existing source (with some bug fixes; I wonder how it ever worked...) along with the source of the monitor DLL. The DLL itself is 64-bit; however, it can be made 32-bit with some minor modifications.
Using the Code
We will use a vectored exception handler to catch our read/write access violations. We can add a process-wide exception handler with the AddVectoredExceptionHandler function:
PVOID WINAPI AddVectoredExceptionHandler(
_In_ ULONG FirstHandler,
_In_ PVECTORED_EXCEPTION_HANDLER VectoredHandler
);
The first parameter determines the order in which multiple exception handlers get called. If the process we are going to monitor has already registered its own exception handler, it is important to set this parameter to TRUE (any nonzero value), so that our handler is called first and we can catch our read/write exceptions and handle them without passing them to that handler, which might become irritated and call TerminateProcess without a word, etc.
A vectored exception handler is process-wide and applies to all threads in the process, so we need to synchronize execution between multiple threads to keep our monitor from breaking. MSDN recommends against using synchronization objects or allocating memory within the handler (see the Remarks section of the documentation), so I decided to implement a simple spin lock based on the one from Wikipedia (you will see the code later).
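To illustrate the idea before the actual code, here is a minimal sketch of a test-and-set spin lock in portable C++. It is not the MASM implementation used by the DLL; the SpinLock class and the CountWithSpinLock helper are hypothetical names introduced only for this example.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Test-and-set spin lock: no memory allocation and no OS synchronization objects.
class SpinLock {
public:
    void lock() {
        // Atomically set the flag; keep spinning while it was already set.
        while (flag_.test_and_set(std::memory_order_acquire)) {
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical demo: several threads increment one counter under the lock.
long CountWithSpinLock(int threads, int iterations) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&lock, &counter, iterations] {
            for (int k = 0; k < iterations; ++k) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

A busy-wait lock like this only makes sense when the protected section is short or, as in this monitor, when the other threads are suspended anyway while the lock is held.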
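The memory region to be monitored is represented by the following struct:
struct MONITOR_ENTRY
{
UCHAR *Start; // base address of the monitored region
DWORD Size; // size of the region in bytes
FILE *File; // log file for this region
int Counter; // number of accesses logged so far
};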
When we start monitoring, we change the protection to PAGE_EXECUTE only, so if the given region contains code, it will still be allowed to execute. We register our exception handler, which will be called when the process tries to read from or write to this memory region. The exception handler has the following prototype:
LONG NTAPI Handler(EXCEPTION_POINTERS *ExceptionInfo);
And the EXCEPTION_POINTERS structure:
typedef struct _EXCEPTION_POINTERS {
PEXCEPTION_RECORD ExceptionRecord;
PCONTEXT ContextRecord;
} EXCEPTION_POINTERS, *PEXCEPTION_POINTERS;
ContextRecord holds the thread context at the moment the exception occurred, and ExceptionRecord holds information about the exception. We can modify the thread context structure (e.g., the Rax register value), and when we return from the handler, Windows will apply the updated context before it continues thread execution. To signal that the exception is handled and continue execution, we return EXCEPTION_CONTINUE_EXECUTION from the handler; when we are not interested in the exception, we should return EXCEPTION_CONTINUE_SEARCH (e.g., for exceptions that should be handled by the process itself).
When a read/write attempt occurs, we will catch an EXCEPTION_ACCESS_VIOLATION exception (the exception code is stored in ExceptionInfo->ExceptionRecord->ExceptionCode). To handle it, we will need:
- Address of instruction that caused exception
- Address of inaccessible data
- Access type (read / write)
The first item is retrieved from the thread context structure (ExceptionInfo->ContextRecord->Rip), the second is stored in ExceptionInfo->ExceptionRecord->ExceptionInformation[1], and the access type is stored in ExceptionInfo->ExceptionRecord->ExceptionInformation[0] (0 for a read, 1 for a write). Refer to the EXCEPTION_RECORD documentation for more details. The actions we will perform are listed below:
- Acquire lock
- Suspend all other threads (because we can't change protection on the fly, in case some thread executes code inside our region)
- Change protection of region to PAGE_READWRITE, so we can read the bytes of the instruction that caused the access violation
- Copy this instruction to some buffer (in case RIP-relative addressing is used, we will need to modify it a little, preserving its side effects)
- Add an invalid opcode (UD2) instruction after the one we have just copied
- Modify the instruction pointer so it will point to our buffer
- Continue execution (without releasing the lock)
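The copy-and-append step from the list above can be sketched in portable C++ for the simple case where the instruction does not use RIP-relative addressing. BuildSimpleTrampoline is a hypothetical helper written for this illustration; the DLL's own routine (GenerateTrampoline) appears later in the article.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// UD2 opcode: the same two bytes the article stores in g_InvalidOpcode.
static const std::uint8_t kUd2[] = { 0x0F, 0x0B };

// Copies `size` instruction bytes into `out` and appends UD2 right after them.
// Returns the offset of UD2 inside `out`, or 0 if the buffer is too small
// (or `size` is zero, which is never a valid instruction length).
std::size_t BuildSimpleTrampoline(const std::uint8_t* inst, std::size_t size,
                                  std::uint8_t* out, std::size_t capacity)
{
    if (size == 0 || size + sizeof(kUd2) > capacity) return 0;
    std::memcpy(out, inst, size);                // the faulting instruction
    std::memcpy(out + size, kUd2, sizeof(kUd2)); // trap that hands control back
    return size;
}
```

For example, passing the three bytes 48 8B 01 (mov rax, qword ptr [rcx]) yields a five-byte buffer ending in 0F 0B, and the returned offset 3 is the address at which the second exception is expected.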
The thread will continue its execution inside our buffer, execute the copied instruction, and then attempt to execute the UD2 instruction. This will trigger yet another exception, EXCEPTION_ILLEGAL_INSTRUCTION. Now our actions are:
- Change protection of region back to PAGE_EXECUTE
- Modify instruction pointer so it will point to instruction that immediately follows the original instruction that caused access violation
- Resume all other threads
- Release lock
- Continue execution
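The two phases described above amount to a small dispatch decision inside the handler. The sketch below models only that decision in portable C++, with no Windows calls; Region, Phase, Classify and AccessName are hypothetical names for this example, while the two exception code values are the ones defined in the Windows SDK headers.

```cpp
#include <cstdint>

constexpr std::uint32_t kAccessViolation    = 0xC0000005; // EXCEPTION_ACCESS_VIOLATION
constexpr std::uint32_t kIllegalInstruction = 0xC000001D; // EXCEPTION_ILLEGAL_INSTRUCTION

struct Region { std::uint64_t start; std::uint64_t size; };

enum class Phase { RunCopiedInstruction, RestoreProtection, PassOn };

// Decides which phase (if any) an exception belongs to.
// `ud2Address` is where the UD2 was placed in the trampoline buffer.
Phase Classify(std::uint32_t code, std::uint64_t rip, std::uint64_t dataAddress,
               const Region& region, std::uint64_t ud2Address)
{
    if (code == kAccessViolation &&
        dataAddress >= region.start && dataAddress - region.start < region.size)
        return Phase::RunCopiedInstruction; // first exception: monitored access
    if (code == kIllegalInstruction && rip == ud2Address)
        return Phase::RestoreProtection;    // second exception: our UD2 was hit
    return Phase::PassOn;                   // not ours: EXCEPTION_CONTINUE_SEARCH
}

// Maps ExceptionInformation[0] of an access violation to a label.
const char* AccessName(std::uint64_t access)
{
    if (access == 0) return "READ";
    if (access == 1) return "WRITE";
    return "EXECUTE"; // the article's handler treats any other value this way
}
```

Keeping the decision separate from the side effects (protection changes, thread suspension) makes it easy to check the boundary cases, such as an address exactly at the end of the region.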
We need to make one clarification: control transfer instructions like jmp qword ptr [rax] can be executed without read permission, though they implicitly reference memory.
Now let's see the actual code of our DLL monitor. We have DllMain to catch target process thread creation and termination:
extern BOOL g_Update;
BOOL APIENTRY DllMain( HMODULE hModule,
DWORD ul_reason_for_call,
LPVOID lpReserved
)
{
switch (ul_reason_for_call)
{
case DLL_PROCESS_ATTACH:
break;
case DLL_THREAD_ATTACH:
g_Update = TRUE;
break;
case DLL_THREAD_DETACH:
g_Update = TRUE;
break;
case DLL_PROCESS_DETACH:
break;
default:
break;
}
return TRUE;
}
Auxiliary functions to handle instruction bytes:
UCHAR g_InvalidOpcode[] = { 0x0F, 0x0B };
UCHAR g_RegisterRestore[] = { 0x48, 0x58 };
UCHAR g_RegisterOverride[] = { 0x48, 0x50,
0x68, 0x00, 0x00, 0x00, 0x00,
0xc7, 0x44, 0x24, 0x04, 0x00, 0x00, 0x00, 0x00,
0x48, 0x58 };
#define REGISTER_OVERRIDE_SIZE sizeof(g_RegisterOverride)
#define REGISTER_RESTORE_SIZE sizeof(g_RegisterRestore)
#define INVALID_OPCODE_SIZE sizeof(g_InvalidOpcode)
void GenerateInvalidOpcode(UCHAR *Bytes)
{
memcpy(Bytes, g_InvalidOpcode, INVALID_OPCODE_SIZE);
}
void GenerateRegisterOverride(DWORD Register, DWORD64 Value, UCHAR *OverBytes)
{
memcpy(OverBytes, g_RegisterOverride, REGISTER_OVERRIDE_SIZE);
OverBytes[1] += Register;
*((INT32*)(OverBytes + 3)) = Value;
*((INT32*)(OverBytes + 11)) = Value >> 32;
OverBytes[16] += Register;
}
void GenerateRegisterRestore(DWORD Register, UCHAR *RestBytes)
{
memcpy(RestBytes, g_RegisterRestore, REGISTER_RESTORE_SIZE);
RestBytes[1] += Register;
}
void GenerateTrampoline(UCHAR *Ptr, UCHAR *Bytes, DWORD Size,
bool rip, int index, DWORD *pTrampSize)
{
DWORD64 Address;
DWORD TrampSize;
INT32 Offset;
UCHAR Rex, Lock, Prefix, Prefix0F;
UCHAR Opcode;
UCHAR Modrm;
DWORD AddrReg;
DWORD Reg;
DWORD i, j, pi;
i = 0;
j = 0;
if (rip)
{
if (Bytes[i] == 0xF0)
{
Lock = Bytes[i];
++i;
}
else Lock = 0;
if ((Bytes[i] == 0x66) || (Bytes[i] == 0xF2) || (Bytes[i] == 0xF3))
{
Prefix = Bytes[i];
++i;
}
else Prefix = 0;
if ((Bytes[i] >= 0x40) && (Bytes[i] <= 0x4F))
{
Rex = Bytes[i];
++i;
}
else Rex = 0;
if (Bytes[i] == 0x0F)
{
Prefix0F = Bytes[i];
++i;
}
else Prefix0F = 0;
Opcode = Bytes[i];
++i;
Modrm = Bytes[i];
++i;
Offset = *((INT32*)&Bytes[i]);
i += sizeof(Offset);
pi = Size - i;
i += pi;
TrampSize = REGISTER_OVERRIDE_SIZE + (i - sizeof(Offset)) + REGISTER_RESTORE_SIZE;
if ((Ptr + TrampSize + INVALID_OPCODE_SIZE) > (Ptr + BUFFER_SIZE))
{
fprintf(g_Entry[index].File, "buffer overflow\n");
TerminateProcess(GetCurrentProcess(), 0);
}
Address = (DWORD64)(Bytes + Size + Offset);
Reg = (Modrm & 0x38) >> 3;
AddrReg = ((Reg == 7) || (Reg == 3) || (Reg == 4)) ? (Reg - 1) : (Reg + 1);
GenerateRegisterOverride(AddrReg, Address, &Ptr[j]);
j += REGISTER_OVERRIDE_SIZE;
if (Lock)
{
Ptr[j] = Lock;
++j;
}
if (Prefix)
{
Ptr[j] = Prefix;
++j;
}
if (Rex)
{
Ptr[j] = Rex;
++j;
}
if (Prefix0F)
{
Ptr[j] = Prefix0F;
++j;
}
Ptr[j] = Opcode;
++j;
Ptr[j] = AddrReg | (Reg << 3);
++j;
memcpy(&Ptr[j], &Bytes[i - pi], pi);
j += pi;
GenerateRegisterRestore(AddrReg, &Ptr[j]);
j += REGISTER_RESTORE_SIZE;
}
else
{
TrampSize = Size;
if ((Ptr + TrampSize + INVALID_OPCODE_SIZE) > (Ptr + BUFFER_SIZE))
{
fprintf(g_Entry[index].File, "buffer overflow\n");
TerminateProcess(GetCurrentProcess(), 0);
}
memcpy(&Ptr[j], &Bytes[i], Size);
j += Size;
}
GenerateInvalidOpcode(&Ptr[j]);
*pTrampSize = TrampSize;
}
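The RIP-relative branch of GenerateTrampoline is the densest part of the listing above, so here is a hedged, portable sketch of its two building blocks: computing the absolute address that a RIP-relative operand refers to, and encoding the byte sequence that loads that address into a scratch register. The function names are hypothetical; the byte layout mirrors g_RegisterOverride, and only registers 0 to 7 are covered because no REX.B bit is emitted.

```cpp
#include <array>
#include <cstdint>

// Absolute address of a RIP-relative operand: the address of the *next*
// instruction plus the signed 32-bit displacement.
std::uint64_t RipRelativeTarget(std::uint64_t instAddress, std::uint32_t instSize,
                                std::int32_t disp32)
{
    return instAddress + instSize + static_cast<std::int64_t>(disp32);
}

// In 64-bit mode, ModRM with mod = 00 and r/m = 101 selects RIP-relative addressing.
bool IsRipRelative(std::uint8_t modrm) { return (modrm & 0xC7) == 0x05; }

// push reg; push imm32; mov dword ptr [rsp+4], imm32; pop reg
// The first push saves the original register value on the stack; the next two
// instructions build the 64-bit value on the stack, and the pop loads it.
std::array<std::uint8_t, 17> EncodeRegisterOverride(unsigned reg, std::uint64_t value)
{
    std::array<std::uint8_t, 17> code = {{
        0x48, 0x50,                         // push rax (adjusted to reg below)
        0x68, 0, 0, 0, 0,                   // push imm32 (low half, sign-extended)
        0xC7, 0x44, 0x24, 0x04, 0, 0, 0, 0, // mov dword ptr [rsp+4], imm32 (high half)
        0x48, 0x58 }};                      // pop rax (adjusted to reg below)
    code[1]  = static_cast<std::uint8_t>(code[1] + reg);
    code[16] = static_cast<std::uint8_t>(code[16] + reg);
    for (int k = 0; k < 4; ++k) {
        code[3 + k]  = static_cast<std::uint8_t>(value >> (8 * k));
        code[11 + k] = static_cast<std::uint8_t>(value >> (32 + 8 * k));
    }
    return code;
}
```

Because push imm32 sign-extends its operand to 64 bits, the upper half on the stack is only correct after the following mov overwrites it, which is why both instructions are needed. After the copied instruction has run, a single pop of the same register (the g_RegisterRestore bytes) brings back the value saved by the first push.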
Auxiliary functions to update, suspend and resume threads:
DWORD g_ThreadId[100];
DWORD g_ThreadIdCount;
HANDLE g_ThreadHandle[100];
DWORD g_ThreadHandleCount;
void UpdateThreads()
{
HANDLE hThreadSnap;
THREADENTRY32 te32;
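hThreadSnap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
if (hThreadSnap == INVALID_HANDLE_VALUE)
{
fprintf(g_File, "CreateToolhelp32Snapshot\n");
TerminateProcess(GetCurrentProcess(), 0);
}
te32.dwSize = sizeof(THREADENTRY32);
if (!Thread32First(hThreadSnap, &te32))
{
fprintf(g_File, "Thread32First\n");
TerminateProcess(GetCurrentProcess(), 0);
}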
g_ThreadIdCount = 0;
do
{
if ((te32.th32OwnerProcessID == GetCurrentProcessId()) &&
(te32.th32ThreadID != GetCurrentThreadId()))
{
if (g_ThreadIdCount == ARRAYSIZE(g_ThreadId))
{
fprintf(g_File, "Array for thread ids is too small\n");
TerminateProcess(GetCurrentProcess(), 0);
}
g_ThreadId[g_ThreadIdCount] = te32.th32ThreadID;
++g_ThreadIdCount;
}
} while (Thread32Next(hThreadSnap, &te32));
CloseHandle(hThreadSnap);
fprintf(g_File, "thread count updated: %d\n\n", g_ThreadIdCount);
fflush(g_File);
}
void SuspendThreads()
{
g_ThreadHandleCount = 0;
for (int i = 0; i < g_ThreadIdCount; ++i)
{
if (g_ThreadId[i] != GetCurrentThreadId())
{
HANDLE hThread = OpenThread(THREAD_ALL_ACCESS, FALSE, g_ThreadId[i]);
if (!hThread) continue; // the thread may have exited since the last snapshot
SuspendThread(hThread);
g_ThreadHandle[g_ThreadHandleCount] = hThread;
++g_ThreadHandleCount;
}
}
if (g_ThreadHandleCount) Sleep(THREAD_DELAY);
}
void ResumeThreads()
{
for (int i = 0; i < g_ThreadHandleCount; ++i)
{
ResumeThread(g_ThreadHandle[i]);
CloseHandle(g_ThreadHandle[i]);
}
if (g_ThreadHandleCount) Sleep(THREAD_DELAY);
}
The spin lock is implemented in ASM:
PUBLIC spin_lock
PUBLIC spin_unlock
.data
locked dd 0
.code
spin_lock PROC
mov eax, 1
xchg eax, [locked]
test eax, eax
jnz spin_lock
ret
spin_lock ENDP
spin_unlock PROC
xor eax, eax
xchg eax, [locked]
ret
spin_unlock ENDP
END
And called from C:
extern "C"
{
void spin_lock();
void spin_unlock();
}
Global variables to hold information about memory ranges, addresses, etc.
MONITOR_ENTRY g_Entry[100];
DWORD g_EntryCount;
FILE *g_File;
DWORD g_index;
PVOID g_Handler;
UCHAR *g_NextInstructionAddress;
UCHAR *g_InvalidOpcodeAddress;
UCHAR *g_DataAddress;
UCHAR *g_Buffer;
DWORD g_Access;
DWORD g_TicksBegin;
BOOL g_Stopped;
BOOL g_Update;
Exported function to start the monitor. Memory ranges are constructed from an array of strings that hold module names:
__declspec(dllexport) void StartMonitor()
{
DWORD OldProtect;
IMAGE_NT_HEADERS64 *Headers;
read_spec(L"data.bin");
char* Modules[] = { "{this}" };
char Buffer[MAX_PATH];
char *ModuleName;
int i;
for (i = 0; (i < ARRAYSIZE(Modules)) && (i < ARRAYSIZE(g_Entry)); ++i)
{
if (!strcmp(Modules[i], "{this}")) ModuleName = NULL;
else ModuleName = Modules[i];
g_Entry[i].Start = (UCHAR*)GetModuleHandleA(ModuleName);
if (!ModuleName)
{
GetModuleFileNameA((HMODULE)g_Entry[i].Start, Buffer, sizeof(Buffer));
ModuleName = Buffer + strlen(Buffer) - 1;
while (*ModuleName != '\\') --ModuleName;
++ModuleName;
}
else
{
strcpy(Buffer, ModuleName);
ModuleName = Buffer;
}
strcat(ModuleName, ".txt");
g_Entry[i].File = fopen(ModuleName, "w");
if (!g_Entry[i].File) TerminateProcess(GetCurrentProcess(), 0);
Headers = (IMAGE_NT_HEADERS64*)((UCHAR*)g_Entry[i].Start +
((IMAGE_DOS_HEADER*)g_Entry[i].Start)->e_lfanew);
g_Entry[i].Size = Headers->OptionalHeader.SizeOfImage;
if (!VirtualProtect(g_Entry[i].Start, g_Entry[i].Size, PAGE_EXECUTE, &OldProtect))
{
fprintf(g_Entry[i].File, "VirtualProtect\n");
TerminateProcess(GetCurrentProcess(), 0);
}
g_Entry[i].Counter = 0;
}
g_EntryCount = i;
g_Stopped = FALSE;
g_File = fopen("default.txt", "w");
if (!g_File) TerminateProcess(GetCurrentProcess(), 0);
fprintf(g_File, "StartMonitor : %d\n\n", GetCurrentThreadId());
fflush(g_File);
g_Buffer = (UCHAR*)VirtualAlloc(NULL, BUFFER_SIZE, MEM_RESERVE |
MEM_COMMIT, PAGE_EXECUTE_READWRITE);
if (!g_Buffer)
{
fprintf(g_File, "VirtualAlloc\n");
TerminateProcess(GetCurrentProcess(), 0);
}
g_TicksBegin = GetTickCount();
g_Handler = AddVectoredExceptionHandler(TRUE, Handler);
if (!g_Handler)
{
fprintf(g_File, "AddVectoredExceptionHandler\n");
TerminateProcess(GetCurrentProcess(), 0);
}
}
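StartMonitor takes the size of each monitored range from the module's PE headers (e_lfanew, then OptionalHeader.SizeOfImage). As a rough, portable illustration of that lookup, the sketch below reads the same field from a raw byte buffer instead of using the Windows structures. SizeOfImageFromHeaders and ReadU32 are hypothetical helpers; the offsets follow the PE format (e_lfanew at 0x3C, and SizeOfImage 56 bytes into the optional header, after the 4-byte signature and the 20-byte file header).

```cpp
#include <cstddef>
#include <cstdint>

// Reads a little-endian 32-bit value byte by byte.
static std::uint32_t ReadU32(const std::uint8_t* p)
{
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Returns SizeOfImage, or 0 if the buffer does not look like a PE image.
std::uint32_t SizeOfImageFromHeaders(const std::uint8_t* image, std::size_t length)
{
    const std::size_t kLfanewOffset = 0x3C;             // IMAGE_DOS_HEADER::e_lfanew
    const std::size_t kSizeOfImageOffset = 4 + 20 + 56; // from the start of the NT headers
    if (length < kLfanewOffset + 4 || image[0] != 'M' || image[1] != 'Z') return 0;
    const std::size_t nt = ReadU32(image + kLfanewOffset);
    if (nt > length || length - nt < kSizeOfImageOffset + 4) return 0;
    if (ReadU32(image + nt) != 0x00004550) return 0;    // "PE\0\0" signature
    return ReadU32(image + nt + kSizeOfImageOffset);
}
```

Unlike the listing above, this version validates the signatures and bounds before trusting the value, which is worth considering whenever the module handle could come from a name that failed to resolve.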
Exported function to stop monitor. Actions that we perform:
- Acquire lock
- Suspend all other threads (because we can't change protection on the fly, in case some thread executes code inside our region)
- Change protection to PAGE_EXECUTE_READWRITE
- Resume all other threads
- Release lock
- Remove our exception handler and cleanup
__declspec(dllexport) void StopMonitor()
{
spin_lock();
UpdateThreads();
SuspendThreads();
DWORD OldProtect;
for (int i = 0; i < g_EntryCount; ++i)
{
if (!VirtualProtect(g_Entry[i].Start, g_Entry[i].Size,
PAGE_EXECUTE_READWRITE, &OldProtect))
{
fprintf(g_File, "VirtualProtect\n");
TerminateProcess(GetCurrentProcess(), 0);
}
}
g_Stopped = TRUE;
ResumeThreads();
spin_unlock();
RemoveVectoredExceptionHandler(g_Handler);
Sleep(THREAD_DELAY * 5);
for (int i = 0; i < g_EntryCount; ++i)
{
fclose(g_Entry[i].File);
}
free_spec();
VirtualFree(g_Buffer, 0, MEM_RELEASE);
fprintf(g_File, "StopMonitor : %d, %d\n\n", GetCurrentThreadId(),
GetTickCount() - g_TicksBegin);
fclose(g_File);
}
And the handler itself. Note that the fprintf calls can be replaced by functions that write to a buffer that gets flushed to a file on disk when it is full. Also, we handle MSVC_EXCEPTION just for fun; it serves no purpose in our memory monitor.
LONG NTAPI Handler(EXCEPTION_POINTERS *ExceptionInfo)
{
Buffer code_buf;
Instruction inst;
UCHAR *InstAddress, *DataAddress;
DWORD InstSize, TrampSize, ExcCode, OldProtect, i, Access;
ExcCode = ExceptionInfo->ExceptionRecord->ExceptionCode;
if (ExcCode == EXCEPTION_ACCESS_VIOLATION)
{
InstAddress = (UCHAR*)ExceptionInfo->ContextRecord->Rip;
Access = ExceptionInfo->ExceptionRecord->ExceptionInformation[0];
DataAddress = (UCHAR*)ExceptionInfo->ExceptionRecord->ExceptionInformation[1];
for (i = 0; i < g_EntryCount; ++i)
{
if ((DataAddress >= (UCHAR*)g_Entry[i].Start) &&
(DataAddress < ((UCHAR*)g_Entry[i].Start + g_Entry[i].Size)))
{
spin_lock();
if (g_Stopped)
{
spin_unlock();
return EXCEPTION_CONTINUE_EXECUTION;
}
if (Access == 0) fprintf(g_Entry[i].File, "Access: READ\n");
else if (Access == 1) fprintf(g_Entry[i].File, "Access: WRITE\n");
else
{
fprintf(g_Entry[i].File, "Access: EXECUTE\n");
TerminateProcess(GetCurrentProcess(), 0);
}
fprintf(g_Entry[i].File, "Counter: %d\n", g_Entry[i].Counter);
++(g_Entry[i].Counter);
fprintf(g_Entry[i].File, "Thread Id: %d\n", GetCurrentThreadId());
fprintf(g_Entry[i].File, "Instruction Address: %p\n", InstAddress);
fprintf(g_Entry[i].File, "Data Address: %p\n", DataAddress);
if (g_Update)
{
UpdateThreads();
g_Update = FALSE;
}
SuspendThreads();
if (!VirtualProtect(g_Entry[i].Start, g_Entry[i].Size,
PAGE_READWRITE, &OldProtect))
{
fprintf(g_Entry[i].File, "VirtualProtect\n");
TerminateProcess(GetCurrentProcess(), 0);
}
if (Access == 1) fprintf(g_Entry[i].File, "Data Before: ");
else fprintf(g_Entry[i].File, "Data: ");
for (int j = 0; j < VAR_SIZE; ++j)
{
fprintf(g_Entry[i].File, "%02hhX ", DataAddress[j]);
}
fprintf(g_Entry[i].File, "\n");
fflush(g_Entry[i].File);
c_MakeBuffer(InstAddress, 100, (Encoding)0, &code_buf);
inst_set_params(&inst, MODE_64, C_TRUE, &code_buf, NULL,
SHOW_ADDRESS | SHOW_LOWER | SHOW_PSEUDO);
if (!decode(&inst))
{
fprintf(g_Entry[i].File, "decode\n");
TerminateProcess(GetCurrentProcess(), 0);
}
InstSize = code_buf.i;
GenerateTrampoline(g_Buffer, InstAddress, InstSize, inst.rip, i, &TrampSize);
GenerateInvalidOpcode(g_Buffer + TrampSize);
ExceptionInfo->ContextRecord->Rip = (DWORD64)g_Buffer;
g_NextInstructionAddress = InstAddress + InstSize;
g_InvalidOpcodeAddress = g_Buffer + TrampSize;
g_DataAddress = DataAddress;
g_Access = Access;
g_index = i;
return EXCEPTION_CONTINUE_EXECUTION;
}
}
}
else if (ExcCode == EXCEPTION_ILLEGAL_INSTRUCTION)
{
if (ExceptionInfo->ContextRecord->Rip == (DWORD64)g_InvalidOpcodeAddress)
{
i = g_index;
DataAddress = g_DataAddress;
Access = g_Access;
if (Access == 1)
{
fprintf(g_Entry[i].File, "Data After: ");
for (int j = 0; j < VAR_SIZE; ++j)
{
fprintf(g_Entry[i].File, "%02hhX ", DataAddress[j]);
}
fprintf(g_Entry[i].File, "\n");
}
fprintf(g_Entry[i].File, "\n");
fflush(g_Entry[i].File);
if (!VirtualProtect(g_Entry[i].Start, g_Entry[i].Size, PAGE_EXECUTE, &OldProtect))
{
fprintf(g_Entry[i].File, "VirtualProtect\n");
TerminateProcess(GetCurrentProcess(), 0);
}
ExceptionInfo->ContextRecord->Rip = (DWORD64)g_NextInstructionAddress;
ResumeThreads();
spin_unlock();
return EXCEPTION_CONTINUE_EXECUTION;
}
}
else if (ExcCode == MSVC_EXCEPTION)
{
THREADNAME_INFO *info =
(THREADNAME_INFO*)ExceptionInfo->ExceptionRecord->ExceptionInformation;
fprintf(g_File, "Thread Exception: %x %d %p\n",
ExcCode, GetCurrentThreadId(), ExceptionInfo->ContextRecord->Rip);
if (info->szName) fprintf(g_File, "Name: %s\n", info->szName);
fprintf(g_File, "Id: %d\n\n", info->dwThreadID);
fflush(g_File);
return EXCEPTION_CONTINUE_SEARCH;
}
fprintf(g_File, "Skip Exception: %x %d %p\n\n", ExcCode,
GetCurrentThreadId(), ExceptionInfo->ContextRecord->Rip);
fflush(g_File);
return EXCEPTION_CONTINUE_SEARCH;
}
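The suggestion made before the handler listing, replacing the fprintf calls with writes to a buffer that is flushed when full, could look roughly like the following. This is a minimal sketch, not part of the monitor's source; BufferedLog is a hypothetical class, and it assumes messages are plain C strings.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Fixed-capacity log buffer: no allocation while logging, one fwrite per flush.
template <std::size_t Capacity>
class BufferedLog {
public:
    explicit BufferedLog(std::FILE* file) : file_(file), used_(0) {}

    // Appends text, flushing first if it would not fit in the buffer.
    void Write(const char* text) {
        const std::size_t len = std::strlen(text);
        if (len > Capacity) { // oversized message: write it through directly
            Flush();
            std::fwrite(text, 1, len, file_);
            return;
        }
        if (used_ + len > Capacity) Flush();
        std::memcpy(data_ + used_, text, len);
        used_ += len;
    }

    // Writes any pending bytes to the file and flushes the stream.
    void Flush() {
        if (used_) {
            std::fwrite(data_, 1, used_, file_);
            used_ = 0;
        }
        std::fflush(file_);
    }

    std::size_t Pending() const { return used_; }

private:
    std::FILE* file_;
    std::size_t used_;
    char data_[Capacity];
};
```

The trade-off is that buffered entries are lost if the process is terminated before a flush, so a version used in the handler would still want to flush on the error paths that call TerminateProcess and in StopMonitor.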
As you can see, we have one default file plus one file for each memory region. Default file contents might look like:
And file contents for some memory range might look like:
To start monitor, we will use the following commands passed to our tool:
inject Monitor.dll
add kernel32.dll export AddVectoredExceptionHandler
addh Handlers.dll : AddVectoredExceptionHandlerHandler to kernel32.dll : AddVectoredExceptionHandler
wait async
To stop monitor, we will use:
eject-stop
remove kernel32.dll : AddVectoredExceptionHandler
Basically, that's it! Thank you for reading.