Cross-platform Visual Memory Tracker

15 Mar 2016, BSD license, 4 min read
Have you ever wanted to know where most of your memory goes, and whether it leaks or is simply allocated too much? This home-brewed memory tracker is yet another reinvented bicycle, but one you can tune to your needs.

Introduction

Once upon a time, in a far and cold country, a group of brave engineers fought with Memory Consumption. Memory Consumption was the beast that no one could conquer…

Okay, the story is long-winded, so to make it shorter, let's skip right to the Happy End chapter.

Happy End Chapter

The very essence of programming is to take an input and turn it into an output, right?

In our case, the input is a stream of malloc/free calls, and the output is a cross-platform view.

Did you know that there is a way to hook the CRT malloc/free functions on OSX, Linux and Windows? The mechanisms are quite different, but each of them takes just a few lines of code.

The bigger question is how to produce nice output with just a few lines of code.

Let's dream a little... what if there was a single cross-platform API that could generate different tracing formats understood by various OS-dependent and OS-independent viewers? That would be perfect!

Close your eyes, count to three... and voila! https://github.com/01org/IntelSEAPI

Clone/download the source and let's begin!

Hooking CRT

1. OSX

For my taste, the best allocation hooking mechanism is the one on Mac OS X. All you need to keep in mind is the name of the malloc_default_zone function and the fact that the structure it returns lives in a write-protected page.

C++
malloc_zone_t* pMallocZone = malloc_default_zone();
if (!pMallocZone) return false;

vm_protect(mach_task_self(), (uintptr_t)pMallocZone, sizeof(malloc_zone_t),
		0, VM_PROT_READ|VM_PROT_WRITE);

g_origMalloc = pMallocZone->malloc;
pMallocZone->malloc = MallocHook;

g_origFree = pMallocZone->free;
pMallocZone->free = FreeHook;

g_origFreeDefSize = pMallocZone->free_definite_size;
pMallocZone->free_definite_size = FreeDefSizeHook;

vm_protect(mach_task_self(), (uintptr_t)pMallocZone, sizeof(malloc_zone_t), 0, VM_PROT_READ);

You can find the complete code in the memory.cpp file.

The main problem appears when you realize that to debug memory, you need to allocate some... But the solution is simple: we need to detect recursive calls. There is nothing better than Thread Local Storage for such a case.

Good old "static __thread bool" could do the trick, but the OSX implementation of "__thread" uses malloc inside. Bad luck.

Let's call on ancient magic: pthread_key_create, pthread_setspecific and pthread_getspecific. These guys work directly with the thread record and do not allocate anything.

C++
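void* MallocHook(struct _malloc_zone_t *zone, size_t size)
{
    // Re-entered from our own hook on this thread: pass straight through.
    if (pthread_getspecific(tls_key))
        return g_origMalloc(zone, size);
    CRecursionScope scope; // flags this thread as being inside the hook
    return g_origMalloc(zone, size); // the __itt tracing calls go around this call
}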
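
The recursion guard described above can be sketched roughly as follows. The names tls_key and CRecursionScope are the ones used in the article's hooks, but their definitions here are only an assumption about how they might look; InitRecursionGuard and InsideHook are hypothetical helpers added for illustration. The real code is in memory.cpp.

```cpp
#include <pthread.h>

// Assumed definition: one process-wide key whose per-thread value is
// non-null while that thread is executing inside one of our hooks.
static pthread_key_t tls_key;

// Hypothetical helper: call once, before the hooks are installed.
bool InitRecursionGuard()
{
    // No destructor is needed: we only ever store a flag, not a pointer we own.
    return pthread_key_create(&tls_key, nullptr) == 0;
}

// Hypothetical helper: true if the current thread is already inside a hook.
bool InsideHook()
{
    return pthread_getspecific(tls_key) != nullptr;
}

// RAII marker: raises the per-thread flag on entry and clears it on exit,
// so any allocation made by the tracing code itself is passed through.
class CRecursionScope
{
public:
    CRecursionScope()  { pthread_setspecific(tls_key, reinterpret_cast<void*>(1)); }
    ~CRecursionScope() { pthread_setspecific(tls_key, nullptr); }

    CRecursionScope(const CRecursionScope&) = delete;
    CRecursionScope& operator=(const CRecursionScope&) = delete;
};
```

A hook would check the flag first, then create a CRecursionScope on the stack before doing any tracing work, which is the pattern the hooks in this article follow.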

2. Linux

Linux has two ways to hook allocations.

The outdated one is based on explicit use of the __malloc_initialize_hook, __malloc_hook and __free_hook variables. It is now marked as deprecated in malloc.h.

The alternative is based on symbol resolution order: if you just define malloc and free functions in your compilation unit, they will be used.

In both the OSX and Linux cases, we call the original malloc/free from inside our hook and put __itt markup around the original calls to create the output.
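
A minimal sketch of the symbol-resolution approach on Linux might look like this. It assumes glibc, which exports the underlying allocator as __libc_malloc and __libc_free; on other C libraries the usual way to reach the original functions is dlsym(RTLD_NEXT, "malloc"). The two counters are hypothetical stand-ins for the __itt calls, chosen because they never allocate.

```cpp
#include <atomic>
#include <cstddef>

// glibc-specific entry points to the real allocator (assumption: glibc).
extern "C" void* __libc_malloc(std::size_t size);
extern "C" void  __libc_free(void* ptr);

// Hypothetical counters standing in for the tracing calls.
static std::atomic<long> g_mallocCalls{0};
static std::atomic<long> g_freeCalls{0};

// Defining malloc in the executable makes the dynamic linker resolve
// every malloc call in the process to this function.
extern "C" void* malloc(std::size_t size) noexcept
{
    g_mallocCalls.fetch_add(1, std::memory_order_relaxed);
    return __libc_malloc(size);
}

// Same for free; null pointers are forwarded but not counted.
extern "C" void free(void* ptr) noexcept
{
    if (ptr)
        g_freeCalls.fetch_add(1, std::memory_order_relaxed);
    __libc_free(ptr);
}
```

Only malloc and free are replaced here, as in the article. Blocks obtained through calloc or realloc still come from the same glibc heap, so passing them to this free is safe, but they will not show up in the malloc counter.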

3. Windows

On the one hand, the Windows mechanism is simple: just register your callback with _CrtSetAllocHook and receive the _HOOK_ALLOC and _HOOK_FREE notifications.

C++
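// Signature of the callback passed to _CrtSetAllocHook; the name AllocHook is just a placeholder.
int AllocHook(int allocType, void *userData, size_t size, int blockType,
              long requestNumber, const unsigned char *filename, int lineNumber);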

On the other hand, it's subtle. The problem is that the callback is called only before malloc and before free, not after.

So we have the memory pointer in the 'free' hook, but not in the 'malloc' hook. How do we match malloc/free pairs after that?

Well, here we need not only a hook but also a hack.

Let's look closer at what we get on _HOOK_ALLOC: there is a requestNumber. Can we match it with the requestNumber in _HOOK_FREE?

No! Because on _HOOK_FREE, 0 is always passed as requestNumber. What a shame!

And yes! Because userData points to the memory right after the CRT block header, and that header holds the requestNumber.

Here is what we do: (((_CrtMemBlockHeader*)userData)-1)->lRequest

Since the real value of the pointer is not important in the visual representation, we can use the requestNumber as an id to match the malloc/free pair.
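
To make the pointer arithmetic concrete, here is a portable mock-up of the "header sits right before the user block" layout. MockBlockHeader, MockAlloc, MockFree and RequestNumberOf are all hypothetical names, and the struct is not the real _CrtMemBlockHeader (which has more fields); only the idea of stepping back one header from userData is taken from the article.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Mock header, NOT the real CRT layout: just enough to show the trick.
struct MockBlockHeader
{
    long        lRequest; // allocation sequence number
    std::size_t size;     // size requested by the caller
};

static long g_nextRequest = 1;

// Allocate header + payload in one block and hand out the payload pointer.
void* MockAlloc(std::size_t size)
{
    void* raw = std::malloc(sizeof(MockBlockHeader) + size);
    if (!raw)
        return nullptr;
    MockBlockHeader* header = new (raw) MockBlockHeader{g_nextRequest++, size};
    return header + 1; // payload starts immediately after the header
}

// What the free hook does: step back one header from the payload pointer.
long RequestNumberOf(void* userData)
{
    return (static_cast<MockBlockHeader*>(userData) - 1)->lRequest;
}

// Release the whole block, starting from the header.
void MockFree(void* userData)
{
    if (userData)
        std::free(static_cast<MockBlockHeader*>(userData) - 1);
}
```

With this layout, the id seen at allocation time can be recovered at free time from the pointer alone, which is what lets the two events be paired.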

Visualization

This is the easy part.

We add __itt_heap_allocate_begin/__itt_heap_allocate_end around the original call to malloc inside our hooks, and __itt_heap_free_begin/__itt_heap_free_end around the original call to free.

C++
void* MallocHook(size_t size, const void * context)
{
    if (pthread_getspecific(tls_key))
        return g_origMalloc(size, context);
    CRecursionScope scope;

    __itt_heap_allocate_begin(g_heap, size, 0);
    void* res = g_origMalloc(size, context);
    __itt_heap_allocate_end(g_heap, &res, size, 0);

    return res;
}

void FreeHook(void* ptr, const void* context)
{
    if (pthread_getspecific(tls_key))
        return g_origFree(ptr, context);
    CRecursionScope scope;

    __itt_heap_free_begin(g_heap, ptr);
    g_origFree(ptr, context);
    __itt_heap_free_end(g_heap, ptr);
}

And then we run our project, following the instructions on the IntelSEAPI project page.

With chrome://tracing viewer, we get this nice picture:

Image 1

For each memory block size, you can see how the count of live blocks changes over time, so you can easily find where the memory goes.

If the count for a block size is not back to zero by the end of the trace, blocks of that size leaked.
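
The per-size bookkeeping behind this view can be sketched as below. SizeHistogram is a hypothetical class written for illustration, not part of IntelSEAPI or the viewer; it only shows how a live count per block size can be derived from a stream of allocate/free events.

```cpp
#include <cstddef>
#include <map>
#include <unordered_map>

// Hypothetical tracker: keeps a live-block count per block size.
class SizeHistogram
{
public:
    // Record an allocation of `size` bytes at address `ptr`.
    void OnAlloc(const void* ptr, std::size_t size)
    {
        m_sizeOf[ptr] = size;
        ++m_liveCount[size];
    }

    // Record a free; pointers we never saw allocated are ignored.
    void OnFree(const void* ptr)
    {
        auto it = m_sizeOf.find(ptr);
        if (it == m_sizeOf.end())
            return;
        --m_liveCount[it->second];
        m_sizeOf.erase(it);
    }

    // Block sizes whose live count is still above zero: leak candidates.
    std::map<std::size_t, long> Leaks() const
    {
        std::map<std::size_t, long> result;
        for (const auto& entry : m_liveCount)
            if (entry.second > 0)
                result.insert(entry);
        return result;
    }

private:
    std::unordered_map<const void*, std::size_t> m_sizeOf; // address -> size
    std::map<std::size_t, long> m_liveCount;                // size -> live blocks
};
```

Note that such a tracker allocates memory itself, so inside a real hook it would have to run under the recursion guard.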

History

I would appreciate any ideas for improving the approach. With your help, we will create something magnificent.

Update 1

Now the allocations of Intel(R) Single Event API itself are filtered out.

Update 2

Now you can see stacks of allocations.

Update 3

Memory operations are now attributed to functions:

Image 2

Which reads: the second call to CreateThread freed one 8-byte block, allocated two 16-byte blocks and one 1144-byte block, which gave a total of +1168 bytes during this function call.
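
The arithmetic behind that total is simply each block size multiplied by its signed event count. A small sketch, where SizeEvents and NetBytes are hypothetical names used only for this illustration:

```cpp
#include <cstddef>
#include <map>

// Block size -> signed number of events during one function call:
// positive for allocations, negative for frees.
using SizeEvents = std::map<std::size_t, long>;

// Net change in allocated bytes over the call.
long long NetBytes(const SizeEvents& events)
{
    long long total = 0;
    for (const auto& entry : events)
        total += static_cast<long long>(entry.first) * entry.second;
    return total;
}
```

For the call above, the events are {8: -1, 16: +2, 1144: +1}, so the net change is -8 + 32 + 1144 = +1168 bytes.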

Update 4

Visual Studio 2015 support.

License

This article, along with any associated source code and files, is licensed under The BSD License.


Written By
Software Developer (Senior)
Russian Federation
As programmer I understand that every program takes brain to be created. The more complex is the software, the more senior developers it takes. And they say that DNA doesn't have an author? I don't buy it.
