
A Monitored, Memory Mapped std::allocator for Mass Data Storage in STL Containers

A novel allocator implementation for managing huge sets of data in STL's std:: containers on Windows operating systems
On Windows systems, managing huge sets of data in the process address space is limited not only by the installed physical memory and the size of the paging file. To keep the system stable, the frequency and quantity of allocations also need to be considered. In this article, we introduce a monitored std::allocator for all varieties of STL containers that allows huge amounts of data to be managed while respecting these system constraints.

Introduction

When it comes to handling large volumes of data in your application's address space, the installed swap file size imposes a natural limit. When this limit is reached, the operating system starts to terminate running processes unconditionally, eventually including your own. Moreover, it turns out that if memory is acquired too hastily, the system has trouble pushing the data out of the working set onto the pagefile. As a consequence, the system itself starts to thrash, becomes sluggish, and finally stops responding to input altogether.

These effects have been verified on independent installations of Windows 8 and Windows 10.

The described behaviour can easily be reproduced with the following tiny program. Take care before running it, since it will bring your system to a halt and you may need to reboot.

C++
//
// Sample illustrating the observed effect: allocate small blocks
// as fast as possible and never free them.
//
#include <stdlib.h>

int main()
{
    for (size_t iter = 0; iter < 1000000000; ++iter)
    {
        void* ptr = malloc(1024);  // intentionally leaked
        (void)ptr;                 // silence the unused-variable warning
    }
}

Running this code while watching it in Process Explorer, one can observe the rapid increase of allocated process memory until the maximum working-set level is reached. At this point, the system stops operating: the excessive number of allocations seems to overload Windows' virtual memory management, and the system rejects further user input.

The literature recommends either the SetProcessWorkingSetSize API call or Windows Job objects. While the first seems to have no effect on actually limiting the process working set, the latter requires additional privileges to be granted to the user, which makes it difficult to use in general installations.

To overcome the issue, direct memory mapping was also evaluated. In that case, Process Explorer shows that the working-set level of the executing process stays low. Nevertheless, the system-wide consumed working-set level grows constantly, so we finally end up in the same scenario.

In the following, we discuss how to overcome memory limitations and control the allocation frequency.

Controlling the Allocation Frequency

To control the rate of allocations performed by a process, some monitoring needs to be established. Since calls to malloc are direct API calls, they have to be replaced by versions that can be supervised. At the same time, the replacement function should affect the performance of the executing process as little as possible.

In our approach, we achieve this by introducing a second thread, called the "Observer Thread". Its objective is to measure and control the rate of allocations taking place within a given period of time. If required, the observer slows down the main thread.

To gain control over the malloc function, we simply replace it by a global function pointer variable of the form:

C++
// Replacing the default malloc by a controllable version.
// Note: no trailing semicolon in the macro, so it can be used
// inside expressions.

void* (*mallocfct)(size_t) = nullptr;

#define malloc(a) (*mallocfct)(a)

Notice that mallocfct is a global, non-const variable that can be modified at runtime.

In our approach, we utilize three different implementations of mallocfct. Which one is used depends on the process's current state of memory consumption.

C++
//
// Different implementations of mallocfct.
//
// Note: these functions must call the real CRT malloc, so they have to
// be compiled before the malloc macro above takes effect (or with it
// temporarily #undef'ed). The control variables are written by the
// observer thread, hence volatile.
//

volatile DWORD sleepCount = 100;
void* sloppymalloc(size_t size_p)
{
    Sleep(sleepCount);           // throttle the allocating thread
    return malloc(size_p);
}

volatile bool stopMallocs = false;
void* stopmalloc(size_t size_p)
{
    while (stopMallocs)          // busy-wait while the observer flushes
        Sleep(0);
    return malloc(size_p);
}

void* speedymalloc(size_t size_p)
{
    return malloc(size_p);       // full speed, no delay
}

To measure the memory consumed by the process, the observer thread periodically issues calls to GlobalMemoryStatusEx and evaluates the ullAvailPhys attribute of the returned status structure.

By default, the allocation function is set to speedymalloc, which resembles the regular malloc without any time delay. As soon as the working-set limit is reached, the sloppymalloc method is injected. During periods of flushing, the stopmalloc function is active, which prevents the main thread from making further allocations.
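The following is a minimal sketch of such an observer loop; the threshold memLimit, the polling period, and the thread function name are illustrative assumptions, not taken verbatim from the project sources:

C++
//
// Hypothetical observer loop: poll the available physical memory and
// switch the allocation strategy accordingly.
//
#include <windows.h>

extern void* (*mallocfct)(size_t);
extern void* speedymalloc(size_t);
extern void* sloppymalloc(size_t);

DWORD WINAPI observerThread(LPVOID)
{
    const DWORDLONG memLimit = 1ull << 30;   // illustrative: 1 GB headroom

    for (;;)
    {
        MEMORYSTATUSEX status = { sizeof(status) };
        GlobalMemoryStatusEx(&status);

        // Inject the throttled version when physical memory runs low.
        mallocfct = (status.ullAvailPhys < memLimit) ? sloppymalloc
                                                     : speedymalloc;
        Sleep(100);                          // polling period
    }
}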

By this extension, we were able to remove the process's confinement to the working-set limit and to fully utilize the pagefile space. It is important to understand that this handling does not affect the implementation of the main process and also allows multiple threads to be controlled at a time. On the downside, an additional core is occupied by the observer code.

Relying on a sufficiently large pagefile still has its limits with an eye on available disk space. In addition, the pagefile is not exclusively available to the worker process but is contended by the other processes running on the system at the same time. When swap space runs out, the system starts to terminate running processes at random and finally ends the user's process. Increasing the pagefile size is not possible during uptime and requires a system reboot.

The following section will therefore discuss how to eliminate the system swapfile limitation.

Dynamically Extending the Process Address Space

To extend the available virtual address space of a process, memory mapping of files has proven to be the technique of choice. It allows swap space to be created and added to the process dynamically at runtime, eliminating the need for system reconfiguration. Each memory-mapped file provides a new section of virtual address space that can be accessed directly. Be aware that memory mapping files is most effective in x64 applications, which offer a practically unlimited virtual address space.
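For illustration, adding such a mapped region on Windows boils down to the following pattern; the function name and parameters are placeholders, as the project's mmap.cpp wraps this in its own layer:

C++
//
// Hypothetical sketch: back a fresh region of virtual address space
// with a file instead of the system pagefile.
//
#include <windows.h>

void* mapNewSwapRegion(const wchar_t* path, DWORDLONG size)
{
    HANDLE file = CreateFileW(path, GENERIC_READ | GENERIC_WRITE,
                              0, NULL, CREATE_ALWAYS,
                              FILE_ATTRIBUTE_TEMPORARY, NULL);
    if (file == INVALID_HANDLE_VALUE)
        return NULL;

    HANDLE mapping = CreateFileMappingW(file, NULL, PAGE_READWRITE,
                                        (DWORD)(size >> 32),
                                        (DWORD)(size & 0xFFFFFFFF), NULL);
    if (mapping == NULL)
        return NULL;

    // The returned pointer is a new window of address space whose pages
    // are paged in and out against the file on demand.
    return MapViewOfFile(mapping, FILE_MAP_ALL_ACCESS, 0, 0, 0);
}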

We therefore implemented a simple memory manager that hooks into the aforementioned mallocfct routine and can span multiple memory-mapped regions. New swap files are created and added to the process's virtual memory space as needed. The implementation is kept very simple and uses a first-fit strategy to reuse freed memory blocks; a sketch of this search follows the listing below.

C++
//
// mallocfct implementations extended to call the file-mapped memory manager
//

void* sloppymalloc(size_t size_p)
{
    Sleep(sleepCount);
    return Heap_g.allocateNextMemBlock(size_p);
}

void* stopmalloc(size_t size_p)
{
    while (stopMallocs)
        Sleep(0);
    return Heap_g.allocateNextMemBlock(size_p);
}

void* speedymalloc(size_t size_p)
{
    return Heap_g.allocateNextMemBlock(size_p);
}

void movetofree(void* p)
{
    // Step back over the block header to recover the management structure.
    char* p_ = (char*)p;
    Heap_g.freeMemBlock( (__s_block*)(p_ - BLOCK_SIZE) );
}
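The first-fit reuse can be pictured as a linear walk over the chain of blocks until a free block of sufficient size is found. Below is a minimal sketch under the assumption of a singly linked block header similar to the project's __s_block; all field and function names here are illustrative:

C++
//
// Hypothetical first-fit search over a linked list of heap blocks.
//
struct Block
{
    size_t size;     // usable size of the block
    bool   free;     // is the block currently unused?
    Block* next;     // next block in the heap
};

Block* firstFit(Block* head, size_t size_p)
{
    for (Block* cur = head; cur != NULL; cur = cur->next)
    {
        if (cur->free && cur->size >= size_p)
            return cur;          // the first block that is large enough wins
    }
    return NULL;                 // no fit: map a new file region instead
}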

This approach theoretically removes any memory limitation, but in practice it showed that consuming the working set still results in system instabilities. Monitoring the processes, we observed that the working-set level of the executing process stays low while the overall system memory gets consumed. Again, as soon as the maximum working-set level is reached, the system starts to slow down and finally stops responding.

The solution to this issue was to have the observer thread explicitly request system flushing by calling the Windows API function SetProcessWorkingSetSize(HANDLE, -1, -1) as soon as the allowed working-set limit is reached. In our tests, this limit was set to 1 GB below the available physical memory.

It was also observed that different versions of Windows behave differently: while under Windows XP/7 and Wine, calls to VirtualUnlock suffice to release the mappings from the working set, Windows 8 and 10 require a full EmptyWorkingSet to release mapped pages.
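A minimal sketch of the combined flush step, assuming the observer operates on the current process (EmptyWorkingSet is declared in psapi.h and requires linking against psapi.lib):

C++
//
// Hypothetical flush step issued by the observer thread.
//
#include <windows.h>
#include <psapi.h>   // EmptyWorkingSet; link against psapi.lib

void flushWorkingSet()
{
    // On Windows XP/7 (and Wine), trimming the working set suffices:
    SetProcessWorkingSetSize(GetCurrentProcess(),
                             (SIZE_T)-1, (SIZE_T)-1);

    // On Windows 8/10, mapped pages are only released by a full purge:
    EmptyWorkingSet(GetCurrentProcess());
}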

 

Figure 1: Consumed system working set over time

In the final step, the outlined techniques were integrated into a std::allocator template for convenient use.

Integration in the std Template Library Model

Providing the monitored malloc and the memory-mapped file management through a std::allocator makes it possible to confine the outlined concept to specific, memory-intensive instances of standard containers. Other data structures and containers remain unaffected.

To make the allocator use the memory-mapped heap, the allocate and deallocate routines have been implemented as follows:

C++
// Allocate memory
pointer allocate(size_type count, const_pointer /* hint */ = 0)
{
    if (count > max_size())
        throw std::bad_alloc();
    return static_cast<pointer>(oheap::get()->malloc(count * sizeof(type)));
}

// Delete memory
void deallocate(pointer ptr, size_type /* count */)
{
    oheap::get()->free(ptr);
}
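For orientation, these routines sit inside the usual C++11 allocator skeleton. A condensed sketch is shown below; the member set follows the standard allocator requirements, while the exact layout of the project's allocator.h may differ:

C++
//
// Condensed allocator skeleton around allocate/deallocate (illustrative).
//
#include <cstddef>
#include <new>

template <typename T, typename Heap>
class allocator
{
public:
    typedef T           type;
    typedef T*          pointer;
    typedef const T*    const_pointer;
    typedef std::size_t size_type;

    // Containers use rebind to allocate their internal node types.
    template <typename U>
    struct rebind { typedef allocator<U, Heap> other; };

    size_type max_size() const { return (size_type)-1 / sizeof(type); }

    pointer allocate(size_type count, const_pointer hint = 0);  // as above
    void deallocate(pointer ptr, size_type count);              // as above
};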

The project provides the required components of the implementation, consisting of:

  • the observer thread (oheap.cpp)
  • the memory manager (vheap.cpp)
  • the file memory mapping (mmap.cpp)
  • the STL allocator interface (allocator.h)

If one wants to make general use of the monitored malloc routines throughout an application, the global new and delete operators need to be overridden as well.

C++
#include <new>

void* operator new(std::size_t n)
{
    void* p = mallocfct(n);
    if (!p) throw std::bad_alloc();  // the standard requires a throw on failure
    return p;
}
void operator delete(void* p) noexcept
{
    movetofree(p);
}

void* operator new[](std::size_t s)
{
    void* p = mallocfct(s);
    if (!p) throw std::bad_alloc();
    return p;
}
void operator delete[](void* p) noexcept
{
    movetofree(p);
}

Background

Readers of this article should have a basic knowledge of C++11 and the STL container library, and understand the principles of threading.

Using the Code

The std::allocator can simply be passed as an additional argument in the template parameter list of standard containers. The heap template argument refers to the monitored memory manager.

C++
//
// std::multiset sample
//

#include <allocator.h>
#include <set>

// Example is a sample payload type provided by the attached project.
typedef std::multiset<Example, std::less<Example>,
                      allocator<Example, heap<Example> > > ExampleSet;

int main()
{
    ExampleSet foo;

    // Fill the container with mass data in non-sorted insertion order.
    for (int iter = 0; iter < 1000000000; ++iter)
    {
        foo.insert(Example(iter + 3));
        foo.insert(Example(iter + 1));
        foo.insert(Example(iter + 4));
        foo.insert(Example(iter + 2));
    }

    // Traverse the container once to touch all stored elements.
    for (ExampleSet::const_iterator iter(foo.begin()); iter != foo.end(); ++iter)
    {
        ;
    }

    return 0;
}

For your local installation, make sure to set the VFILE_NAME #define to a writable folder with sufficient free space on your system. The maximum allocatable memory block is given by the VFILE_SIZE #define. Please note that in the current implementation, no memory alignment takes place.
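For example, the configuration defines might look as follows; the path and size shown here are placeholders for your local setup:

C++
// Illustrative configuration; choose a writable folder with enough space.
#define VFILE_NAME  "D:\\swap\\vheap"   // base name of the mapping files
#define VFILE_SIZE  (1ull << 32)        // 4 GB: maximum allocatable block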

The attached project provides the required mmapallocator.dll library.

Special Credits

  1. Dr. Thomas Chust - File memory mapping layer and analysis of basic memory management behaviour
  2. Pritam Zope - Providing the basic outline for the sbrk memory manager implementation
  3. Joe Ruether - Sophisticated template implementation of the std::allocator

Points of Interest

There are major limitations in the virtual memory management of recent versions of Windows with regard to handling processes with large memory consumption.

History

  • 18th May, 2020: Initial version
  • 21st May, 2020: Fixes in the observer thread
  • 24th May, 2020: Dynamically enlarge vmmap file sizes

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

