|
Mircea Neacsu wrote: you can allocate all the available memory and run a memory manager inside your app that allocates chunks of this pool
The trouble here is determining the pool size up front without just making it the size of available memory. As I mentioned it varies wildly application to application and content to content.
One thing I am doing is playing hot potato with my RAM to avoid fragmentation. I do not keep little allocations around.
Allocations that are kept around are allocated up front. I think that's what saves me here.
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
honey the codewitch wrote: without just making it the size of available memory My point was that you can make it the size of available memory. What else is going to use it? Can the user run a second app behind your back?
And if I'm right, and you make it the size of available RAM and manage it inside your app, how is this different from allocating it only when needed? The only difference would be that you use your own memory manager instead of the system one. Is your memory manager going to be better than the standard one? Probably only marginally so.
Mircea
|
|
|
|
|
It depends on the platform. The ESP-IDF, for example, likes to allocate temporarily to do just about everything.
|
|
|
|
|
honey the codewitch wrote: The trouble here is determining the pool size up front without just making it the size of available memory. One way around this is to allocate in slabs, adding a new slab when needed. A small system might allocate one slab, whereas a bigger one might allocate, say, four.
honey the codewitch wrote: One thing I am doing is playing hot potato with my RAM to avoid fragmentation. I do not keep little allocations around. If the memory manager uses buddy allocation and you allocate temporary memory and then free it all before allocating more, fragmentation should be avoided altogether.
honey the codewitch wrote: Allocations that are kept around are allocated up front. An excellent strategy that is also important in systems where latency must be predictable.
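Greg's slab idea can be sketched roughly like this: a toy fixed-block pool that grows by whole slabs when exhausted. `SlabPool` and its sizes are made up for illustration, not taken from any particular library:

```cpp
#include <cstddef>
#include <vector>

// Illustrative slab pool: hands out fixed-size blocks, and adds a whole
// new slab only when the current ones are exhausted. A small system stops
// at one slab; a bigger one might grow to several.
class SlabPool {
public:
    SlabPool(std::size_t block_size, std::size_t blocks_per_slab)
        : block_size_(block_size), blocks_per_slab_(blocks_per_slab) {}

    void* allocate() {
        if (free_.empty())
            add_slab(); // grow by one slab when the pool runs dry
        void* p = free_.back();
        free_.pop_back();
        return p;
    }

    void release(void* p) { free_.push_back(static_cast<char*>(p)); }

    std::size_t slab_count() const { return slabs_.size(); }

private:
    void add_slab() {
        slabs_.emplace_back(block_size_ * blocks_per_slab_);
        char* base = slabs_.back().data();
        for (std::size_t i = 0; i < blocks_per_slab_; ++i)
            free_.push_back(base + i * block_size_);
    }

    std::size_t block_size_, blocks_per_slab_;
    std::vector<std::vector<char>> slabs_; // each inner vector is one slab
    std::vector<char*> free_;              // free blocks across all slabs
};
```

The point is that the pool's footprint tracks actual demand in slab-sized steps, instead of being fixed at the size of available memory up front.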
|
|
|
|
|
Greg Utas wrote: One way around this is to allocate in slabs
Right, but you're still allocating. Initially I didn't realize the issue was strictly fragmentation. My TTF rasterizer already uses a custom pool/heap for allocations.
Greg Utas wrote: If the memory manager uses buddy allocation and you allocate temporary memory and then free it all before allocating more, fragmentation should be avoided altogether.
Typically I allocate for the span of an operation in order to improve performance. Sometimes I fall back to a slower method if I can't allocate, so the allocation gets used like a cache. It gets freed once the operation completes.
This is the case in every situation in my graphics library save a couple of very explicit things like "large bitmaps" which are actually composed of a bunch of smaller bitmaps because of heap fragmentation, and things like SVG documents (which are actually kind of small).
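That allocate-as-a-cache pattern might look like this in outline (the function and the "work" it does are illustrative; the point is the nothrow allocation paired with a no-allocation fallback):

```cpp
#include <cstddef>
#include <cstring>
#include <new>

// Illustrative sketch: try to grab a scratch buffer for the span of one
// operation; if the allocation fails, fall back to a slower path that
// needs no extra memory. Either way the result is the same.
std::size_t process(const unsigned char* data, std::size_t n) {
    std::size_t result = 0;
    unsigned char* scratch = new (std::nothrow) unsigned char[n];
    if (scratch) {
        // fast path: stage the data in the scratch buffer, then process it
        std::memcpy(scratch, data, n);
        for (std::size_t i = 0; i < n; ++i) result += scratch[i];
        delete[] scratch; // freed as soon as the operation completes
    } else {
        // slow path: process in place, one byte at a time
        for (std::size_t i = 0; i < n; ++i) result += data[i];
    }
    return result;
}
```

Because the scratch buffer never outlives the operation, it can't pin down the heap between operations.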
|
|
|
|
|
Greg Utas wrote: If the memory manager uses buddy allocation and you allocate temporary memory and then free it all before allocating more, fragmentation should be avoided altogether. Every time I mention buddy allocation, someone stands up shouting about internal fragmentation.
That usually turns into a lengthy discussion about the fraction of allocations that are powers of two, how far you should go to fine-tune memory use by, say, changing a long to an int to bring an allocation down from 516 to 512 bytes, and so on. Usually, the shouters end up admitting that an average of 25% internal fragmentation is overly pessimistic.
Buddy allocation is certainly not very much valued among my professional friends. I love it.
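For reference, the internal fragmentation argument comes down to power-of-two rounding; a one-liner shows why trimming a 516-byte struct to 512 bytes halves its buddy block:

```cpp
#include <cstdint>

// A buddy allocator rounds each request up to the next power of two;
// the difference between request and block is the internal fragmentation.
inline std::uint32_t buddy_block_size(std::uint32_t request) {
    std::uint32_t size = 1;
    while (size < request) size <<= 1;
    return size;
}
```

So a 516-byte request occupies a 1024-byte block (almost 50% waste), while 512 bytes fits exactly, which is why shaving a few bytes off a struct can matter so much here.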
|
|
|
|
|
I think one reason buddy allocation originally became popular was its very low management overhead. There may be better schemes, but they would use more management overhead to reduce fragmentation and would need a background task (like a garbage collector) to merge adjacent free areas, something that buddy allocation does on the fly.
|
|
|
|
|
I'd prefer not to do buddy recombination on the fly, i.e. when allocating/deallocating a block. I want neither allocation nor deallocation to be delayed by block recombination.
If you run out of space for bigger blocks, the freelists for smaller blocks may be long. Finding buddies fit for combination is akin to sorting the list. It would be inefficient to start the search at the first smaller list (below the requested size); it may have viable candidates, but one block's buddy may live as two not-yet-combined halves in the second smaller list. So recombination should start at the smallest unit and proceed upwards.
This you can leave to a very low priority task. The task unhooks a freelist for a given size, possibly leaving a couple of entries for other threads' requests while the combination is being done. The task sorts the unhooked freelist to make combinable buddies adjacent in the list. If the heap start address is a multiple of the largest block that can be allocated, an XOR between the neighbors' addresses leaves the single bit that is the block size being processed; that is super fast. The two are unlinked, and the lower is put into a separate list. Forget the upper one from now on - it is merged into the lower. When reaching the end of the list, the task is left with two lists: the reduced one (of the size being processed) and a list of recombined next-size blocks, to be added to that freelist.
At the start, the task must reserve one freelist head while unhooking; that is done in a couple of instructions. At the cleanup, it must again reserve the freelist head to hook the reduced list back in, and then add its entries to the next bigger freelist. Both operations are done in a couple of instructions. There is no hurry: no one is hurt if the task must wait in line before getting to add its entry to the two list heads. The order doesn't matter; if one head is busy, it can do the other one while it waits.
The task processes one size at a time. Each size usually has a moderately sized freelist, so the time spent doing the job is moderate. Doing a real list sort might be a good idea: with allocation done from the list head, after the combination, the lowest available block is allocated for the following requests. If a bigger buddy must be split, it will be the lowermost of those. This strategy tends to gather active heap blocks at lower addresses, leaving the higher addresses for larger blocks (without the risk of a few single small block allocations in the upper range 'polluting' a large heap area). A released block may be one at a higher address, but at the next compaction, the sorting will move it further out in the freelist, reducing the chances of it being allocated anew.
For the sorting part: when the compaction task unlinks a freelist, ready to start the sorting, chances are that the tail of the list contains the same entries as at the previous compaction. There are suitable list sorting algorithms that are O(n) on an already sorted list, so the cost is dominated by the number of new freelist entries - which is probably not very high, since recently freed blocks are the first to leave the list again. If the list has no already-sorted tail, that indicates the list was completely empty in the meantime, and probably hasn't grown very big yet.
If you let this low priority recombination task iterate over available block sizes, from the bottom up, you will maintain a distinct locality of heap use, towards lower addresses. The iteration frequency may be self-tuning: if the task found no more than, say, two combinable buddy pairs in any list, it may increase its delay before the next iteration by 50%. If it found sixteen pairs or more in any list, it may halve its delay.
Given a background task for recombining buddies, allocation can be done in a couple of instructions from a non-empty freelist, adding a small handful of instructions for each level it must move upwards if the list is empty. Release is (always) done in a couple of instructions.
Obviously: If you really run out of heap space - even the freelist for the largest block size is empty - then the recombination task must be force activated to combine buddies from the smallest up to the biggest block size. If even that fails, then you are really out of heap. If it succeeds, take it as a hint that the task should be run at a higher frequency!
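One pass of the recombination task over a sorted freelist, as described above, might be sketched like this (offsets relative to the heap start; `combine_pass` is an illustrative name, using vectors rather than linked lists for brevity):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One pass over a sorted freelist of one block size: adjacent entries
// whose offsets XOR to `size` are buddies. Each pair is removed and the
// lower offset is emitted as a block of the next size up.
inline std::vector<std::uint32_t> combine_pass(std::vector<std::uint32_t>& freelist,
                                               std::uint32_t size) {
    std::vector<std::uint32_t> kept, promoted;
    for (std::size_t i = 0; i < freelist.size(); ) {
        if (i + 1 < freelist.size() && (freelist[i] ^ freelist[i + 1]) == size) {
            promoted.push_back(freelist[i]); // lower half names the merged block
            i += 2;
        } else {
            kept.push_back(freelist[i]);
            ++i;
        }
    }
    freelist = kept;       // reduced freelist of the size being processed
    return promoted;       // blocks to add to the next bigger freelist
}
```

Because the list is sorted, the buddy test is a single XOR between neighbors, which is the trick described above.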
|
|
|
|
|
I do the splitting and recombining during allocations and deallocations. It's fairly efficient, and the code already has the heap mutex. A background task has to lock out users for longer. I just make all users share the cost, because I can't see how offloading the work to a background task can make things any more efficient.
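Recombining during deallocation, as described here, can be sketched as a loop that keeps merging with the buddy while the buddy is free. This is a toy version using a free set; real code would work on freelist nodes under the heap mutex:

```cpp
#include <cstdint>
#include <set>
#include <utility>

// Toy sketch: offsets into a heap of size HEAP; free blocks tracked as
// (offset, size) pairs, both powers of two.
constexpr std::uint32_t HEAP = 1024;
using Block = std::pair<std::uint32_t, std::uint32_t>; // offset, size
std::set<Block> free_blocks;

// On release, repeatedly merge with the buddy while it is also free.
inline void buddy_free(std::uint32_t off, std::uint32_t size) {
    while (size < HEAP) {
        Block buddy{off ^ size, size}; // buddy offset differs in the size bit
        auto it = free_blocks.find(buddy);
        if (it == free_blocks.end()) break; // buddy in use: stop merging
        free_blocks.erase(it);
        off &= ~size;  // merged block starts at the lower of the two
        size <<= 1;    // and is twice the size; try to merge again
    }
    free_blocks.insert({off, size});
}
```

The cost is bounded by the number of levels in the heap, so sharing it among all callers, as described, keeps each free short and predictable.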
|
|
|
|
|
It all depends on OS/hardware context. "Background" doesn't always mean a background task in the OS sense.
You can offer your applications a heap manager providing allocate/deallocate as intrinsics / inline functions working on freelists, with operations synchronized on hardware reservations (available in several architectures), with no need for software semaphores. That is all your applications see.
Behind the scenes, your heap manager can have a clock interrupt handler doing the recombination. Even if the clock handler must maintain a list of timed events (it probably has one already!), regular activation of a recombination task is orders of magnitude cheaper than using OS mechanisms to start a background process in the .NET sense, or in most other OSes.
Your apps won't know the reason why memory is so tidy all the time. They will get the memory blocks they ask for, in a small handful of instructions; release goes in even less. Your apps are never delayed by any (lengthy?) recombination of buddies. The risk of heap pollution from small memory allocation blocking large recombinations is greatly reduced.
You do not provide details about your procedures: When deallocating, do you search for buddy blocks for recombination of the same size only, or do you recurse to larger blocks, possibly up to the maximum size available? Do you keep freelists sorted for efficient identification of buddies, or are they unsorted?
Corollary: On new allocations, will you allocate new blocks minimizing the pollution of the heap by small blocks preventing allocation of large blocks? This is more or less implicit with sorted freelists, but if your freelists are not sorted, how do you maintain it?
|
|
|
|
|
Most memory allocated at run-time is actually taken from object pools[^] allocated when the system initializes. Each pool supports a major framework class, and the number of blocks in each pool is engineered based on the system's object model.
The buddy heap[^] is seldom used at run-time. It is used to allocate byte buckets for IP message payloads or when using the framework to implement a standalone application. Most allocation occurs in the pools.
The buddy heap[^] has a queue for its block sizes (powers of 2). Allocation can recursively split a block, and deallocation can recursively merge blocks.
Both memory managers verify that a pointer references a valid block, support debug tools that can be enabled to trace activity, and collect usage statistics.
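A minimal sketch of such an object pool (illustrative, not the framework's actual code): all blocks are carved out when the pool is constructed, and the pool can verify pointer validity as mentioned:

```cpp
#include <cstddef>
#include <vector>

// Illustrative fixed object pool, sized at initialization. Run-time
// allocation never touches the system heap and cannot fragment it; N is
// the engineered limit from the system's object model.
template <typename T, std::size_t N>
class ObjectPool {
public:
    ObjectPool() {
        for (std::size_t i = 0; i < N; ++i)
            free_.push_back(&storage_[i]);
    }
    T* allocate() {
        if (free_.empty()) return nullptr; // engineered limit reached
        T* p = free_.back();
        free_.pop_back();
        return p;
    }
    void release(T* p) { free_.push_back(p); }
    bool owns(const T* p) const { // pointer-validity check, as in the post
        return p >= storage_ && p < storage_ + N;
    }
private:
    T storage_[N];          // all blocks allocated up front
    std::vector<T*> free_;
};
```

Exhaustion shows up as a clean nullptr (or a counted statistic) rather than creeping heap fragmentation, which is what makes the block counts engineerable.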
|
|
|
|
|
I do not use dynamic memory allocation on my systems. Anyway, most of my targets 'feature' 4 KB of SRAM, so malloc would be overkill.
I am interested in trying dynamic allocation on larger systems (embedding Lua, for example, would require it), but I never have before.
"In testa che avete, Signor di Ceprano?"
-- Rigoletto
|
|
|
|
|
For some reason I simply can't imagine calling malloc on a 4kB system.
|
|
|
|
|
By default, the IDE (PSoC Creator) sets the heap size to 128 bytes.
"In testa che avete, Signor di Ceprano?"
-- Rigoletto
|
|
|
|
|
|
|
AFAIK there is no reflection in C++. You cannot have something like:
struct S {
    int value;
    double number;
};
S to_serialize{1, 3.14};
std::string tag;
stream strm;
for (auto& m : members_of(S))
{
    strm << name_of(m) << ": ";
    strm << to_serialize.m << std::endl;
}
In a language with reflection this would produce output like:
value: 1
number: 3.14
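Until reflection lands, the usual workaround is to spell out the member list once by hand; a C++17 sketch (names like `s_names`, `s_members`, and `serialize` are made up for illustration):

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <tuple>

struct S {
    int value;
    double number;
};

// Manual "reflection": we list the member names and references ourselves,
// once, next to the struct. Reflection would generate this for us.
constexpr const char* s_names[] = {"value", "number"};
inline auto s_members(S& s) { return std::tie(s.value, s.number); }

inline std::string serialize(S& s) {
    std::ostringstream strm;
    std::size_t i = 0;
    std::apply([&](auto&... m) {
        // fold over the members, pairing each with its name
        ((strm << s_names[i++] << ": " << m << '\n'), ...);
    }, s_members(s));
    return strm.str();
}
```

The obvious drawback is that the name list and the tie must be kept in sync with the struct by hand, which is exactly what reflection would eliminate.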
Mircea
|
|
|
|
|
Mircea Neacsu wrote: AFAIK there is no reflection in C++.
Ok? That of course makes sense, given that the link I posted is about a new feature that will be added to the next standard.
Kind of silly to add it if it already existed.
|
|
|
|
|
RTTI is the only metadata functionality I'm aware of in C++.
There are hacks to sort of divine some reflective type information at compile time, like SFINAE, but not at runtime, excepting RTTI, I think.
But then the only thing I have really worked with is C++17 and prior, so if they have introduced something since, I wouldn't know.
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
honey the codewitch wrote: RTTI
Yeah that is the one that I was remembering.
So then that is not sufficient perhaps? Limited in some way?
In Java/C# one can get to things like class name, method names (public and private), method parameters (types), access types, class attributes. All of those can be listed and manipulated.
So for example, if a class attribute is private and modifying it would help with better unit testing, one can write reflection code to modify it. That is the most common use I have had for accessing private attributes. I don't use it that much, and it seems like a bit of a wash versus adding a setter/getter.
|
|
|
|
|
RTTI is extremely limited, last time I checked. Granted, I've maybe used it once to play around with it years ago and that's it, so don't quote me. It's nothing like what .NET offers, for example.
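For comparison, this is roughly all RTTI gives you: the dynamic type of a polymorphic object and checked downcasts, with nothing about member names, methods, or attributes:

```cpp
#include <typeinfo>

// Illustrative types: RTTI only works through a polymorphic base.
struct Base { virtual ~Base() = default; };
struct Derived : Base { int extra = 0; };

// Exact dynamic-type comparison via typeid.
bool is_derived(const Base& b) {
    return typeid(b) == typeid(Derived);
}

// Checked downcast: nullptr if b does not actually point to a Derived.
Derived* as_derived(Base* b) {
    return dynamic_cast<Derived*>(b);
}
```

There is no way to go from here to enumerating members or calling methods by name, which is the gap .NET-style reflection fills.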
|
|
|
|
|
Ok. But a brief reading of the link I posted suggests that will be changing.
|
|
|
|
|
Personally, I'd probably still avoid it unless it solved a very significant problem for me because I like being able to make 17KB dlls that are actually useful.
Metadata ain't free.
|
|
|
|
|
honey the codewitch wrote: I'd probably still avoid it unless it solved a very significant problem
Specific times I have used it in other languages.
- Dynamic loading of classes. Especially when parameter lists are dynamic.
- Messing with private data of a class. Most of the time for unit tests, but I think one time it was in production code. In C++ there is a way to do that anyway, but it's always risky and just as hacky.
honey the codewitch wrote: Metadata ain't free.
Which is why it has taken so long I suspect.
|
|
|
|
|
I am asking a question about regular expression.
I am NOT looking for references to AI regular expression generators.
Here is my text; I want to find ALL matches of "hci" in it:
"Devices:\n\thci1\t00:50:B6:80:4D:5D\n\thci0\t00:15:83:15:A2:CB\n"
Here is my attempt to accomplish that task:
QRegularExpression re("(hci[0-9]+)");
It matches only the first occurrence, "hci1", and matches it twice.
Please follow the established CodeProject instruction and read the post.
PLEASE no references to AI regular expression generators.
I need to learn TO BUILD regular expressions - especially how to write one so it will match multiple items.
PS I did try to use "global match" , no go.
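For what it's worth, in Qt a single `match()` call only returns the first match; getting all of them is what `QRegularExpression::globalMatch()` is for - it returns a `QRegularExpressionMatchIterator` that you have to step through in a loop. The same idea with the standard library (so it can be shown self-contained here; `find_all` is an illustrative name) demonstrates that the pattern itself is fine:

```cpp
#include <regex>
#include <string>
#include <vector>

// The pattern "hci[0-9]+" is correct; what matters is iterating ALL
// matches instead of asking for a single one. std::sregex_iterator walks
// every non-overlapping match in the input.
std::vector<std::string> find_all(const std::string& text) {
    std::vector<std::string> out;
    static const std::regex re("hci[0-9]+");
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        out.push_back(it->str());
    return out;
}
```

The Qt version is structurally the same: create the iterator with `globalMatch(text)`, then loop `while (it.hasNext())` and take `it.next().captured(0)` for each match.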
|
|
|
|