|
CMake is integrated with VS now. Migrating to it (about 20 libraries) wasn't anywhere near as hard as I anticipated, and builds are much faster under Ninja. If you're interested, here's[^] an overview.
|
|
|
|
|
I'm working in an antediluvian organisation.
Eventually we'll do it. (or not)
CI/CD = Continuous Impediment/Continuous Despair
|
|
|
|
|
Every organization is antediluvian!
|
|
|
|
|
Project dependencies can be changed by going to the Solution properties (right click on the solution explorer root of the tree)
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
When I read the OP I was also wondering if they were referring to what you said or perhaps they thought the compiler would just figure it out.
|
|
|
|
|
I'm not discussing this on r/embedded for fear of starting a flame war.
For those of you that do embedded, you're probably told to not dynamically allocate memory, but rather to allocate up front.
I cheat.
The problem is with things like TrueType and SVG depending on the complexity of the content the memory requirements can vary wildly application to application, and firmware is difficult (but not necessarily impossible) to adequately profile on the embedded hardware itself.
These features are typically used on systems with at least 360KB of RAM but will run in as little as 192KB that I've tried, at least for some applications (I haven't gotten exhaustive metrics on the memory usage of these things)
What I've done is dynamically allocate in these situations, and then test my app, usually for days to make sure it's stable. I've also run this graphics code (which I use in all my projects) through its paces with deleaker so I'm confident in it.
I could use a custom pool and allocate from that, but I'd still have to know how much TTF for example, would use up front for that given font - which requires instrumenting the application, which costs money.
One advantage to dynamic allocation is my fonts for example, only use it while rendering. It is therefore possible to (while not rendering) use that memory for other (temporary) things. I rarely do this in practice, but it's definitely doable, if you're careful about fragmentation. In these cases, you have to test thoroughly, but again, doable.
How terrible is this approach? I'm self taught and was sort of drafted into embedded so I'm unsure of myself here.
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
honey the codewitch wrote: How terrible is this approach? I'm self taught and was sort of drafted into embedded so I'm unsure of myself here. Not so terrible, rest assured; don't damage your self-esteem
The biggest issue with dynamic allocation, and where this mantra of "no mallocs" came, is the potential for memory fragmentation. You don't know how well the memory manager behaves and you risk running out of big memory chunks.
However, if you think about it, this is a phenomena specific to multi-tasking, where each task is oblivious of other tasks that may exist in the system. This is not what happens in an embedded system where you have only one "task" (footnote: yes, there are exceptions to this scenario). You certainly have multiple threads but those threads cooperate, not compete for resources.
If you would have an anal boss that wants to enforce the "no malloc" rule, you can allocate all the available memory and run a memory manager inside your app that allocates chunks of this pool. On the outside the system would behave the same way because there is only one task/app. Again, I'm over-simplifying here: there is still a network stack that needs some buffers but you get the gist. In embedded world, quoting Taylor Swift: "This is our place, we make the call".
Mircea
|
|
|
|
|
Mircea Neacsu wrote: you can allocate all the available memory and run a memory manager inside your app that allocates chunks of this pool
The trouble here is determining the pool size up front without just making it the size of available memory. As I mentioned it varies wildly application to application and content to content.
One thing I am doing is playing hot potato with my RAM to avoid fragmentation. I do not keep little allocations around.
Allocations that are kept around are allocated up front. I think that's what saves me here.
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
honey the codewitch wrote: without just making it the size of available memory My point was that you can make it the size of available memory. What else is going to use it? Can user run a 2nd app behind your back?
And if I'm right, and you make it the size of available RAM and manage it inside your app, how is this different from allocating it only when needed? The only difference would be that you use your own memory manager instead of the system one. Is your memory manager going to be better than the standard one? Probably only marginally so.
Mircea
|
|
|
|
|
It depends on the platform. The ESP-IDF for example, likes to temporarily allocate to do just about everything.
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
honey the codewitch wrote: The trouble here is determining the pool size up front without just making it the size of available memory. One way around this is to allocate in slabs, adding a new slab when needed. A small system might allocate one slab, whereas a bigger one might allocate, say, four.
honey the codewitch wrote: One thing I am doing is playing hot potato with my RAM to avoid fragmentation. I do not keep little allocations around. If the memory manager uses buddy allocation and you allocate temporary memory and then free it all before allocating more, fragmentation should be avoided altogether.
honey the codewitch wrote: Allocations that are kept around are allocated up front. An excellent strategy that is also important in systems where latency must be predictable.
|
|
|
|
|
Greg Utas wrote: One way around this is to allocate in slabs
Right, but you're still allocating. Initially I didn't realize the issue was strictly fragmentation. My TTF rasterizer already uses a custom pool/heap for allocations.
Greg Utas wrote: If the memory manager uses buddy allocation and you allocate temporary memory and then free it all before allocating more, fragmentation should be avoided altogether.
Typically I allocate for the span of an operation in order to improve performance. Sometimes I fallback to a slower method if I can't allocate, so it gets used like a cache. It gets freed once the operation completes.
This is the case in every situation in my graphics library save a couple of very explicit things like "large bitmaps" which are actually composed of a bunch of smaller bitmaps because of heap fragmentation, and things like SVG documents (which are actually kind of small).
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
Greg Utas wrote: If the memory manager uses buddy allocation and you allocate temporary memory and then free it all before allocating more, fragmentation should be avoided altogether. Every time I mention buddy allocation, someone stand up, shouting about internal fragmentation.
That usually goes into a lengthy discussion about the fraction of allocations being 2-powers, how much you should allow yourself to fine tune memory use by, say, changing a long to an int to bring the allocation size down from 516 to 512, and so on. Usually, the shouters end up admitting that an average of 25% internal fragmentation is overly pessimistic.
Buddy allocation is certainly not very much valued among my professional friends. I love it.
|
|
|
|
|
I think one reason buddy allocation originally became popular was its very low management overhead. There may be better schemes, but they would use more management overhead to reduce fragmentation and would need a background task (like a garbage collector) to merge adjacent free areas, something that buddy allocation does on the fly.
|
|
|
|
|
I'd prefer not to do buddy recombination on the fly, i.e. when allocating/deallocating a block. I want neither allocation nor deallocation to be delayed by block recombination.
If you run out of space for bigger blocks, the freelists for smaller blocks may be long. To find buddies fit for combination is similar to sorting the list. It would be inefficient to go to the first smaller list (below the requested size) for the search; it may have viable candidates, but one block's buddy may live as two not-yet-combined halves in the second smaller list. So recombination should start at the smallest unit, and proceed upwards.
This you can leave to a very low priority task. The task unhooks a freelist for a given size, possibly leaving a couple entries for other threads' requests while the combination is being done. The task sorts the unhooked freelist to make combinable buddies list neighbors. If the heap start address is a multiple of the largest block that can be allocated, an XOR between the neighbors' addresses leaves the single bit that is the block size being processed; that is super fast. The two are unlinked, and the lower put into a separate list. Forget the upper one from now on - it is merged into the lower. When reaching the end of the list, the task is left with two lists: The reduced one (of the size being processed) and a list of recombined next size blocks, to be added to that freelist.
At the start, the task must reserve one freelist head while unhooking; that is done in a couple of instructions. At the cleanup, it must again reserve the freelist head to hook the reduced list back in, and then add its entries to the next bigger freelist. Both operations are done in a couple of instructions. There is no hurry: Noone is hurt if the task must wait in line before getting to add its entry to the two list heads. The order doesn't matter; if one head is busy, it can do the other one while it waits.
The task processes one size at a time. Each size usually has a moderate size freelist, so the time spent doing the job is moderate. Doing a real list sort might be a good idea: With allocation done from the list head, after the combination, the lowest available block is allocated for the following requests. If a bigger buddy must be split, it will be the lowermost of those. This strategy tends to gather active heap blocks at lower addresses, leaving the higher addresses for larger blocks (without the risk of a few single small block allocations in the upper range 'polluting' a large heap area). A block release may be a block at a higher address, but at the next compaction, the sorting will move it further out it the freelist, reducing the chances of it being allocated anew.
For the sorting part: When the compaction task unlinks a freelist, ready to start the sorting, chances are that the last part of the list are the same entries as on the previous compaction. There are suitable list sorting algorithms of O(n) on a sorted list, so the cost is dominated by the number of new freelist entries - which is probably not very high, since those freed are the first to go out of the list again. If the list has no already sorted tail, it indicates that the list has been all empty in the meantime, and probably hasn't grown that big yet.
If you let this low priority recombination task iterate over available block sizes, from bottom up, you will maintain a distinct locality of the heap use, towards lower addresses. The iteration frequency may be self-tuning: If the task found no more than, say, two combinable buddy pairs in any list, it may increase its delay before the next iteration by 50%. If it found sixteen pairs or more in any list, it may half its delay.
Given a background task for recombining buddies, allocation can be done in a couple instruction from a non-empty freelist, adding a small handful of instructions for each level it must move upwards if the list is empty. Release is (always) done in a couple of instructions.
Obviously: If you really run out of heap space - even the freelist for the largest block size is empty - then the recombination task must be force activated to combine buddies from the smallest up to the biggest block size. If even that fails, then you are really out of heap. If it succeeds, take it as a hint that the task should be run at a higher frequency!
|
|
|
|
|
I do the splitting and recombining during allocations and deallocations. It's fairly efficient, and the code already has the heap mutex. A background task has to lock out users for longer. I just make all users share the cost, because I can't see how offloading the work to a background task can make things any more efficient.
|
|
|
|
|
It all depends on OS/hardware context. "Background" doesn't always mean a background task in the OS sense.
You can offer your applications a heap manager providing allocate/deallocate as intrinsics / inline functions working on freelists with operations synchronized on hardware reservation (available in several architectures), with need for software semaphores. That is all your applications see.
Behind the scenes, your heap manager can have a clock interrupt handler doing the recombination. Even if the clock handler must maintain a list of timed events (it probably has one already!), regular activation of a recombination task is magnitudes cheaper than using OS mechanisms to start a background process in the dotNet sense, or of most other OSes.
Your apps won't know the reason why memory is so tidy all the time. They will get the memory blocks they ask for, in a small handful of instructions; release goes in even less. Your apps are never delayed by any (lengthy?) recombination of buddies. The risk of heap pollution from small memory allocation blocking large recombinations is greatly reduced.
You do not provide details about your procedures: When deallocating, do you search for buddy blocks for recombination of the same size only, or do you recurse to larger blocks, possibly up to the maximum size available? Do you keep freelists sorted for efficient identification of buddies, or are they unsorted?
Corollary: On new allocations, will you allocate new blocks minimizing the pollution of the heap by small blocks preventing allocation of large blocks? This is more or less implicit with sorted freelists, but if your freelists are not sorted, how do you maintain it?
|
|
|
|
|
Most memory allocated at run-time is actually taken from object pools[^] allocated when the system initializes. Each pool supports a major framework class, and the number of blocks in each pool is engineered based on the system's object model.
The buddy heap[^] is seldom used at run-time. It is used to allocate byte buckets for IP message payloads or when using the framework to implement a standalone application. Most allocation occurs in the pools.
The buddy heap[^] has a queue for its block sizes (powers of 2). Allocation can recursively split a block, and deallocation can recursively merge blocks.
Both memory managers verify that a pointer references a valid block, support debug tools that can be enabled to trace activity, and collect usage statistics.
|
|
|
|
|
I do not use dynamic memory allocation on my systems. Anyway, most of my targets 'feature' 4 KB of SRAM , so malloc would be overkill.
I am interested in trying dynamic allocation on larger system (embedding Lua , for example, would require that), but never did before.
"In testa che avete, Signor di Ceprano?"
-- Rigoletto
|
|
|
|
|
For some reason I simply can't imagine calling malloc on a 4kB system.
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
By default, the IDE (PSoC Creator) sets the heap size to 128 bytes.
"In testa che avete, Signor di Ceprano?"
-- Rigoletto
|
|
|
|
|
|
|
AFAIK there is no reflection in C++. You cannot have something like:
struct S{
int value;
double number;
}
S to_serialize{1, 3.14};
std::string tag;
stream strm;
for (auto& m: members_of(S))
{
strm << name_of(m) << ': ';
strm << to_serialize.m << std::endl;
} In a language with reflection this would produce an output like:
value: 1
number: 3.14
Mircea
|
|
|
|
|
Mircea Neacsu wrote: AFAIK there is no reflection in C++.
Ok? That of course makes sense given that the link I posted is a about a new feature that will be added to the next standard.
Kind of silly to add it if it already existed.
|
|
|
|
|