Hi,
I'm interested in a safe software architecture based on FreeRTOS on a TMS570 for a safety application.
From my point of view, for a safety application it is better to use the static versions of tasks, queues, etc.:
xQueueCreateStatic
xTaskCreateStatic
xTimerCreateStatic
.. and so on
Do you agree with this, or do you think I could also use the non-static versions?
Thanks
Antonio
As you imply, FreeRTOS gives you both options. Some of the pros and cons of both are listed on the following page, so I won't repeat them here: http://www.freertos.org/Static_Vs_Dynamic_Memory_Allocation.html
Only you can answer your question though, as it depends on what your application is doing. The answer would be very different depending on whether your application creates all its resources at boot time and then never deletes them, or continuously creates and deletes lots of different-sized memory blocks at run time. As this is a safety application I would be surprised if it was doing lots of memory allocation and deallocation at run time - but if it were, you would have to concern yourself with memory fragmentation, dealing with heap exhaustion, and the non-deterministic behaviour of the memory allocator (not every allocation takes the same time). Using heap_4.c in FreeRTOS should prevent fragmentation in most cases, but 'should' is probably not good enough for a safety application.
If all the resources are allocated dynamically, but only at boot time (which is actually the case for most applications), then none of these potential pitfalls apply, and there is really no reason not to use dynamic memory.
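For completeness, here is a minimal sketch of the static approach Antonio mentions, using xTaskCreateStatic. The task name, stack depth and priority are illustrative assumptions, not values from the original posts:

    /* Requires configSUPPORT_STATIC_ALLOCATION set to 1 in FreeRTOSConfig.h. */
    #include "FreeRTOS.h"
    #include "task.h"

    #define STACK_DEPTH 128                 /* in words, not bytes */

    static StackType_t  uxStack[ STACK_DEPTH ];
    static StaticTask_t xTaskBuffer;        /* storage for the task's TCB */

    static void vSafetyTask( void *pvParameters )
    {
        ( void ) pvParameters;
        for( ;; )
        {
            /* application work here */
        }
    }

    void vCreateTasksAtBoot( void )
    {
        /* All storage is supplied by the caller, so creation cannot
           fail for lack of heap at run time. */
        xTaskCreateStatic( vSafetyTask, "SAFETY", STACK_DEPTH, NULL,
                           tskIDLE_PRIORITY + 1, uxStack, &xTaskBuffer );
    }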
Related
Say, for instance, I write a program which allocates a bunch of large objects when it is initialized. Then the program runs for a while, perhaps indefinitely, and when it's time to terminate, each of the large initialized objects is freed.
So my question is: will it take longer to manually deallocate each block of memory separately at the end of the program's life, or would it be better to let the system unload the program and deallocate all of the virtual memory given to the program by the system at the same time?
Would it be safe and/or faster? Also, if it is safe, does the compiler do this when set to optimise anyway?
1) Not all systems will free memory for you when the application terminates. Of course, most modern desktop systems will do this, so if you are going to run your program only on Linux or Mac (or Windows), you can leave the deallocation to the system.
2) Often you need to perform some operations on the data at termination, not just free the memory. So if your program's design makes it hard to deallocate objects manually at the end, you may later find that you need to run some code before exiting, and you will face a hard problem (see the sketch after this list).
2') Sometimes, even if you think your program will need some objects all the way until it dies, you may later want to turn your program into a library, or change the project to load and unload your big objects, and the poor design of your program will make this hard or impossible.
3) Moreover, the program's deallocation performance depends on the implementation of the allocator you use, while system deallocation depends on the system's memory management - and even for a single system there can be several implementations. So if you run into allocation/deallocation performance problems, you would rather develop a better allocator than pin your hopes on the system.
4) So my opinion is: when you deallocate memory manually at the end, you are always on the right path. When you don't, you may get some ambiguous benefits in a few cases, but you will likely just run into problems sooner or later.
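As a sketch of the point in 2) - keeping termination work in one place - here is the standard atexit() pattern. The buffer and handler names are made up for illustration:

    #include <stdio.h>
    #include <stdlib.h>

    static char *big_buffer;

    static void cleanup(void)
    {
        /* Termination is often more than freeing: e.g. flush data
           out before releasing the memory that holds it. */
        fputs("flushing and freeing\n", stderr);
        free(big_buffer);
    }

    int main(void)
    {
        big_buffer = malloc(1024 * 1024);
        if (big_buffer == NULL)
            return EXIT_FAILURE;
        atexit(cleanup);        /* runs on every normal exit path */
        /* ... program work ... */
        return EXIT_SUCCESS;
    }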
Well, most OSes will free the memory at program exit, but the bigger question is why would you want them to have to?
Is it faster? Hard to say with memory sometimes. I would guess not really, and it's definitely not worth breaking good coding practices for anyway.
Is it safe? Define safe... Will your OS crash? Probably not. Will your code be susceptible to memory leaks or other problems? Absolutely, it will. In fact you are basically telling it you want memory leaks.
Best practice is to always free your memory when you are done with it. With C and C++, every malloc'd or new'd block of memory should have a corresponding free or delete.
It is a bad idea to rely on the OS to free your memory, because it not only makes your code look bad and makes it less portable, but if the program is ever integrated into another program, you will likely be tracking down memory leaks for hours.
So, short answer, always do it manually.
Programs with a short maintenance lifetime are good candidates for memory deallocation by "exit() and let the kernel sort them out." However, if the program will last more than a few months, you have to consider the maintenance burden.
For instance, consider that someone may realize that a subsequent stage is required in the program, and that some of the data is not needed, or not needed in memory. They now have to go and work out how to deallocate the memory, properly removing stale references, etc.
I believe that you can't force a running Memcached instance to de-allocate memory, short of terminating that Memcached instance (and freeing all of the memory it held). Does anyone know of a definitive piece of documentation, or even a mailing list or blog posting from a reliable source, that can confirm or deny this impression?
As I understand it, a Memcached process initially allocates a chunk of memory (the exact initial allocation size is configurable), and then monotonically increases its memory utilization over its lifetime, limited by the daemon's maximum memory allocation size (also configurable). At no point does the Memcached daemon ever free any memory, regardless of whether the daemon has any ongoing need for the memory it holds.
I know that this question might sound a little whiny, with a tone of "I DEMAND that open source project X support my specific need!" That's not it, at all--I'm purely interested in the exact technical answer, here, and I swear I'm not harshing on Memcached. For the curious, this question came out of a discussion about possible methods for gracefully juggling multiple Memcached instances on a single server, given an application where the cost of a cache flush can be quite high.
However, I'd appreciate it if you save your application suggestions/advice for a different question (re-architecting my application, using a different caching implementation, etc.). I do appreciate a good brainstorm, but I think this question will be most valuable if it stays focused on the technical specifics of how Memcached does and does not work. If you don't have the answer to this specific question, there is probably still value in what you have to say, but I'd guess that there's a different, better place to post the more speculative comments/suggestions/advice.
This is probably the hardest problem we have to solve for memcached currently (well, a variation of it, anyway).
Freeing a chunk of memory requires us to know that a) nothing within the chunk is in use and b) nothing will start using it while we're in the process of purging it for reuse/freeing. I've heard some really good ideas for how we might solve our slab rebalancing problem, which is basically the same, except we're not trying to free the memory but to give it to something else (a common problem in a few large installations).
Also, whether free actually reduces the RSS of your process is implementation dependent. In many cases, a malloc/fill/free will leave the memory mapped in (unless your allocator uses mmap instead of sbrk).
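A small sketch of that implementation dependence, assuming glibc's allocator (the roughly 128 KiB mmap threshold and malloc_trim are glibc specifics, not portable guarantees):

    #include <stdlib.h>
    #include <string.h>
    #include <malloc.h>     /* glibc-specific: malloc_trim */

    int main(void)
    {
        char *small = malloc(64 * 1024);   /* below the mmap threshold:
                                              carved from the sbrk heap */
        memset(small, 1, 64 * 1024);       /* fill: pages become resident */
        free(small);                       /* back to the arena; process
                                              RSS typically unchanged */

        char *large = malloc(1024 * 1024); /* above the threshold: mmap'd */
        free(large);                       /* munmap'd; RSS actually drops */

        malloc_trim(0);                    /* ask glibc to return trailing
                                              free heap pages to the OS */
        return 0;
    }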
I'm pretty sure this isn't possible with memcached. I don't see any technical reason why it couldn't be implemented though. Lock cache operations, expire enough keys to reach the desired size, update the size, unlock. (I'm sure there's nicer ways to avoid blocking the server during that time.)
The standard and default mechanism of memory management in memcached is the slab allocator. It means that memory is allocated to the process and never released to the operating system. Basically, when memory is no longer used to store some data, it is held by the process so it can be reused later when needed. However, the operating system reclaims all the memory allocated by a process when that process terminates. That is why memory is released when you kill/stop memcached.
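To illustrate the slab idea, here is a conceptual sketch - not memcached's actual code, and the page and chunk sizes are just examples: grab a large page once, carve it into fixed-size chunks, and recycle chunks on a free list instead of ever returning them to the OS.

    #include <stdlib.h>

    #define SLAB_PAGE   (1024 * 1024)   /* 1 MiB page */
    #define CHUNK_SIZE  256             /* one size class, for simplicity */

    typedef struct chunk { struct chunk *next; } chunk_t;

    static chunk_t *free_list;

    static void slab_grow(void)
    {
        char *page = malloc(SLAB_PAGE);  /* taken once, kept forever */
        if (page == NULL) return;
        for (size_t off = 0; off + CHUNK_SIZE <= SLAB_PAGE; off += CHUNK_SIZE) {
            chunk_t *c = (chunk_t *)(page + off);
            c->next = free_list;          /* thread chunk onto free list */
            free_list = c;
        }
    }

    void *slab_alloc(void)
    {
        if (free_list == NULL) slab_grow();
        chunk_t *c = free_list;
        if (c) free_list = c->next;
        return c;
    }

    void slab_free(void *p)  /* back to the free list, never to the OS */
    {
        chunk_t *c = p;
        c->next = free_list;
        free_list = c;
    }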
There is a compile-time option in memcached to enable a malloc/free mechanism, so that when free() is called, memory might be released to the operating system (this depends on the C standard library implementation). But doing so might worsen fragmentation and hurt performance.
Please read more about the issue here:
Why not use malloc/free
Memcached memory management
A faithful implementation of the actor message-passing semantics means that message contents are deep-copied from a logical point of view, even for immutable types. Deep-copying of message contents remains a bottleneck for implementations of the actor model, so for performance some implementations support zero-copy message passing (although it's still a deep copy from the programmer's point of view).
Is zero-copy message-passing implemented at all in Erlang? Between nodes it obviously can't be implemented as such, but what about between processes on the same node? This question is related.
I don't think your assertion is correct at all - deep copying of inter-process messages isn't a bottleneck in Erlang, and with the default VM build/settings, this is exactly what all Erlang systems are doing.
Erlang process heaps are completely separate from each other, and the message queue is located in the process heap, so messages must be copied. This is also true for transferring data into and out of ETS tables as their data is stored in a separate allocation area from process heaps.
There are a number of shared datastructures however. Large binaries (>64 bytes long) are generally allocated in a node-wide area and are reference counted. Erlang processes just store references to these binaries. This means that if you create a large binary and send it to another process, you're only sending the reference.
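As a conceptual sketch of how such reference-counted sharing works - plain C for illustration only, not the BEAM's implementation (the real VM manages the count atomically and tracks references through the GC):

    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        long   refs;    /* how many processes hold a reference */
        size_t len;
        char   data[];  /* the shared binary payload */
    } refc_bin_t;

    refc_bin_t *bin_new(const char *src, size_t len)
    {
        refc_bin_t *b = malloc(sizeof *b + len);
        if (b == NULL) return NULL;
        b->refs = 1;
        b->len  = len;
        memcpy(b->data, src, len);
        return b;
    }

    /* "Sending" the binary to another process copies a pointer,
       not the payload. */
    refc_bin_t *bin_share(refc_bin_t *b) { b->refs++; return b; }

    void bin_release(refc_bin_t *b)
    {
        if (--b->refs == 0)   /* last holder frees the payload */
            free(b);
    }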
Sending data between processes is actually worse in terms of allocation size than you might imagine - sharing inside a term isn't preserved during the copy. This means that if you carefully construct a term with sharing to reduce memory consumption, it will expand to its unshared size in the other process. You can see a practical example in the OTP Efficiency Guide.
As Nikolaus Gradwohl pointed out, there was an experimental hybrid heap mode for the VM which did allow term sharing between processes and enabled zero-copy message passing. It hasn't been a particularly promising experiment as I understand it - it requires extra locking and complicates the existing ability of processes to independently garbage collect. So not only is copying inter-process messages not the usual bottleneck in Erlang systems, allowing it actually reduced performance.
AFAIK there was/is experimental support for zero-copy message-passing in Erlang using the -shared or -hybrid model. I read a blog post in 2009 claiming that it's broken on SMP machines, but I have no idea about the current status.
As has been mentioned here and in other questions, current versions of Erlang basically copy everything except for larger binaries. In older pre-SMP times it was feasible not to copy but to pass references. While this resulted in very fast message passing, it created other problems in the implementation - primarily, it made garbage collection more difficult and complicated the implementation. I think that today passing references and having shared data could result in excessive locking and synchronisation, which is, of course, not a Good Thing.
I wrote the accepted answer to that other question you're referencing, and in it I give you a direct pointer to this line of code:
message = copy_struct(message, msize, &hp, &bp->off_heap);
This is in a function called when the Erlang run-time system needs to send a message, and it's not inside any kind of "if" that could cause it to be skipped. So, as far as I can tell, the answer is "yes, it's always copied." (That's not strictly true -- there is an "if", but it seems to be dealing with exceptional cases, not the normal code-flow path.)
(I'm ignoring the hybrid heap option brought up by Nikolaus. It looks like he's right, but since this isn't the way Erlang is normally built and it has its own penalties, I don't see that it's worth considering as a way to answer your concern.)
I don't know why you're considering 10 GByte/sec a bottleneck, though. Nothing short of registers or CPU cache goes faster in the computer, and such memories are small, thus constituting a kind of bottleneck themselves. Besides which, the zero-copy idea you're proposing would require locking in the case of cross-CPU message passing in a multi-core system, which is also a bottleneck. We're already paying the locking penalty once in this function to copy the message into the other process's message queue; why pay it again later when that process gets around to reading the message?
Bottom line, I don't think your ideas of ways to make it go faster would actually help much.
I was going through some of the decisions made in building Xara Xtreme, an open-source SVG graphics application. Their memory-management decision was quite intriguing to me, since I naively took it for granted that on-demand dynamic allocation is the way to write an object-oriented application.
The explanation from the documentation is:
How on earth can static allocations be efficient?

If you are used to large dynamic data structures, this may seem strange to you. Firstly, all our objects (and thus allocation size) are far smaller (on average) than each dynamic area allocation within a program such as Impression. This means that though there are likely to be many holes within memory, they are small. Also, we have far more allocated objects within memory, and thus these holes quickly get filled. Furthermore, virtual memory managers will free up any pages of memory that contain no allocations and give this memory back to the operating system so that it may be used again (either by us, or by another task).

We benefit greatly from the fact that whenever we allocate memory in this manner, we do not have to move any memory about. This proved a bottleneck in ArtWorks, which also had many small allocations being used concurrently.
In brief, the presence of plenty of small objects and the need to avoid moving memory about are the reasons given for choosing static allocation. I don't have a clear understanding of the reasons mentioned.
Though this talks about static allocation, what I see from a cursory look at the code is that a block of memory is dynamically allocated at application start and kept alive until the application ends, roughly simulating static allocation.
Could you explain in what situations static allocation fares better than on-demand dynamic allocation, such that it would be considered the main mode of allocation in a serious application?
It's quicker because you avoid the overhead of calling a system routine to manage your storage. malloc() maintains a heap, so every request requires a scan for an appropriately-sized block, possibly resizing the block, updating the block list to mark this block as used, etc. If you're allocating a lot of small objects, this overhead can be excessive. With static allocation you can create an allocation pool and just maintain a simple bitmap to show which areas are in use. This assumes that each object is the same size, so you commonly create one pool per object type.
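A minimal sketch of such a fixed-size pool with a bitmap; the slot count and slot size are arbitrary assumptions:

    #include <stddef.h>
    #include <stdint.h>

    #define SLOTS      64
    #define SLOT_SIZE  32          /* every object in this pool is 32 bytes */

    static uint8_t  pool[SLOTS][SLOT_SIZE];
    static uint64_t used;          /* bit i set => slot i is in use */

    void *pool_alloc(void)
    {
        for (int i = 0; i < SLOTS; i++) {
            if (!(used & (1ULL << i))) {   /* first free slot wins */
                used |= 1ULL << i;
                return pool[i];
            }
        }
        return NULL;                        /* pool exhausted */
    }

    void pool_free(void *p)
    {
        size_t i = ((uint8_t *)p - &pool[0][0]) / SLOT_SIZE;
        used &= ~(1ULL << i);               /* clear the in-use bit */
    }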
In short, there's really no such thing as static allocation other than the space allocated for your functions themselves and other read-only kinds of memory. (Do an assemble-only "gcc -S" and look for all the memory blocks, if you're interested.) If you're making and breaking objects, you're dynamically allocating. That being said, there's nothing to stop you from tightly controlling the allocation mechanism itself.
That's what functions like mallinfo() and mallopt() do for controlling how malloc() does its magic. However, that might not even be good enough for you. If you know all your chunks are going to be the same size, you can allocate and deallocate much more efficiently. And if you know you have 3 sizes of stuff, you can keep 3 arenas of memory each with their own allocator.
On top of this, there's the situation at runtime where the process doesn't have enough room and needs to ask the OS for more - that involves a system call that is more expensive than just incrementing an array index. On Unix, it's usually brk() or sbrk() or the like, and that can take valuable time.
Another, rarer situation, would be if you need to multiply-allocate things. Like 3 threads need to share information and only when all 3 release it does it get freed. That's something nonstandard and not generally covered by typical mallopt() or even pthread-specific memory or mutex/semaphore-locked chunks.
So if you have high speed optimization issues or you are running on an embedded system where you need to squeeze all you can out of the available memory, then "static allocation", or at least controlling the allocation mechanism, may be the way to go.
In C/C++ I can allocate memory in one thread and delete it in another thread. Yet whenever one requests memory from the heap, the heap allocator needs to walk the heap to find a suitably sized free area. How can two threads access the same heap efficiently without corrupting the heap? (Is this done by locking the heap?)
In general, you do not need to worry about the thread-safety of your memory allocator. All standard memory allocators -- that is, those shipped with MacOS, Windows, Linux, etc. -- are thread-safe. Locks are a standard way of providing thread-safety, though it is possible to write a memory allocator that only uses atomic operations rather than locks.
Now it is an entirely different question whether those memory allocators scale; that is, is their performance independent of the number of threads performing memory operations? In most cases, the answer is no; they either slow down or can consume a lot more memory. The first scalable allocator in both dimensions (speed and space) is Hoard (which I wrote); the Mac OS X allocator is inspired by it -- and cites it in the documentation -- but Hoard is faster. There are others, including Google's tcmalloc.
Yes an "ordinary" heap implementation supporting multithreaded code will necessarily include some sort of locking to ensure correct operation. Under fairly extreme conditions (a lot of heap activity) this can become a bottleneck; more specialized heaps (generally providing some sort of thread-local heap) are available which can help in this situation. I've used Intel TBB's "scalable allocator" to good effect. tcmalloc and jemalloc are other examples of mallocs implemented with multithreaded scaling in mind.
Some timing comparisons between single-threaded and multithread-aware mallocs here.
This is an Operating Systems question, so the answer is going to depend on the OS.
On Windows, each process gets its own heap. That means multiple threads in the same process are (by default) sharing a heap. Thus the OS has to thread-synchronize its allocation and deallocation calls to prevent heap corruption. If you don't like the idea of the possible contention that may ensue, you can get around it by using the Heap* routines. You can even overload malloc (in C) and new (in C++) to call them.
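For example, here is a sketch of a per-thread private heap with the Heap* routines; HEAP_NO_SERIALIZE skips the internal lock, which is safe here only because the heap is never shared between threads:

    #include <windows.h>

    DWORD WINAPI worker(LPVOID arg)
    {
        (void)arg;
        /* A private heap just for this thread: its allocations never
           contend with other threads' allocations. */
        HANDLE heap = HeapCreate(HEAP_NO_SERIALIZE, 0, 0);
        if (heap == NULL)
            return 1;

        void *p = HeapAlloc(heap, 0, 4096);
        /* ... use p ... */
        HeapFree(heap, 0, p);

        HeapDestroy(heap);   /* releases everything still in the heap */
        return 0;
    }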
I found this link.
Basically, the heap can be divided into arenas. When memory is requested, each arena is checked in turn to see whether it is locked. This means that different threads can safely access different parts of the heap at the same time. Frees are a bit more complicated, because each block must be freed back to the arena it was allocated from. I imagine a good implementation will get different threads to default to different arenas to try to minimize contention.
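A toy sketch of that arena scheme - illustrative only; real allocators like glibc's are far more elaborate, and frees are omitted for brevity (as noted above, each block would have to go back to its own arena):

    #include <pthread.h>
    #include <stddef.h>

    #define ARENAS     4
    #define ARENA_SIZE (64 * 1024)

    typedef struct {
        pthread_mutex_t lock;
        size_t          top;              /* bump pointer within the arena */
        char            space[ARENA_SIZE];
    } arena_t;

    static arena_t arenas[ARENAS] = {
        { PTHREAD_MUTEX_INITIALIZER }, { PTHREAD_MUTEX_INITIALIZER },
        { PTHREAD_MUTEX_INITIALIZER }, { PTHREAD_MUTEX_INITIALIZER },
    };

    void *arena_alloc(size_t n)
    {
        /* Try each arena in turn; take the first whose lock is free,
           so threads naturally spread across arenas. */
        for (int i = 0; ; i = (i + 1) % ARENAS) {
            if (pthread_mutex_trylock(&arenas[i].lock) == 0) {
                arena_t *a = &arenas[i];
                void *p = NULL;
                if (a->top + n <= ARENA_SIZE) {   /* bump-allocate */
                    p = a->space + a->top;
                    a->top += n;
                }
                pthread_mutex_unlock(&a->lock);
                return p;   /* NULL if this arena is full; a real
                               allocator would try another or grow */
            }
        }
    }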
Yes, normally access to the heap has to be locked. Any time you have a shared resource, that resource needs to be protected; memory is a resource.
This will depend heavily on your platform/OS, but I believe this is generally OK on major systems. C/C++ do not define threads, so by default I believe the answer is "the heap is not protected" - you must have some sort of multithreaded protection for your heap access.
However, at least with linux and gcc, I believe that enabling -pthread will give you this protection automatically...
Additionally, here is another related question:
C++ new operator thread safety in linux and gcc 4