Castalia Memory Issue

My application-layer protocol works fine, but when the number of nodes is large (more than 600) the simulation exits without any error.
I traced the code and didn't find any problem. It looks like a memory problem, since the number of nodes is large and many operations are being performed.
Update:
In my application:
Each node broadcasts 2 messages per second for the entire simulation time.
The messages carry a lot of application-related data.
All the nodes are static.
I am using BypassRouting, BypassMAC, and the CC2420 radio.
In previous experiments Castalia worked with more than 600 nodes (up to 2500), but only with short simulation times, so the limit seems to depend on the relation between the number of nodes, the simulation time, and the number of messages sent per second.
A single experiment runs successfully, but when running with, for example, 30 seeds (i.e. -r 30) and 110 nodes:
it stops after experiment 13 if the simulation time is 1000 s,
and it stops after experiment 22 if the simulation time is 600 s.
How can I free memory from unnecessary things during the simulation runs?
(Note: previously I increased the swap space, which worked up to a certain limit.)
Thanks,

Without more information on your application and the simulation scenario it's hard to provide very specific suggestions. At the very least, you could provide your ini file and information about any custom modules you are using (your application module, for example). Are you using any mobile nodes? Which protocols are you using? What does your app module do? In general, Castalia should be able to handle 600 nodes; in the past, we have tested Castalia with thousands of (static) nodes.
You could use a memory profiler. An excellent tool (a suite of tools, really) is valgrind. You can use it to find memory leaks, and you can also profile your program's memory use. The heap profiler tool of valgrind is called 'massif':
Massif is a heap profiler. It performs detailed heap profiling by taking regular snapshots of a program's heap. It produces a graph showing heap usage over time, including information about which parts of the program are responsible for the most memory allocations. The graph is supplemented by a text or HTML file that includes more information for determining where the most memory is being allocated. Massif runs programs about 20x slower than normal.
Read the valgrind documentation for more info. This is the way you invoke the tool:
valgrind --tool=massif <executable> <arguments>
The executable in this case is CastaliaBin (not the Castalia python script, which is a higher level execution tool).
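For example (a sketch; the arguments after CastaliaBin are placeholders for whatever you normally pass to your simulation):
valgrind --tool=massif ./CastaliaBin -u Cmdenv -c General
ms_print massif.out.<pid>
Massif writes its snapshots to a file named massif.out.<pid>; the ms_print tool that ships with valgrind renders this file as the heap graph and allocation breakdown described above.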

Related

Erlang ETS memory fragmentation

I have an erlang cluster where erlang:memory() 'total' is between 2-2.5 GB from idle to busy time, day in, day out. ets memory usage is around 440 MB and stays around there no matter what. The data within ets is heavily transient: it completely changes throughout the day, and tomorrow's data is guaranteed to have no commonality with today's.
Linux top says beam is using around 10 gigabytes, and free -m 'used' agrees with that (the machine really only runs beam). The overall memory usage of the system grows regularly, about 1% per day on 16 GB systems. There is some variance across nodes, but not by a lot, and OS 'used' memory is always several times more than erlang:memory() total.
erlang:system_info({allocator, ets_alloc}) shows 20 allocators. Most have data that looks something like this (full output of command is here):
{mbcs_pool,[{blocks,2054},
            {blocks_size,742672},
            {carriers,10},
            {carriers_size,17825792}]},
1) Does this mean that 742K bytes (words?) of memory are actually taking 17M of OS memory?
2) As this post suggests, should we add '+MEas bf' to the VM args, in order to reduce overhead?
3) What else can I do to avoid actually running out of memory?
This is R17.5, but we will be migrating to R19.3 in the next deployment (this week). We don't have recon in the current deployment but will be adding it in the next one. Also, I can't imagine this matters, but beam is running inside an Alpine container.
In case someone else runs into this later: this was not actually leaked memory.
The default memory allocator strategy of Erlang may not be optimal for your use, depending on what you do and on how Erlang is configured to allocate blocks. It turns out that, in some cases, memory that is "free" from Erlang's point of view won't necessarily be released to the OS immediately, due to allocator fragmentation.
It's somewhat explained here: http://erlang.org/doc/man/erts_alloc.html
The default allocator strategy for the version of Erlang we used at the time was aoffcbf (address order first fit carrier best fit). In our case, this resulted in very high memory fragmentation (10+ GB of overhead). When troubleshooting these things, erlang:system_info(allocator) and erlang:system_info({allocator, Alloc}) are your friends. Changing to aobf (address order best fit) resulted in much more efficient memory usage. In truth, as long as the machine didn't run out of physical memory, it wouldn't matter; but for us, we were getting dangerously close to the physical limit, and you do not want to start paging. With aobf, we never passed 4 GB, even after the node had been up for 18 months. With aoffcbf we would pass 10 GB in a few weeks.
As always, YMMV, since it all depends on the type and size of the blocks being allocated, and on how long they live.
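To put a number on the fragmentation, take the mbcs_pool figures quoted in the question: blocks_size / carriers_size = 742672 / 17825792 ≈ 0.04, i.e. only about 4% of the 17 MB of carrier memory holds live blocks, which is what question 1) suspected. A sketch of trying the alternative strategy (the +MEas flag for ets_alloc is the one quoted in the question; check the strategy names against the erts_alloc documentation for your ERTS version):
erl +MEas aobf
Then re-check fragmentation at run time from the Erlang shell:
1> erlang:system_info({allocator, ets_alloc}).
and compare blocks_size against carriers_size in the output.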

SpreadsheetGear -- Generating large report via copy and paste seems to use a lot of memory and processor

I am attempting to generate a large workbook-based report with three supporting worksheets of 100, 12,000 and 12,000 rows, and a final, entirely formula-based output sheet that ends up representing about 120 entities at 100 rows apiece. I generate a template range and copy and paste it, replacing the entity ID cell after pasting each new range. It is working fine, but I noticed that memory usage in the IIS Express process is approximately 500 MB and that it is taking 100% processor usage as well.
Are there any guidelines for generating workbooks in this manner?
At least in terms of memory utilization, it would help to have some comparison, maybe against Excel, of how much memory is used simply to have the resultant workbook open. For instance, if you were to open the final report in both Excel and the "SpreadsheetGear 2012 for Windows" application (available in the SpreadsheetGear folder under the Start menu), what does Task Manager measure for each of these applications in terms of memory consumption? This may provide some insight as to whether the memory utilization you are seeing in the actual report-building process is unusually high (is there a lot of extra overhead in your routine?), or just typical given the size of the workbook you are generating.
In terms of CPU utilization, this one is a bit more difficult to pinpoint and is certainly dependent on your hardware as well as implementation details in your code. Running a VS Profiler against your routine certainly would be interesting to look into, if you have this tool available to you. Generally speaking, the CPU time could potentially be broken up into a couple of broad categories: CPU cycles used to "build" your workbook and CPU cycles to "calculate" it. It could be helpful to determine which of these is dominating the CPU. One way to do this might be to, if possible, ensure that calculations don't occur until you are finished actually generating the workbook. In fact, avoiding any unnecessary calculations could potentially speed things up; it depends on the workbook, though. You can avoid calculations by setting IWorkbookSet.Calculation to Manual mode and not calling any of the IWorkbook's "Calculate" methods (Calculate/CalculateFull/CalculateFullRebuild) until you are finished with this process. If you don't have access to a profiler tool, maybe set some timers and Console.WriteLines and monitor the Task Manager to see how your CPU fluctuates during different parts of your routine. With any luck you might be able to better isolate which part of the routine is taking the most time.
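A minimal C# sketch of that manual-calculation pattern (IWorkbookSet.Calculation and the Calculate methods are named above; the Factory entry point and the exact enum value names are assumptions to verify against the SpreadsheetGear documentation):

using SpreadsheetGear;

IWorkbookSet workbookSet = Factory.GetWorkbookSet(); // assumed entry point
IWorkbook workbook = workbookSet.Workbooks.Add();
workbookSet.Calculation = Calculation.Manual;        // assumed enum value name
try
{
    // ... copy/paste the template range for each entity and
    //     replace the entity ID cell, as described in the question ...
}
finally
{
    workbook.Calculate();                            // one full pass at the end
    workbookSet.Calculation = Calculation.Automatic; // assumed enum value name
}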

Detailed multitasking monitoring

I'm trying to put together a model of a computer and run some simulations on it (part of a school assignment). It's a very simple model: a CPU, a disk, and a process generator that generates user processes which take turns using the CPU and accessing the disk. (I've decided to omit the various system processes, because according to Microsoft's Process Explorer tool, running on Windows 7, they use next to no CPU time.) And this is where I've gotten stuck.
I have no idea how to get relevant data on how often various processes read/write to disk, how much data they access at once, and how much time they spend using the CPU. Let's say I want some statistics for typical operations on a PC: playing music/movies, browsing the internet, playing games, working with Office, video editing and so on. Is there even a way to gather such data?
I'm simulating preemptive multitasking using round-robin (RR) scheduling with a time quantum of 15 ms for switching processes, and this is how it looks:
-> The process gets the CPU
-> The process does its work in 0-15 ms, then gives up the CPU or is cut off
And now, two options arise:
a) the process just sits and waits until it gets the CPU again, or until some user input arrives if there is nothing to do;
b) the process has requested data from the disk, and does not rejoin the queue until that data is available.
I would like the decision between a) and b) in the model to be made based on a probability, for example 90% for a) and 10% for b). But I do not know how to make those percentages even somewhat realistic for a given type of process. Also, how much data can and does a process typically access at once?
Any hints, sources, or utilities available for this?
I think I found an answer myself, albeit an unreliable one.
The Process Explorer utility for Windows measures disk I/O, both by volume and by number of occurrences. So there's a rough way to get the answer:
say a process performs 3,000 reads in 30 minutes while using 2% of the CPU during that time (assuming a single-core CPU). The process has then used 36,000 ms of CPU time. Divided into blocks of half the 15 ms time quantum (this is the unreliable part: the process in all probability does not use its whole time slot, so I'll just divide by half of it), that gives 4,800 blocks, and 3,000/4,800 gives roughly a 62% chance of reading data after using the CPU.
I hope I did not misunderstand the "reads" statistic in Process Explorer.
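The same estimate as a small Python script (a sketch; the inputs are the example's numbers, and the half-quantum division is the rough guess described above):

# Rough estimate of P(disk read after a CPU burst), following the method above.
interval_ms = 30 * 60 * 1000  # 30 minutes of wall-clock time
cpu_share = 0.02              # 2% CPU on a single core
quantum_ms = 15               # round-robin time quantum
reads = 3000                  # read count reported by Process Explorer

cpu_time_ms = interval_ms * cpu_share    # 36,000 ms of CPU time
bursts = cpu_time_ms / (quantum_ms / 2)  # assume half-quantum bursts: 4,800
p_read = reads / bursts                  # ~0.62
print(f"{bursts:.0f} bursts, P(read after burst) = {p_read:.2f}")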

Windows to embedded port: data and code memory size

I am in the process of porting a Windows 7 library to an embedded platform. In order to do so, my employer has asked me for the amount of memory (and CPU, but let us concentrate on the memory for now) that my system will need once ported, so he can size the board to my needs.
I had a look on the internet and there does not seem to be much information about this question, hence my questions:
in order to get a rough idea of the memory footprint of the code in flash memory (code only, without memory for data), I read on the internet that I should sum the sizes of all the DLLs I use. It seems that compilers and platforms each give a different size for the code footprint, but overall the size of the code (without data) is often very close. Do you confirm?
in order to deal with the memory required by the data only (heap + stack but no code), I had a look at Task Manager (and Process Explorer). It seems the overall amount of data I use is given by the 'peak working set'. I have a few questions about it though:
2.a. Does the 'working set' include the heap + stack memory or does it correspond to the heap only?
2.b. Does the 'working set' include the size of the code as well (since I am on Windows 7, the code is also stored in RAM, not in flash as on embedded systems), or does it only correspond to the data?
2.c. it seems the 'peak working set' reflects the maximum amount of physical memory the program has actually occupied since it was started, but it does not bound the size the program could reach afterwards (if I happen to allocate memory at runtime - which would be bad ;) - the peak value would keep increasing). Do you confirm?
2.d. Hence, do you also confirm that if I do not allocate memory at runtime, the 'peak working set' should roughly be the maximum amount of RAM my embedded system will need, give or take some difference due to the differences in system technology?
Thanks,
Antoine.
Unless you intend to run your application on Windows Embedded, looking at the code and data usage in Windows is not going to be much of an indicator of anything useful!
1) DLLs are libraries - not all the code within them will be utilised by your code. Most embedded systems are statically linked, and the linker will link only the modules that are actually referenced by your code. So taking the sum of the DLL dependencies is likely to lead to a gross overestimation of the memory requirement.
2) Windows memory management is profligate with memory - because it can be, and because doing so generally improves the performance of typical desktop systems. For example, a thread stack in Windows is typically on the order of 2 MB - you may seldom use that much, but Windows gives it to you anyway, because it can and because it errs on the side of safety. A thread stack in an embedded system will typically range from a few tens of bytes to a few tens of kilobytes - it depends on your application.
Windows Task Manager shows what Windows allocates to your process; that may not relate to what your process needs. Your application also uses Windows services - all the memory used for kernel and device services will not show up as part of your process, but your embedded system may still need those services.
If you do use your Windows prototype code to assess the embedded system requirements, then your best place to start is by getting the linker to generate a map file, which will give a detailed description of memory usage in terms of statically allocated data and code size.
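For example, with the GNU toolchain (the -Map option belongs to GNU ld; other toolchains have an equivalent switch):
gcc -Wl,-Map=app.map -o app main.o module.o
The resulting app.map lists the size and placement of every section and symbol that was actually linked in, which is exactly the static code and data breakdown you need.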
Code size depends not only on the performance of the compiler, but also on the efficiency of the instruction set: some architectures achieve higher code density than others. Windows application code size is never a good indicator of embedded code size, because its execution environment is likely to be so much different. For example, a pre-emptive multitasking RTOS kernel on a 32-bit ARM can be implemented in less than 10 KB of code, a file system in perhaps another 10 KB, a network stack in anything from 10 to 30 KB, and USB in another 10 KB. As you can see, this is a different world from desktop code.
Data memory usage is perhaps more easily determined, but you do that through analysis of your application rather than by observing what Windows does. There is the data your application instantiates directly, and then there is the data instantiated by the libraries and device drivers you might call - in Windows the latter is likely to be relatively large and out of your control. Typical embedded-system libraries for things such as network stacks, USB, file systems etc. are far smaller and far more deterministic in both performance and size.
Your best bet is to describe your application in terms of its general purpose, performance requirements, real-time constraints, and hardware requirements (display, networking, I/O, mass storage etc.), and then look at comparable solutions or at the libraries you will need to implement your solution. Most embedded systems are "bare board" and do not have the services you find in Windows unless you write them or use third-party solutions - Windows is seldom a comparable solution to an embedded system.
If it is just a library rather than an application, then build it for a likely target using a Windows-hosted GCC cross-compiler and see how big it ends up. You don't need hardware for that, or even to spend any money.
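A sketch of that approach with an ARM GCC cross-toolchain (the target triple, flags and file names are illustrative):
arm-none-eabi-gcc -Os -c mylib.c -o mylib.o
arm-none-eabi-size mylib.o
The size utility reports the object's text (code and constants, i.e. the flash footprint), data and bss (together the static RAM footprint) sections.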

Strategy or tools to find "non-leak" memory usage problems in Delphi?

An old application started to consume a lot of memory after a server update. Memory usage seems to rise without limit until the program hangs.
According to FastMM4 and EurekaLog, there's no memory leak (except 28 bytes), so I assume all memory is freed when the application shuts down.
Are there any tools or strategies suitable for tracking this kind of memory problem?
Since September 2012, there has been a very simple and convenient way to find this type of "run-time only" memory leak.
FastMM4991 introduced a new method, LogMemoryManagerStateToFile:
Added the LogMemoryManagerStateToFile call. This call logs a summary of
the memory manager state to file: The total allocated memory, overhead,
efficiency, and a breakdown of allocated memory by class and string type.
This call may be useful to catch objects that do not necessarily leak, but
do linger longer than they should.
To discover the leak at run time, you only need these steps:
add a call to LogMemoryManagerStateToFile('memory.log', '') in a place where it will be called at intervals (see the sketch after these steps)
run the application
open the log file with a tail program (for example BareTail), which will auto-refresh when the file content changes
watch the first lines of the file; they will contain the memory allocations which occupy the highest amount of memory
if you see that a class or memory type constantly has a growing number of instances, it can be the reason for your leak
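A minimal sketch of the interval call (assumptions: a VCL form with a TTimer and an arbitrary 30-second interval; the two-argument call is the one shown above):

// OnTimer handler of a TTimer with Interval := 30000 (30 s).
// Requires the FastMM4 unit in the project.
procedure TForm1.MemLogTimerTimer(Sender: TObject);
begin
  // Log a summary of the current memory manager state to the file.
  LogMemoryManagerStateToFile('memory.log', '');
end;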
The growing memory consumption is an application issue. It is not a bug that FastMM4 or EurekaLog can discover: from their point of view, the application is just using memory correctly.
Using AQTime, MemProof (hard to find; D7 is the last supported version(?)), SleuthQA (similar to MemProof) or similar memory profilers, you can track the memory usage from outside the application in real time.
Using FastMM4's GetMemoryManagerState / GetMemoryManagerUsageSummary, you can track memory usage from inside the application. Output this information into a trace file and analyze it after the run. Or write a simple wrapper function for one of the above routines which returns the current memory usage, and call it from the IDE debugger's Evaluate/Modify, add it to the Watches, or pass it to OutputDebugString, and watch the current memory usage (see the sketch below).
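A minimal sketch of such a wrapper (assumptions: FastMM4 is used directly, the var-parameter form of GetMemoryManagerUsageSummary is available, and the TMemoryManagerUsageSummary field names follow the FastMM4 sources; verify against your FastMM4 version):

uses FastMM4, Windows, SysUtils;

// Returns a one-line summary of current heap usage, suitable for the
// debugger's Evaluate/Modify, a watch expression, or OutputDebugString.
function CurrentMemoryUsage: string;
var
  Usage: TMemoryManagerUsageSummary;
begin
  GetMemoryManagerUsageSummary(Usage);
  Result := Format('Allocated: %d bytes, overhead: %d bytes',
    [Int64(Usage.AllocatedBytes), Int64(Usage.OverheadBytes)]);
end;

procedure TraceMemoryUsage;
begin
  OutputDebugString(PChar(CurrentMemoryUsage));
end;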
Note: if memory is eaten by some DLL, you may not see its memory usage using (3). Use (2) instead.
By analyzing the memory usage together with the tasks performed by the application, you may discover what leads to the increased memory usage.
AQTime (a commercial tool which is quite expensive) can report your memory usage, down to the line of source code that allocated each object. In very large memory usage scenarios, you might want the AQTime functionality that shows the number of objects and the size (total, plus individual instance size) for each object type. AQTime worked great for me, starting with Delphi 7 and all later versions, including your version (2006) and the latest versions (XE and XE2).
As the program's memory usage grows, AQTime can be used to grab "snapshots" of the runtime heap, which you can use to understand the memory usage of your application: what is being created, and how many of each object exist. Even when no leaks exist, understanding the runtime behaviour of your application in terms of the objects it creates and manages is very important, and AQTime is the most powerful tool I know of for Delphi users.
If you are willing to upgrade to Delphi XE/XE2, you might already have an included light version of AQTime; if so, check it out. If not, I recommend you try their demo. I am unaware of any free or open source alternatives that provide the same functionality.
Lesser functionality could be cobbled together manually by writing lots of trace messages, or by using FastMM's full debug mode. If you wrote a complete dump of your memory usage into a very large file, you might be able to write some tools to parse it and create a summary. The problem I have with FastMM in this case is that you will be drowned in detailed information, without the ability to extract exactly the summary that helps you understand your situation. So you can try to write your own tool to summarize the memory usage. In one application I had that used a series of components which I knew would use a lot of memory, I wrote a dialog box into my application that showed the current memory usage by these large memory-blob-of-data objects.
Have you ever thought about the leak caused by the IDE itself? It is huge!
In my case (2 GB of RAM) I do the following:
1. Open the IDE
2. Leave it minimized for nearly six hours
3. Watch how physical memory is getting used
The result:
While the IDE is open (remember, I do the test with it minimized), it takes more and more RAM... until no more RAM is free.
It takes all 2 GB of RAM plus all the page file space on disk (I have it configured to a maximum of 4 GB).
In less than six hours (doing nothing in the IDE) it tries to use more than 6 GB.
That is a memory leak caused by the IDE... I do not type a single letter in the IDE, do not compile anything, do not even open any project... I just open the IDE and minimize it... leave the computer alone for about six hours, and the IDE is consuming 6 GB of memory.
Of course, after that, the IDE starts showing annoying SystemOutOfMemory messages... and I must kill it... then all those 6 GB are freed!
When the hell will this get fixed?
Please note I have all patches applied; I also tested without applying each patch/hotfix, etc.
The best I got was by disabling some options under Tools, like the one that underlines bad code... so why the hell does that option have any influence? I am not typing anything in the IDE during the tests... and with it disabled, the memory leak is reduced a lot...
Of course, if I actually use the IDE (write code in an open project), without even compiling or running it... things get much worse... a memory leak of up to 6 GB can be reached in less than an hour, sometimes after 15 minutes of copy/pasting source code.
It seems there will not be a fix any time soon!
So I ended up with the following workaround, which works perfectly:
- Close the IDE and reopen it every 15 minutes or less
Ugly solution, I know... but it works!
