How to debug the error "bonmin.exe has stopped working" - memory

I am trying to solve a mixed-integer nonlinear programming problem. I have about 178848 decision variables and they are all binary. I am creating the .nl file from pyomo and then trying to solve this .nl file from the command prompt with the command: bonmin test.nl
Before I get the error in the title, I see a sudden jump in memory, with usage climbing up to 100% before the program exits. Are there any settings I can pass to bonmin to prevent this error from happening? Or are there any heuristic options available which I can pass to bonmin?

That is a rather large integer program. Solvers have limitations, so it may simply be that Bonmin/CBC cannot handle a problem of that size with the available memory on your system because the branch and bound tree gets too large. If there are any reformulations or preprocessing steps that could reduce the problem size, you may want to try them before sending the problem to Bonmin. You can also try giving branching priority settings, though I am less familiar with how to implement that.

Trouble reading memory

When I run my code through the debugger, after a series of steps it eventually gets lost and executes commands out of order. I'm not sure if the stack is overflowing or what.
This is the error I usually get:
MSP430: Trouble Reading Memory Block at 0xffe2e on Page 0 of Length 0x1d2: Invalid parameter(s)
Any suggestions on what it could be? I read briefly about possible issues with not handling some interrupts.
Also, I'm trying to fill my RAM with a specific value so that I can tell if the stack is overflowing, any suggestions on how to fill the entire RAM with, say a value of 0x1234?
Thanks!
What debugger and compiler are you using? I've found that msp430-gcc and msp430-gdb/gdbproxy can get very confused with GCC optimizations turned on. However, broken code is sometimes emitted even without them turned on (it's a quality product, really).
The easiest way to fill memory is to modify your crt0.s startup file and link it yourself. At the point where memory is set to 0, you can change the pattern.
Which device are you using? On 16-bit devices, 0xffe2e is outside of the address space of the processor, likely an array index or similar which has gone negative.
I have seen this error as well when using Code Composer Studio and TI's USBFET programmer, although I have not been able to nail down a single, definite cause.
Assuming you are using CCS, here are some tips:
1) Catch ACCV (UNMI) and VMA (SYSNMI) interrupts and set a break point within the handlers. If one of these trips, examine the stack for clues as to what triggered the interrupt.
2) If you have any interrupt handlers which re-enable interrupts (GIE bit), make sure they are not being retriggered repeatedly.
3) I have seen this error (inexplicably) when stepping through optimized code; so it may help to turn off optimizations.
If you are using Code Composer Studio, as an alternative to initializing your RAM, you can set a breakpoint on stack overflow. Also, with a paused debug session, CCS gives you the option to fill a portion of memory with any value you choose via the "Memory" sub-window.
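To illustrate the fill-and-check idea discussed in this thread, here is a minimal C sketch. It is not taken from any of the answers: the names paint_region and untouched_words are hypothetical, and an ordinary array stands in for the real stack or RAM region, whose bounds you would have to take from your linker map. It assumes the stack grows downward from the top of the region, as it does on the MSP430.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FILL_PATTERN 0x1234u  /* the value suggested in the question */

/* Paint a word-aligned region with the fill pattern. On a real target,
   `region` would be the start of the unused stack or RAM area; here it
   is just an ordinary array (an assumption made for illustration). */
void paint_region(uint16_t *region, size_t words)
{
    assert(region != NULL);
    for (size_t i = 0; i < words; ++i) {
        region[i] = FILL_PATTERN;
    }
}

/* Count how many words at the low end of the region still hold the
   pattern. With a stack growing downward from the top of the region,
   this is the remaining headroom; zero means the stack reached the
   bottom of the region at some point. */
size_t untouched_words(const uint16_t *region, size_t words)
{
    assert(region != NULL);
    size_t n = 0;
    while (n < words && region[n] == FILL_PATTERN) {
        ++n;
    }
    return n;
}
```

On hardware you would paint the region once at startup, before the stack is in heavy use, and call untouched_words periodically or from the debugger. A stack word that happens to equal 0x1234 can make the headroom look slightly larger than it is, so treat the result as an estimate.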

What are possible causes of IDirect3DVertexBuffer9::Lock failing?

In error reports from some end users of our game I have quite often seen the following behaviour: IDirect3DVertexBuffer9::Lock fails, and the returned error code is D3DERR_NOTAVAILABLE.
Once this happens, quite frequently (but not always) it is followed by the CreateTexture or CreateVertexBuffer call failing with error D3DERR_OUTOFVIDEOMEMORY.
What are possible reasons for a vertex buffer lock failure? Could the virtual memory address space be exhausted, or what?
Based on the DIRECTXDEV response by Chuck Walbourn from Microsoft, besides "out of address space", another cause could be "out of paged pool".
Alternatively, on Windows XP this could indicate you have hit the limits of paged pool kernel memory. Typically this happens when you create a lot of Direct3D resources (textures, etc.)
We DO create a lot of Direct3D resources.
This is what I posted to DirectXDev: ;)
Have you checked how much memory your application is using? (Be sure to select the Virtual Memory column in Task Manager!) My guess would be memory fragmentation based issues causing you to, as you suggest, run out of address space.
It could, however, be a driver bug ...
Does the debug runtime provide any useful information?
Edit: The only other thing I can think of is that the aperture memory has run out. I don't know how this works with PCI Express, but on AGP you can set the aperture size. I've no idea how to check whether it is full, however. I suspect the error you are seeing is reporting that it's full. Are you doing lots of locks with the Discard flag? If so, it's possible that these are creating tonnes of new allocations in the aperture and causing you to run out of memory there. This is pure guesswork, however.
I'd guess that if this is happening with only some of your users, it is those on the lower-end machines. If things run slowly then you can end up with a lot of data buffered in the command buffer. This will make control laggy and "could", at a guess, lead to the problem you are seeing. You may want to try making sure the command buffer never gets too long. If you make sure the first lock of every frame is done without the discard flag (i.e. flags set to 0), then this will cause the pipeline to stall until the vertex buffer has been rendered and bring the command buffer back in sync with you. This will cause a slowdown, as the command buffering will not be able to smooth out frame rate spikes as easily ...
Anyway ... that's just a guess!
The point raised about running out of memory is valid. We need some details on the Lock() call to be sure, but for example, if the buffer is in the DEFAULT pool and it's dynamic (D3DLOCK_DISCARD flag passed), it's quite possible that your driver tries to find an unused piece of memory to return (because it double or triple buffers internally) and fails because, as you discover yourself soon after, video memory is exhausted.

memory not freed in matlab?

I am running a script that animates a plot (simulation of a water flow). After a while, I kill the loop by doing ctrl-c.
After doing this several times I get the error:
??? Error: Out of memory.
And after I start receiving that error, every call to my script will generate it.
Now, it happens before anything inside the function that I am calling is executed, i.e. even if I add the line a=1 as the first line of the function I am calling, I still get the error and no printout, so the code inside the function doesn't even get executed.
What could be causing this?
There are several possible reasons.
Most likely your script creates some variables that are filling up the memory. Run
clear all
before restarting the script, so that all the variables are cleared, or change your script to a function (which will automatically erase all temporary variables after the function returns). Note that this also clears all loaded functions, so your next execution of the script has to load them again which will slow down the next execution by a (usually tiny) bit. It may be sufficient to call clear only.
Maybe you're animating by plotting several plots over one another (without clearing the axes first). Thus you might run out of Java heap space. You can close the open figures individually, or run
close all
You can also increase the amount of Java memory Matlab uses on your system (see instructions here) - note that the limit is generally rather low, annoyingly so if you want to open tons of figures.
Especially if you're running an older version of Windows, you may get your memory fragmented. Matlab needs contiguous blocks of free space to assign variables. To check for memory fragmentation, run
memory
and look at the number for the maximum possible variable size. If this is much smaller than the size available for all arrays, it's time to restart Matlab (I guess if you use a Windows version that would require a reboot to fix the problem, you may want to look into getting a new computer with Win7).
You can also try the pack command, eg:
close all;
clear all;
pack;
to clear memory. That said, after a recent MathWorks seminar I asked one of the MathWorks gurus, and he also confirmed @Andrew Janke's comment regarding memory fragmentation. Usually quitting and restarting Matlab sorts this out for me (on XP).
clear all and close all are straightforward ways to free memory, which are known by all non-beginners.
The main issue is that when you have done some large data processing and cleared/closed everything, there is still significant memory used by Matlab.
This is currently a major problem with Matlab, and to my knowledge there is no solution other than restarting Matlab, which is a pity.
It sounds like you are not clearing any of your variables. You should either provide a way to stop the loop without hitting ctrl-c (write a simple GUI with a "Stop" button and your display) and then clean up your workspace in the script or clear your variables at the start of the script.
Are you intentionally storing all the data (or some large component) on each iteration of your loop?

How to get the root cause of a memory corruption in an embedded environment?

I have detected a memory corruption in my embedded environment (my program is running on a set-top box with a proprietary OS), but I couldn't find the root cause of it.
The memory corruption itself is detected after a stress test of launching and exiting an application multiple times. Given that I can't set a memory breakpoint, because the corrupted variable changes its address every time the application is launched, is there any way to catch the root cause of this corruption?
(A memory breakpoint is a breakpoint triggered when the environment changes the value at a given memory address.)
Note also that all my software is developed in C.
Thanks for your help.
These are always difficult problems on embedded systems and there is no easy answer. Some tips:
Look at the value the memory gets corrupted with. This can give a clear hint.
Look at the data structures next to your memory corruption.
See if there is a pattern in the memory corruption. Is it always at a similar address?
See if you can set up the memory breakpoint at run-time.
Does the embedded system allow memory areas to be sandboxed? Set-up sandboxes to safeguard your data memory.
Good luck!
Where is the data stored and how is it accessed by the two processes involved?
If the structure was allocated off the heap, try allocating a much larger block and putting large guard areas before and after the structure. This should give you an idea of whether it is one of the surrounding heap allocations which has overrun into the same allocation as your structure. If you find that the memory surrounding your structure is untouched, and only the structure itself is corrupted then this indicates that the corruption is being caused by something which has some knowledge of your structure's location rather than a random memory stomp.
If the structure is in a data section, check your linker map output to determine what other data exists in the vicinity of your structure. Check whether those have also been corrupted, introduce guard areas, and check whether the problem follows the structure if you force it to move to a different location. Again this indicates whether the corruption is caused by something with knowledge of your structure's location.
You can also test this by switching data from the heap into a data section or vice versa.
If you find that the structure is no longer corrupted after moving it elsewhere or introducing guard areas, you should check the linker map or track the heap to determine what other data is in the vicinity, and check accesses to those areas for buffer overflows.
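The guard-area idea from the paragraphs above can be sketched in C as follows. This is only an illustration under stated assumptions: struct payload is a hypothetical stand-in for the structure being corrupted, and the guard size and fill value (32 bytes of 0xA5) are arbitrary choices, not something the answer prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD_BYTES 32
#define GUARD_VALUE 0xA5

/* Hypothetical stand-in for the structure that is being corrupted. */
struct payload {
    int  id;
    char name[16];
};

/* The structure sandwiched between two guard areas. */
struct guarded_payload {
    uint8_t        front[GUARD_BYTES];
    struct payload data;
    uint8_t        back[GUARD_BYTES];
};

/* Fill both guard areas with the known value. */
void guard_init(struct guarded_payload *g)
{
    assert(g != NULL);
    memset(g->front, GUARD_VALUE, sizeof g->front);
    memset(g->back,  GUARD_VALUE, sizeof g->back);
}

/* Returns 0 if both guards are intact, -1 if the front guard was
   overwritten, +1 if the back guard was overwritten. */
int guard_check(const struct guarded_payload *g)
{
    assert(g != NULL);
    for (size_t i = 0; i < GUARD_BYTES; ++i) {
        if (g->front[i] != GUARD_VALUE) return -1;
    }
    for (size_t i = 0; i < GUARD_BYTES; ++i) {
        if (g->back[i] != GUARD_VALUE) return 1;
    }
    return 0;
}
```

If guard_check reports a damaged guard, a neighbouring object has probably overrun into this one. If the guards stay intact while data is corrupted, the writer is more likely something that knows where the structure lives.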
You may find, though, that the problem does follow the structure wherever it is located. If this is the case then audit all of the code surrounding references to the structure. Check the contents before and after every access.
To check whether the corruption is being caused by another process or interrupt handler, add hooks to each task switch and before and after each ISR is called. The hook should check whether the contents have been corrupted. If they have, you will be able to identify which process or ISR was responsible.
If the structure is ever read onto a local process stack, try increasing the process stack and check that no array overruns etc have occurred. Even if not read onto the stack, it's likely that you will have a pointer to it on the stack at some point. Check all sub-functions called in the vicinity for stack issues or similar that could result in the pointer being used erroneously by unrelated blocks of code.
Also consider whether the compiler or RTOS may be at fault. Try turning off compiler optimisation, and failing that inspect the code generated. Similarly consider whether it could be due to a faulty context switch in your proprietary RTOS.
Finally, if you are sharing the memory with another hardware device or CPU and you have data cache enabled, make sure you take care of this through using uncached accesses or similar strategies.
Yes these problems can be tough to track down with a debugger.
A few ideas:
Do regular code reviews (not fast at tracking down a specific bug, but valuable for catching such problems in general)
Comment-out or #if 0 out sections of code, then run the cut-down application. Try commenting-out different sections to try to narrow down in which section of the code the bug occurs.
If your architecture allows you to easily disable certain processes/tasks from running, by the process of elimination perhaps you can narrow down which process is causing the bug.
If your OS uses cooperative multitasking, e.g. round robin (this would be too hard, I think, for preemptive multitasking): add code to the end of the task that "owns" the structure to save a "check" of the structure. That check could be a memcpy (if you have the time and space), or a CRC. Then, after every other task runs, add some code to verify the structure against the saved check. This will detect any changes.
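The save-and-verify check from the last idea could look roughly like this in C. It is a sketch, not the answerer's code: save_check and verify_check are hypothetical names, and the CRC shown is a plain bitwise CRC-16 (polynomial 0x1021, initial value 0xFFFF), chosen only because it is short. Any checksum you already have on the target would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 with polynomial 0x1021 and initial value 0xFFFF. */
uint16_t crc16_ccitt(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint16_t crc = 0xFFFF;
    assert(data != NULL);
    while (len--) {
        crc ^= (uint16_t)((uint16_t)(*p++) << 8);
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

static uint16_t saved_crc;  /* check saved by the owning task */

/* Call at the end of the task that owns the structure. */
void save_check(const void *s, size_t len)
{
    saved_crc = crc16_ccitt(s, len);
}

/* Call after every other task has run. Returns 1 if the structure
   still matches the saved check, 0 if something changed it. */
int verify_check(const void *s, size_t len)
{
    return crc16_ccitt(s, len) == saved_crc;
}
```

One caveat: a CRC over a whole struct also covers its padding bytes, so it is safer to zero the structure with memset when it is created, or to checksum the fields individually.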
I'm assuming by your question you mean that you suspect some part of the proprietary code is causing the problem.
I have dealt with a similar issue in the past using what a colleague so tastefully calls a "suicide note". I would allocate a buffer capable of storing a number of copies of the structure that is being corrupted. I would use this buffer like a circular list, storing a copy of the current state of the structure at regular intervals. If corruption was detected, the "suicide note" would be dumped to a file or to serial output. This would give me a good picture of what was changed and how, and by increasing the logging frequency I was able to narrow down the corrupting action.
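A minimal C sketch of the circular snapshot buffer described above, under a few assumptions: struct state is a hypothetical stand-in for the structure being corrupted, the depth of 8 is arbitrary, and the dump goes to a stdio stream, where a real target might write to a serial port instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define HISTORY_DEPTH 8

/* Hypothetical stand-in for the structure being corrupted. */
struct state {
    int mode;
    int counter;
};

struct history {
    struct state slots[HISTORY_DEPTH];
    size_t next;   /* slot the next snapshot will overwrite */
    size_t count;  /* number of valid snapshots, at most HISTORY_DEPTH */
};

/* Store a copy of the current state, overwriting the oldest snapshot
   once the buffer is full. Call this at regular intervals. */
void history_record(struct history *h, const struct state *s)
{
    assert(h != NULL && s != NULL);
    h->slots[h->next] = *s;
    h->next = (h->next + 1) % HISTORY_DEPTH;
    if (h->count < HISTORY_DEPTH) {
        ++h->count;
    }
}

/* Fetch the snapshot taken `age` recordings ago (0 = most recent).
   Returns NULL if that snapshot is not available. */
const struct state *history_get(const struct history *h, size_t age)
{
    assert(h != NULL);
    if (age >= h->count) {
        return NULL;
    }
    return &h->slots[(h->next + HISTORY_DEPTH - 1 - age) % HISTORY_DEPTH];
}

/* Dump all snapshots, oldest first, once corruption is detected. */
void history_dump(const struct history *h, FILE *out)
{
    for (size_t age = h->count; age-- > 0; ) {
        const struct state *s = history_get(h, age);
        fprintf(out, "t-%lu: mode=%d counter=%d\n",
                (unsigned long)age, s->mode, s->counter);
    }
}
```

Recording more often narrows the window in which the corrupting write must have happened, at the cost of extra CPU time and a shorter history for the same buffer size.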
Depending on your OS, you may be able to react to detected corruption by looking at all running processes and seeing which ones are currently holding a semaphore (you are using some kind of access control mechanism with shared memory, right?). By taking snapshots of this data too, you perhaps can log the culprit grabbing the lock before corrupting your data. Along the same lines, try holding the lock to the shared memory region for an absurd length of time and see if the offending program complains. Sometimes they will give an error message that has important information that can help your investigation (for example, line numbers, function names, or code offsets for the offending program).
If you feel up to doing a little linker kung fu, you can most likely specify the address of any statically-allocated data with respect to the program's starting address. This might give you a consistent-enough memory address to set a memory breakpoint.
Unfortunately, this sort of problem is not easy to debug, especially if you don't have the source for one or more of the programs involved. If you can get enough information to understand just how your data is being corrupted, you may be able to adjust your structure to anticipate and expect the corruption (sometimes needed when working with code that doesn't fully comply with a specification or a standard).
You detect memory corruption. Could you be more specific how? Is it a crash with a core dump, for example?
Normally the OS will completely free all resources and handles your program has when the program exits, gracefully or otherwise. Even proprietary OSes manage to get this right, although it's not a given.
So an intermittent problem could seem to be triggered after stress but just be chance, or could be in the initialisation of drivers or other processes the program communicates with, or could be bad error handling around say memory allocations that fail when the OS itself is under stress e.g. lazy tidying up of the closed programs.
Printfs in custom malloc/realloc/free proxy functions, or even an Electric Fence-style custom allocator, might help if it's as simple as a buffer overflow.
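A bare-bones version of such logging proxies might look like the following C sketch. The names dbg_malloc, dbg_free, MALLOC and FREE are hypothetical, and logging to stderr is an assumption; on a set-top box you would likely route the output to whatever trace channel the platform offers. A realloc proxy would follow the same pattern.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Proxy for malloc that logs the call site, the requested size and the
   returned address. */
void *dbg_malloc(size_t size, const char *file, int line)
{
    void *p = malloc(size);
    assert(file != NULL);
    fprintf(stderr, "%s:%d malloc(%lu) = %p\n",
            file, line, (unsigned long)size, p);
    return p;
}

/* Proxy for free that logs the call site and the address being freed.
   The address is logged before the block is released. */
void dbg_free(void *p, const char *file, int line)
{
    assert(file != NULL);
    fprintf(stderr, "%s:%d free(%p)\n", file, line, p);
    free(p);
}

/* Use these in place of malloc/free so that every call records where
   it came from. */
#define MALLOC(size) dbg_malloc((size), __FILE__, __LINE__)
#define FREE(ptr)    dbg_free((ptr), __FILE__, __LINE__)
```

Matching the logged addresses against the address of the corrupted data shows which allocation sits next to it, and therefore which buffer is a candidate for the overflow.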
Use memory-allocation debugging tools like ElectricFence, dmalloc, etc - at minimum they can catch simple errors and most moderately-complex ones (overruns, underruns, even in some cases write (or read) after free), etc. My personal favorite is dmalloc.
A proprietary OS might limit your options a bit. One thing you might be able to do is run the problem code on a desktop machine (assuming you can stub out the hardware-specific code), and use the more-sophisticated tools available there (i.e. guardmalloc, electric fence).
The C library that you're using may include some routines for detecting heap corruption (glibc does, for instance). Turn those on, along with whatever tracing facilities you have, so you can see what was happening when the heap was corrupted.
First, I am assuming you are on a bare-metal chip that isn't running Linux or some other POSIX-capable OS (if you are, there are much better techniques such as Valgrind and ASan).
Here's a couple tips for tracking down embedded memory corruption:
Use JTAG or similar to set a memory watchpoint on the area of memory that is being corrupted. You might be able to catch the moment when memory is accidentally being written there, as opposed to a correct write. Many JTAG debuggers include plugins for IDEs that allow you to get stack traces as well.
In your hard fault handler, try to generate a call stack that you can print so you can get a rough idea of where the code is crashing. Note that since memory corruption can occur some time before the crash actually occurs, the stack traces you get are unlikely to be helpful on their own, but combined with the techniques mentioned below they will help. Generating a backtrace on bare metal can be very difficult, though. If you happen to be using a Cortex-M processor, check out https://github.com/armink/CmBacktrace, or try searching the web for advice on generating a backtrace for your particular chip.
If your compiler supports it use stack canaries to detect and immediately crash if something writes over the stack, for details search the web for "Stack Protector" for GCC or Clang
If you are running on a chip that has an MPU such as an ARM Cortex-M3 then you can use the MPU to write-protect the region of memory that is being corrupted or a small region of memory right before the region being corrupted, this will cause the chip to crash at the moment of the corruption rather than much later

How to Test for Memory Leaks?

We have an application with hundreds of possible user actions, and we are thinking about how to improve our memory leak testing.
Currently, here's the way it happens: When manually testing the software, if it appears that our application consumes too much memory, we use a memory tool, find the cause and fix it. It's a rather slow and not efficient process: the problems are discovered late and it relies on the good will of one developer.
How can we improve that?
Internally check that some actions (like "close file") do recover some memory and log it?
Assert on memory state inside our unit tests (but it seems this would be a tedious task)?
Check it manually from time to time?
Include that check each time a new user story is implemented?
Which language?
I'd use a tool such as Valgrind, try to fully exercise the program and see what it reports.
First line of defense:
checklist of common memory-allocation-related errors for developers
coding guidelines
Second line of defense:
code reviews
static code analysis (as part of the build process)
memory profiling tools
If you work with an unmanaged language (like C/C++), you can efficiently discover most memory leaks by hijacking the memory management functions. For example, you can track all memory allocations/deallocations.
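As a rough illustration of tracking allocations and deallocations, here is a C sketch that keeps running totals of the blocks and bytes currently in use. All names (tracked_malloc, tracked_free, tracked_bytes, tracked_blocks) are hypothetical. It stores the requested size in a small header in front of each block, which is one common way to do this but by no means the only one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header stored in front of every block so the free path knows the
   block's size. The extra union members pad the header so that the
   user data behind it stays suitably aligned. */
typedef union {
    size_t      size;
    long double ld;
    long long   ll;
    void       *ptr;
} block_header;

static size_t bytes_in_use;   /* bytes requested and not yet freed */
static size_t blocks_in_use;  /* blocks allocated and not yet freed */

void *tracked_malloc(size_t size)
{
    block_header *h;
    if (size > SIZE_MAX - sizeof *h) {
        return NULL;  /* header plus payload would overflow */
    }
    h = malloc(sizeof *h + size);
    if (h == NULL) {
        return NULL;
    }
    h->size = size;
    bytes_in_use += size;
    ++blocks_in_use;
    return h + 1;  /* user data starts right after the header */
}

void tracked_free(void *p)
{
    block_header *h;
    if (p == NULL) {
        return;
    }
    h = (block_header *)p - 1;
    assert(blocks_in_use > 0 && bytes_in_use >= h->size);
    bytes_in_use -= h->size;
    --blocks_in_use;
    free(h);
}

size_t tracked_bytes(void)  { return bytes_in_use; }
size_t tracked_blocks(void) { return blocks_in_use; }
```

A test can then record tracked_bytes() before a user action such as "close file", perform the action, and check that the total has returned to its earlier value. Blocks obtained from tracked_malloc must only be released with tracked_free, never with plain free.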
It seems to me that the core of the problem is not so much finding memory leaks as knowing when to test for them. You say you have lots of user actions, but you don't say what sequences of user actions are meaningful. If you can generate meaningful sequences at random, I'd argue hard for random testing. On random tests you would measure
Code coverage (with gcov or valgrind)
Memory usage (with valgrind)
Coverage of the user actions themselves
By "coverage of user actions" I mean statements like the following:
For every pair of actions A and B, if there is a meaningful sequence of actions in which A is immediately followed by B, then we have tested such a sequence.
If that's not true, then you can ask for what fraction of pairs A and B it is true.
If you have the CPU cycles to afford it, you would probably also benefit from running valgrind or another memory-checking tool either before every commit to your source-code repository or during a nightly build.
Automate!
In my company we have programmed an endless action path for our application. The Java garbage collector should clean up all unused maps, lists and the like. So we start the application with the endless action path and watch whether its memory use keeps growing.
To check which objects are not being released, you can use JProfiler for Java.
Replace new and delete with your custom versions and log every act of allocation/deallocation.
Speaking generally (not about testing, but rather about fighting the issue at its origin), smart pointers help to avoid this problem. Fortunately, the C++11 standard provides convenient new smart pointer classes (shared_ptr, unique_ptr).
