Can one process corrupt another process by IPC?

From my research I understand that if two processes communicate through shared memory and the shared segment gets damaged, both processes will most likely be affected.
What I would like to know is whether a damaged process has the ability to corrupt a healthy process's memory just by passing a bad file descriptor or sending a corrupted message through IPC methods like Unix sockets or D-Bus. In case it matters, I am asking about corruption due to programming errors and not purposeful exploitation.
I apologize if my question is too broad, and I presume that the answer will be obvious to an experienced programmer; however, this is something that has been bugging me for some time, and it is very hard to find a satisfying answer on the web.

Definitely yes.
If the receiving process assumes a certain memory layout and you corrupt it, it can easily behave incorrectly.
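To make that concrete, here is a minimal, deliberately broken receiver over a Unix socket (a sketch; the struct, names and sizes are assumptions, not taken from any real code). It trusts a length field taken straight from the message, so a buggy sender can make the receiver corrupt its own memory:
#include <string.h>
#include <sys/socket.h>

struct msg {
    unsigned int len;        /* payload length as claimed by the sender */
    char payload[256];
};

void handle_message(int fd)
{
    struct msg m;
    char buf[64];

    if (recv(fd, &m, sizeof m, 0) <= 0)
        return;

    /* BUG: m.len comes from the peer and is never validated. A broken sender
       that puts e.g. 4096 here makes this memcpy overrun buf and stomp on
       whatever lives next to it in the receiving process. */
    memcpy(buf, m.payload, m.len);
}
The corruption happens entirely inside the healthy process, purely because it trusted data that arrived over IPC.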

Related

Flow in Websphere Message Broker does not release resources after completion

For a project I have made several message flows in Websphere Message Broker 7.
One of these flows is quite a complicated flow with lots of database calls and transformations. However, it performs correctly and rather quickly given what it needs to do.
The problem is that while it is active, it consumes more and more resources, until the broker runs out of memory. Even if I use a small test case and it is able to complete before it crashes anything, the resources are not released. In this case, I can confirm the output of the flow (which is fine), but operations reported that it keeps consuming memory.
So I am guessing it is a memory leak, but I have no idea how and where to find it. Could anyone point me in the right direction?
If additional information is necessary, just ask. I would prefer not to put the entire compute node in this thread due to its size.
The fact that memory consumption stays high even after processing is done makes me think that your message flow keeps some kind of state in memory, via shared or static variables.
You might be saving a lot of data in shared variables in ESQL, or static variables in Java in your flow.
Or if you are using JavaComputes, you might leak resources like ResultSets.
Or it could be a product bug; you should check for known and fixed leaks in the fix packs issued for V7:
http://www-01.ibm.com/support/docview.wss?&uid=swg27019145
As stated in my comment above, a DataFlowEngine never releases its resources after completion.
This is the IBM technote explaining the matter (bullet 8):
http://www-01.ibm.com/support/docview.wss?uid=swg21665926#8
Apart from that, the real issue seemed to be the use of Environment variables inside a loop, which consumed a lot of memory. Deleting the variables after use is a good practice I can recommend.

How to do lua_pushstring and avoid an out-of-memory setjmp exception

Sometimes I want to use lua_pushstring in places where I have already allocated resources that I would need to clean up in case of failure. However, as the documentation seems to imply, lua_push* functions can always end up raising an out-of-memory error. That error immediately exits my C scope and doesn't allow me to clean up whatever I might have temporarily allocated and would need to free on error.
Example code to illustrate the situation:
void* blubb = malloc(20);
...some other things happening here...
lua_pushstring(L, "test"); //how to do this call safely so I can still take care of blubb?
...possibly more things going on here...
free(blubb);
Is there a way I can check beforehand whether such an error would happen, so that I can avoid pushing and trigger my own error once I have safely cleaned up my own resources? Or can I somehow disable the setjmp and check some "magic variable" after doing the push to see whether it actually worked or raised an error?
I considered pcall'ing my own function, but even pushing the function I want to call safely through pcall onto the stack can itself raise an out-of-memory error, can't it?
To clear things up, I am specifically asking this for combined use with custom memory allocators that will prevent Lua from allocating too much memory, so assume this is not a case where the whole system has run out of memory.
Unless you have registered a user-defined memory handler with Lua when you created your Lua state, getting an out-of-memory error means that your entire application has run out of memory. Recovery from this state is generally not possible, or at least not feasible in a lot of cases. It might be, depending on your application, but probably not.
In short, if it ever comes up, you've got bigger things to be concerned about ;)
The only kind of cleanup that should affect you is for things external to your application: process-global memory you need to free or state you need to set, or interprocess communication through something like a memory-mapped file. Or something along those lines.
Otherwise, it's probably better to just kill your process.
You could build Lua as a C++ library. When you do that, errors become actual exceptions, which you can either catch or just use RAII objects to handle.
If you're stuck with C... well, there's not much you can do.
I am specifically interested in a custom allocator that will report out-of-memory much earlier, to avoid Lua eating too much memory.
Then you should handle it another way. To signal an out-of-memory error is basically to say, "I want Lua to terminate right now."
The way to stop Lua from eating memory is to periodically check the Lua state's memory, and garbage collect it if it's using too much. And if that doesn't free up enough memory, then you should terminate the Lua state manually, but only when it is safe to do so.
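A rough sketch of that periodic check, assuming a caller-supplied limit_kb threshold (the function name and limit are assumptions):
#include <lua.h>

/* Returns 1 if the state is within budget, 0 if it is still over the limit
   after a full collection (in which case the caller should shut it down). */
static int check_lua_memory(lua_State *L, int limit_kb)
{
    int used_kb = lua_gc(L, LUA_GCCOUNT, 0);    /* current heap size in KB */
    if (used_kb > limit_kb) {
        lua_gc(L, LUA_GCCOLLECT, 0);            /* try a full collection first */
        used_kb = lua_gc(L, LUA_GCCOUNT, 0);
        if (used_kb > limit_kb)
            return 0;
    }
    return 1;
}
You would call this between chunks of script execution, for example from a hook or between pcall invocations.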
lua_atpanic() may be one solution for you, depending on the kind of cleanup you need to do. It will never throw an error.
In your specific example you could also create blubb as a userdata. Then Lua would free it automatically when it left the stack.
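For the example in the question, the userdata variant would look roughly like this (a sketch; note that lua_newuserdata can itself raise out-of-memory, but at that point nothing has been allocated yet, so nothing leaks):
void *blubb = lua_newuserdata(L, 20);  //Lua owns this allocation now
//...some other things happening here...
lua_pushstring(L, "test");             //if this raises an error, the userdata is simply garbage-collected later
//...possibly more things going on here...
//no free(blubb): it goes away once the userdata is no longer reachable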
I have recently gotten into some more Lua sandboxing again, and now I think the answer I accepted previously is a bad idea. I have given this some more thought:
Why periodic checking is not enough
Periodically checking for large memory consumption and terminating Lua "only when it is safe to do so" seems like a bad idea when you consider that a single huge table can eat up a lot of your memory with one single VM instruction, which you will only find out about after it has happened. By then your program might already be dying from it, and you have much bigger problems, ones you could have avoided entirely if you had stopped that allocation in time in the first place.
Since Lua already has an out-of-memory error built in, I would like to use that one, because it lets me do the minimal required thing (prevent the script from allocating more, while possibly allowing it to recover) without my C code breaking from it.
Therefore my current plan for Lua sandboxing with memory limit is:
Use a custom allocator that returns NULL once a limit is reached (see the sketch after this list)
Design all C functions to be able to handle this without memory leak or other breakage
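A minimal sketch of such a limiting allocator, assuming a simple byte counter and an arbitrary limit (the struct and names are assumptions):
#include <stdlib.h>
#include <lua.h>

typedef struct {
    size_t used;    /* bytes currently allocated for this state */
    size_t limit;   /* hard limit in bytes */
} mem_limit;

static void *limited_alloc(void *ud, void *ptr, size_t osize, size_t nsize)
{
    mem_limit *m = ud;
    size_t old = ptr ? osize : 0;   /* Lua 5.2+ passes a type tag in osize when ptr is NULL */

    if (nsize == 0) {               /* free request */
        free(ptr);
        m->used -= old;
        return NULL;
    }
    if (m->used - old + nsize > m->limit)
        return NULL;                /* refuse: Lua raises its out-of-memory error */

    void *p = realloc(ptr, nsize);
    if (p != NULL)
        m->used = m->used - old + nsize;
    return p;
}

//usage: mem_limit m = { 0, 8 * 1024 * 1024 };
//       lua_State *L = lua_newstate(limited_alloc, &m);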
But how to design the C functions safely?
How do I do that, given that lua_pushstring and friends can always setjmp away with an error without my knowing in advance whether that is going to happen? (This was originally my question.)
I think I found a working approach:
I added a facility to register pointers when I allocate them and unregister them once I am done with them. This means that if Lua suddenly setjmp's me out of my C code without giving me a chance to clean up, everything I need to free is in a global list, and I can clean up the mess later when I'm back in control.
Is that ugly or what?
Yes, it is quite a hack. But it will most likely work, and unlike periodic checking it actually gives me a true hard limit and avoids getting the application itself into trouble because of an aggressive script.
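A minimal sketch of such a registration facility (fixed-size table; all names and sizes are assumptions):
#include <stdlib.h>

#define MAX_TRACKED 64
static void *tracked[MAX_TRACKED];

static void track(void *p)
{
    for (int i = 0; i < MAX_TRACKED; i++)
        if (tracked[i] == NULL) { tracked[i] = p; return; }
}

static void untrack(void *p)
{
    for (int i = 0; i < MAX_TRACKED; i++)
        if (tracked[i] == p) { tracked[i] = NULL; return; }
}

/* Called from a safe point once control is back from Lua: frees whatever a
   longjmp'd error left behind. free(NULL) is a harmless no-op. */
static void cleanup_leftovers(void)
{
    for (int i = 0; i < MAX_TRACKED; i++) {
        free(tracked[i]);
        tracked[i] = NULL;
    }
}

//inside a C function called from Lua:
//    void *blubb = malloc(20); track(blubb);
//    lua_pushstring(L, "test");        //may longjmp away on out-of-memory
//    untrack(blubb); free(blubb);      //normal path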

Can a Memcached daemon ever free() unused memory, without terminating the process?

I believe that you can't force a running Memcached instance to de-allocate memory, short of terminating that Memcached instance (and freeing all of the memory it held). Does anyone know of a definitive piece of documentation, or even a mailing list or blog posting from a reliable source, that can confirm or deny this impression?
As I understand it, a Memcached process initially allocates a chunk of memory (the exact initial allocation size is configurable), and then monotonically increases its memory utilization over its lifetime, limited by the daemon's maximum memory allocation size (also configurable). At no point does the Memcached daemon ever free any memory, regardless of whether the daemon has any ongoing need for the memory it holds.
I know that this question might sound a little whiny, with a tone of "I DEMAND that open source project X support my specific need!" That's not it, at all--I'm purely interested in the exact technical answer, here, and I swear I'm not harshing on Memcached. For the curious, this question came out of a discussion about possible methods for gracefully juggling multiple Memcached instances on a single server, given an application where the cost of a cache flush can be quite high.
However, I'd appreciate it if you save your application suggestions/advice for a different question (re-architecting my application, using a different caching implementation, etc.). I do appreciate a good brainstorm, but I think this question will be most valuable if it stays focused on the technical specifics of how Memcached does and does not work. If you don't have the answer to this specific question, there is probably still value in what you have to say, but I'd guess that there's a different, better place to post the more speculative comments/suggestions/advice.
This is probably the hardest problem we have to solve for memcached currently (well, a variation of it, anyway).
Freeing a chunk of memory requires us to know that a) nothing within the chunk is in use and b) nothing will start using it while we're in the process of purging it for reuse/freeing. I've heard some really good ideas for how we might solve our slab rebalancing problems, which is basically the same problem, except we're not trying to free the memory but to give it to something else (a common problem in a few large installations).
Also, whether free actually reduces the RSS of your process is implementation dependent. In many cases, a malloc/fill/free will leave the memory mapped in (unless your allocator uses mmap instead of sbrk).
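A small stand-alone demonstration of that last point (a sketch, assuming glibc on Linux; check the process RSS with ps or /proc/<pid>/status while it waits):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    enum { N = 100000, SZ = 512 };
    static char *blocks[N];
    int n;

    for (n = 0; n < N; n++) {
        blocks[n] = malloc(SZ);
        if (blocks[n] == NULL)
            break;
        memset(blocks[n], 0xAA, SZ);   /* touch the pages so they count towards RSS */
    }
    for (int i = 0; i < n; i++)
        free(blocks[i]);

    /* Roughly 50 MB has been free()d, yet the process RSS typically stays
       high: the allocator keeps the pages around for reuse instead of
       returning them to the kernel. */
    getchar();
    return 0;
}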
I'm pretty sure this isn't possible with memcached. I don't see any technical reason why it couldn't be implemented though. Lock cache operations, expire enough keys to reach the desired size, update the size, unlock. (I'm sure there's nicer ways to avoid blocking the server during that time.)
The standard and default mechanism of memory management in memcached is the slab allocator. It means that memory is allocated for the process and never released to the operating system. Basically, when memory is no longer used to store some data, it is held by the process in order to be reused later when needed. However, the operating system reclaims all memory allocated by the process when the process exits. That is why memory is released when you kill/stop memcached.
There is a compile-time option in memcached to enable a malloc/free mechanism, so that when free() is called, memory might be released to the operating system (this depends on the C standard library implementation). But doing so can hurt performance and lead to fragmentation.
Please read more about the issue here:
Why not use malloc/free
Memcached memory management

Does Erlang always copy messages between processes on the same node?

A faithful implementation of the actor message-passing semantics means that message contents are deep-copied from a logical point of view, even for immutable types. Deep-copying of message contents remains a bottleneck for implementations of the actor model, so for performance some implementations support zero-copy message passing (although it's still a deep copy from the programmer's point of view).
Is zero-copy message-passing implemented at all in Erlang? Between nodes it obviously can't be implemented as such, but what about between processes on the same node? This question is related.
I don't think your assertion is correct at all - deep copying of inter-process messages isn't a bottleneck in Erlang, and with the default VM build/settings, this is exactly what all Erlang systems are doing.
Erlang process heaps are completely separate from each other, and the message queue is located in the process heap, so messages must be copied. This is also true for transferring data into and out of ETS tables as their data is stored in a separate allocation area from process heaps.
There are a number of shared data structures, however. Large binaries (>64 bytes long) are generally allocated in a node-wide area and are reference counted. Erlang processes just store references to these binaries. This means that if you create a large binary and send it to another process, you're only sending the reference.
Sending data between processes is actually worse in terms of allocation size than you might imagine - sharing inside a term isn't preserved during the copy. This means that if you carefully construct a term with sharing to reduce memory consumption, it will expand to its unshared size in the other process. You can see a practical example in the OTP Efficiency Guide.
As Nikolaus Gradwohl pointed out, there was an experimental hybrid heap mode for the VM which did allow term sharing between processes and enabled zero-copy message passing. It hasn't been a particularly promising experiment as I understand it - it requires extra locking and complicates the existing ability of processes to independently garbage collect. So not only is copying inter-process messages not the usual bottleneck in Erlang systems, avoiding the copy actually reduced performance.
AFAIK there was/is experimental support for zero-copy message passing in Erlang using the -shared or -hybrid model. I read a blog post in 2009 claiming that it's broken on SMP machines, but I have no idea about the current status.
As has been mentioned here and in other questions, current versions of Erlang basically copy everything except for larger binaries. In older pre-SMP times it was feasible to not copy but pass references. While this resulted in very fast message passing, it created other problems in the implementation, primarily making garbage collection more difficult and complicating the implementation. I think that today passing references and having shared data could result in excessive locking and synchronisation, which is, of course, not a Good Thing.
I wrote the accepted answer to that other question you're referencing, and in it I give you a direct pointer to this line of code:
message = copy_struct(message, msize, &hp, &bp->off_heap);
This is in a function called when the Erlang run-time system needs to send a message, and it's not inside any kind of "if" that could cause it to be skipped. So, as far as I can tell, the answer is "yes, it's always copied." (That's not strictly true -- there is an "if", but it seems to be dealing with exceptional cases, not the normal code-flow path.)
(I'm ignoring the hybrid heap option brought up by Nikolaus. It looks like he's right, but since this isn't the way Erlang is normally built and it has its own penalties, I don't see that it's worth considering as a way to answer your concern.)
I don't know why you're considering 10 GByte/sec a bottleneck, though. Nothing short of registers or CPU cache goes faster in the computer, and such memories are small, thus constituting a kind of bottleneck themselves. Besides which, the zero-copy idea you're proposing would require locking in the case of cross-CPU message passing in a multi-core system, which is also a bottleneck. We're already paying the locking penalty once in this function to copy the message into the other process's message queue; why pay it again later when that process gets around to reading the message?
Bottom line, I don't think your ideas of ways to make it go faster would actually help much.

How to get the root cause of a memory corruption in an embedded environment?

I have detected a memory corruption in my embedded environment (my program is running on a set-top box with a proprietary OS), but I couldn't get to the root cause of it.
The memory corruption itself is detected after a stress test of launching and exiting an application multiple times. Given that I couldn't set a memory breakpoint, because the corrupted variable changes its address every time the application is launched, is there any way to catch the root cause of this corruption?
(A memory breakpoint is a breakpoint triggered when the environment changes the value of a given memory address.)
Note also that all my software is developed in C.
Thanks for your help.
These are always difficult problems on embedded systems and there is no easy answer. Some tips:
Look at the value the memory gets corrupted with. This can give a clear hint.
Look at the data structures next to your memory corruption.
See if there is a pattern in the memory corruption. Is it always at a similar address?
See if you can set up the memory breakpoint at run-time.
Does the embedded system allow memory areas to be sandboxed? Set up sandboxes to safeguard your data memory.
Good luck!
Where is the data stored and how is it accessed by the two processes involved?
If the structure was allocated off the heap, try allocating a much larger block and putting large guard areas before and after the structure. This should give you an idea of whether it is one of the surrounding heap allocations which has overrun into the same allocation as your structure. If you find that the memory surrounding your structure is untouched, and only the structure itself is corrupted then this indicates that the corruption is being caused by something which has some knowledge of your structure's location rather than a random memory stomp.
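A sketch of that guard-area idea (the guard size, fill byte, and the stand-in structure are assumptions):
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 64
#define GUARD_BYTE 0xA5

struct my_struct { int state; char name[32]; };   /* stand-in for the real structure */

struct wrapped {
    uint8_t guard_before[GUARD_SIZE];
    struct my_struct data;
    uint8_t guard_after[GUARD_SIZE];
};

struct wrapped *alloc_guarded(void)
{
    struct wrapped *w = malloc(sizeof *w);
    if (w != NULL) {
        memset(w->guard_before, GUARD_BYTE, GUARD_SIZE);
        memset(w->guard_after,  GUARD_BYTE, GUARD_SIZE);
    }
    return w;
}

/* Check periodically (or from an assert). A dirty guard points to a
   neighbouring overrun; clean guards with corrupted data suggest something
   that knows the structure's address. */
int guards_intact(const struct wrapped *w)
{
    for (int i = 0; i < GUARD_SIZE; i++)
        if (w->guard_before[i] != GUARD_BYTE || w->guard_after[i] != GUARD_BYTE)
            return 0;
    return 1;
}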
If the structure is in a data section, check your linker map output to determine what other data exists in the vicinity of your structure. Check whether those have also been corrupted, introduce guard areas, and check whether the problem follows the structure if you force it to move to a different location. Again this indicates whether the corruption is caused by something with knowledge of your structure's location.
You can also test this by switching data from the heap into a data section or vice versa.
If you find that the structure is no longer corrupted after moving it elsewhere or introducing guard areas, you should check the linker map or track the heap to determine what other data is in the vicinity, and check accesses to those areas for buffer overflows.
You may find, though, that the problem does follow the structure wherever it is located. If this is the case then audit all of the code surrounding references to the structure. Check the contents before and after every access.
To check whether the corruption is being caused by another process or interrupt handler, add hooks to each task switch and before and after each ISR is called. The hook should check whether the contents have been corrupted. If they have, you will be able to identify which process or ISR was responsible.
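A sketch of such a hook (how the hook gets registered, and the 'who' string identifying the task or ISR, depend entirely on your RTOS; the checksum, structure and task name here are assumptions):
#include <stdint.h>
#include <string.h>

struct my_struct { int state; char name[32]; };   /* stand-in for the real structure */
struct my_struct g_shared;                        /* the structure being corrupted */

#define OWNER_TASK "app_task"             /* the only task allowed to write it */

static uint32_t last_sum;
volatile const char *corrupting_context;  /* inspect this in the debugger */

static uint32_t checksum(const void *p, size_t n)
{
    const uint8_t *b = p;
    uint32_t sum = 0;
    while (n--)
        sum = (sum << 1) ^ *b++;          /* cheap, but good enough to spot changes */
    return sum;
}

/* Call from the task-switch hook and around every ISR; 'who' names the task
   or ISR that has just run. */
void corruption_check(const char *who)
{
    uint32_t now = checksum(&g_shared, sizeof g_shared);
    if (now != last_sum && strcmp(who, OWNER_TASK) != 0) {
        corrupting_context = who;         /* 'who' modified data it does not own */
        for (;;) ;                        /* halt so the debugger can inspect */
    }
    last_sum = now;
}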
If the structure is ever read onto a local process stack, try increasing the process stack and check that no array overruns etc have occurred. Even if not read onto the stack, it's likely that you will have a pointer to it on the stack at some point. Check all sub-functions called in the vicinity for stack issues or similar that could result in the pointer being used erroneously by unrelated blocks of code.
Also consider whether the compiler or RTOS may be at fault. Try turning off compiler optimisation, and failing that inspect the code generated. Similarly consider whether it could be due to a faulty context switch in your proprietary RTOS.
Finally, if you are sharing the memory with another hardware device or CPU and you have data cache enabled, make sure you take care of this through using uncached accesses or similar strategies.
Yes, these problems can be tough to track down with a debugger.
A few ideas:
Do regular code reviews (not fast at tracking down a specific bug, but valuable for catching such problems in general)
Comment out or #if 0 sections of code, then run the cut-down application. Try commenting out different sections to narrow down in which section of the code the bug occurs.
If your architecture allows you to easily disable certain processes/tasks from running, by the process of elimination perhaps you can narrow down which process is causing the bug.
If your OS uses cooperative multitasking, e.g. round robin (this would be too hard, I think, for preemptive multitasking): add code to the end of the task that "owns" the structure to save a "check" of the structure. That check could be a memcpy (if you have the time and space) or a CRC. Then, after every other task runs, add some code to verify the structure against the saved check. This will detect any changes.
I'm assuming by your question you mean that you suspect some part of the proprietary code is causing the problem.
I have dealt with a similar issue in the past using what a colleague so tastefully calls a "suicide note". I would allocate a buffer capable of storing a number of copies of the structure that is being corrupted. I would use this buffer like a circular list, storing a copy of the current state of the structure at regular intervals. If corruption was detected, the "suicide note" would be dumped to a file or to serial output. This would give me a good picture of what was changed and how, and by increasing the logging frequency I was able to narrow down the corrupting action.
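A sketch of that "suicide note" ring buffer (the history depth, structure and names are assumptions; the emit callback stands in for whatever file or serial output you have):
#include <stddef.h>
#include <stdint.h>

struct my_struct { uint32_t state; char name[16]; };  /* stand-in for the real structure */
struct my_struct g_watched;                           /* the structure being monitored */

#define HISTORY 32
static struct my_struct history[HISTORY];
static unsigned history_idx;

/* Call at a regular interval (timer tick, main loop, task hook). */
void snapshot(void)
{
    history[history_idx % HISTORY] = g_watched;
    history_idx++;
}

/* Call when corruption is detected: dump the ring, oldest entry first, so you
   can see what changed, and roughly when, between snapshots. */
void dump_suicide_note(void (*emit)(const void *buf, size_t len))
{
    for (unsigned i = 0; i < HISTORY; i++) {
        unsigned slot = (history_idx + i) % HISTORY;
        emit(&history[slot], sizeof history[slot]);
    }
}
Increasing the snapshot frequency narrows the window in which the corrupting action must have happened.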
Depending on your OS, you may be able to react to detected corruption by looking at all running processes and seeing which ones are currently holding a semaphore (you are using some kind of access control mechanism with shared memory, right?). By taking snapshots of this data too, you perhaps can log the culprit grabbing the lock before corrupting your data. Along the same lines, try holding the lock to the shared memory region for an absurd length of time and see if the offending program complains. Sometimes they will give an error message that has important information that can help your investigation (for example, line numbers, function names, or code offsets for the offending program).
If you feel up to doing a little linker kung fu, you can most likely specify the address of any statically-allocated data with respect to the program's starting address. This might give you a consistent-enough memory address to set a memory breakpoint.
Unfortunately, this sort of problem is not easy to debug, especially if you don't have the source for one or more of the programs involved. If you can get enough information to understand just how your data is being corrupted, you may be able to adjust your structure to anticipate and expect the corruption (sometimes needed when working with code that doesn't fully comply with a specification or a standard).
You say you detect memory corruption. Could you be more specific about how? Is it a crash with a core dump, for example?
Normally the OS will completely free all resources and handles your program has when the program exits, gracefully or otherwise. Even proprietary OSes manage to get this right, although it's not a given.
So an intermittent problem could seem to be triggered by stress but just be chance; or it could be in the initialisation of drivers or other processes the program communicates with; or it could be bad error handling around, say, memory allocations that fail when the OS itself is under stress, e.g. lazy tidying up of closed programs.
Printfs in custom malloc/realloc/free proxy functions, or even an Electric Fence-style custom allocator, might help if it's as simple as a buffer overflow.
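A sketch of such proxy functions (the dbg_ names are assumptions; the macros would live in a header included after <stdlib.h> by every application file):
#include <stdio.h>
#include <stdlib.h>

void *dbg_malloc(size_t n, const char *file, int line)
{
    void *p = malloc(n);
    printf("malloc %6zu -> %p  (%s:%d)\n", n, p, file, line);
    return p;
}

void dbg_free(void *p, const char *file, int line)
{
    printf("free            %p  (%s:%d)\n", p, file, line);
    free(p);
}

/* Double frees, frees of wild pointers, and absurd allocation sizes then
   stand out in the trace, together with the file and line that caused them. */
#define malloc(n) dbg_malloc((n), __FILE__, __LINE__)
#define free(p)   dbg_free((p), __FILE__, __LINE__)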
Use memory-allocation debugging tools like ElectricFence, dmalloc, etc - at minimum they can catch simple errors and most moderately-complex ones (overruns, underruns, even in some cases write (or read) after free), etc. My personal favorite is dmalloc.
A proprietary OS might limit your options a bit. One thing you might be able to do is run the problem code on a desktop machine (assuming you can stub out the hardware-specific code), and use the more-sophisticated tools available there (i.e. guardmalloc, electric fence).
The C library that you're using may include some routines for detecting heap corruption (glibc does, for instance). Turn those on, along with whatever tracing facilities you have, so you can see what was happening when the heap was corrupted.
First, I am assuming you are on a bare-metal chip that isn't running Linux or some other POSIX-capable OS (if you are, there are much better techniques available, such as Valgrind and ASan).
Here are a couple of tips for tracking down embedded memory corruption:
Use JTAG or similar to set a memory watchpoint on the area of memory that is being corrupted. You may be able to catch the moment when that memory is accidentally written, as opposed to a correct write. Many JTAG debuggers also include IDE plugins that let you get stack traces.
In your hard fault handler, try to generate a call stack that you can print, so you get a rough idea of where the code is crashing. Note that since memory corruption can occur some time before the crash actually happens, the stack traces you get are unlikely to be helpful on their own, but combined with the techniques below they will be. Generating a backtrace on bare metal can be a very difficult task, though; if you happen to be using a Cortex-M line processor, check out https://github.com/armink/CmBacktrace, or try searching the web for advice on generating a back/stack trace for your particular chip.
If your compiler supports it, use stack canaries to detect, and immediately crash on, anything that writes over the stack; for details, search the web for "Stack Protector" for GCC or Clang (a bare-metal sketch follows after this list).
If you are running on a chip that has an MPU, such as an ARM Cortex-M3, you can use the MPU to write-protect the region of memory that is being corrupted, or a small region right before it. This will cause the chip to fault at the moment of the corruption rather than much later.
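For the stack-canary tip above, a bare-metal sketch: compile with -fstack-protector-strong and, on a system without libc support, provide the guard value and failure hook yourself (this particular guard value is an arbitrary assumption):
#include <stdint.h>

/* Value checked on return from every instrumented function; pick something
   unlikely to appear in normal data. */
uintptr_t __stack_chk_guard = 0xDEADBEEF;

void __stack_chk_fail(void)
{
    /* A function smashed its own stack frame. Halt here (or log and reset)
       so the debugger shows the offending call stack. */
    for (;;) ;
}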
