Handling the segfault signal SIGSEGV: determining the cause of the segfault using siginfo_t

I'm making a wrapper for the pthread library that allows each thread to have its own set of non-shared memory. Right now, the way the code is set up, if any thread tries to read, write, or execute (rwe) another thread's data, the program segfaults. This is fine: I can catch it with a signal handler, call pthread_exit(), and continue on with the program.
But not every segfault is going to be the result of a bad rwe access. I need to find a way to use the siginfo_t type to determine whether the segfault was caused by bad programming or by this kind of access. Any ideas?
Since I am using mmap to manage the memory pages, I think using si_addr in siginfo_t will help me out.

It sounds like what you're really after is thread-local storage, which is already solved much more portably than this. GCC provides __thread, MSVC provides __declspec(thread). boost::thread provides portable thread-local storage using a variety of mechanisms depending on platform, toolchain, etc.
If you really do want to go down this road, it can be made to work; however, the path is fraught with dangers. Recovering from SIGSEGV is technically undefined behaviour; although it can be made to work on quite a few platforms, it is neither robust nor portable. You also need to be very careful what you do in the signal handler: the list of async-signal-safe functions, i.e. those which may legally be called from a signal handler, is very small.
I've used this trick successfully a few times in the past, normally for marking "pages" as "dirty" in userspace. The way I did this was by setting up a hashtable which contained the base address of all the "pages" of memory that I was interested in. When you catch a SIGSEGV in a handler, you can then map an address back to a page with simple arithmetic operations. Provided the hashtable can be read without locks, you can then look up whether this is a page that you care about or a segfault from somewhere else, and decide how to act.
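A minimal sketch of that lookup, under some loud assumptions: a fixed array with a linear scan stands in for the lock-free hashtable, the page size is hard-coded to 4096 bytes, and all of the names (register_private_page, is_private_address, on_segv, install_segv_handler) are made up for illustration. Only sigaction, SA_SIGINFO, si_addr, and signal are real APIs here.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 4096u          /* assumption: real code would ask sysconf(_SC_PAGESIZE) */
#define MAX_PRIVATE_PAGES 256

/* Table of page base addresses we care about. It is filled in during setup
 * and only read (without locks) from the signal handler. */
static uintptr_t private_pages[MAX_PRIVATE_PAGES];
static volatile sig_atomic_t private_page_count;
static volatile sig_atomic_t last_fault_private = -1;  /* -1: no fault seen yet */

/* Round an address down to the base of its page. */
static uintptr_t page_base(const void *addr)
{
    return (uintptr_t)addr & ~(uintptr_t)(PAGE_BYTES - 1);
}

/* Call from setup code (for example right after mmap), never from the handler. */
int register_private_page(void *base)
{
    if (private_page_count >= MAX_PRIVATE_PAGES)
        return -1;
    private_pages[private_page_count] = page_base(base);
    private_page_count++;
    return 0;
}

/* Returns 1 if addr lies inside a registered page, 0 otherwise. */
int is_private_address(const void *addr)
{
    uintptr_t page = page_base(addr);
    for (int i = 0; i < private_page_count; i++)
        if (private_pages[i] == page)
            return 1;
    return 0;
}

static void on_segv(int sig, siginfo_t *info, void *ucontext)
{
    (void)ucontext;
    if (is_private_address(info->si_addr)) {
        last_fault_private = 1;
        /* Application-specific reaction goes here. Simply returning would
         * re-run the faulting instruction and fault again. */
    } else {
        last_fault_private = 0;
        /* Not one of our pages: restore the default action so the fault
         * is raised again on return and terminates the process. */
        signal(sig, SIG_DFL);
    }
}

/* Installs the handler; returns 0 on success, -1 on failure. */
int install_segv_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;     /* needed to receive siginfo_t with si_addr */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

The classification is the only part that is really settled here; what to do once a fault is known to hit a tracked page depends on the application.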

Related

vTaskSuspendAll - atomicity-related ticket on FreeRTOS

I'm working with FreeRTOS kernel on a STM32F469 Target.
I have a hard fault and I suppose it's due to vTaskSuspendAll.
I've read this ticket: click here
How can I know whether "writing from the register back into the memory is atomic"? I understand that otherwise it can be a problem, and I guess my writes are not atomic.
The problem occurs when I'm using xEventGroupSetBitsFromISR() inside a timer interrupt...
I don't know how to investigate this issue.

Start here: https://www.freertos.org/FAQHelp.html, which documents a whole load of common issues and how to catch them. There are some very specific notes for STM32 regarding setting up the number of pre-emption bits in the hardware, etc.

According to the ARMv7-M Architecture Reference Manual, chapter "A3.5.3 Atomicity in the ARM architecture", reads and writes of bytes, aligned halfwords, and aligned words are atomic. Read-modify-write cycles are not atomic.
The mentioned ticket states:
Basically the key to this is that each task maintains its own context, and a context switch cannot occur if the variable is non zero. So, as long as the writing from the register back into the memory is atomic, it is not a problem.
Therefore, as long as uxSchedulerSuspended is a byte, halfword, or word, the access should work.
Use a JTAG debugger, put a breakpoint in the hard fault handler, and backtrace to where the fault happened. Examine the registers CFSR, MMFAR, and BFAR to learn more about what happened.
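As a rough sketch of what examining CFSR can look like, the value read in the debugger can be split into its three sub-registers. The bit positions below follow the ARMv7-M layout (MemManage status in bits 7:0, BusFault status in bits 15:8, UsageFault status in bits 31:16); the struct and function names are invented for illustration.

```c
#include <stdint.h>

/* On ARMv7-M, CFSR lives at 0xE000ED28, MMFAR at 0xE000ED34, BFAR at 0xE000ED38. */
#define CFSR_MMARVALID (1u << 7)    /* MMFAR holds the faulting address */
#define CFSR_BFARVALID (1u << 15)   /* BFAR holds the faulting address */

/* Hypothetical container for the decoded fields. */
typedef struct {
    uint8_t  mmfsr;        /* MemManage fault status, CFSR[7:0]  */
    uint8_t  bfsr;         /* BusFault status,        CFSR[15:8] */
    uint16_t ufsr;         /* UsageFault status,      CFSR[31:16] */
    int      mmfar_valid;  /* nonzero if MMFAR can be trusted */
    int      bfar_valid;   /* nonzero if BFAR can be trusted  */
} cfsr_fields;

/* Split a raw CFSR value into its sub-registers and address-valid flags. */
cfsr_fields decode_cfsr(uint32_t cfsr)
{
    cfsr_fields f;
    f.mmfsr = (uint8_t)(cfsr & 0xFFu);
    f.bfsr  = (uint8_t)((cfsr >> 8) & 0xFFu);
    f.ufsr  = (uint16_t)(cfsr >> 16);
    f.mmfar_valid = (cfsr & CFSR_MMARVALID) != 0;
    f.bfar_valid  = (cfsr & CFSR_BFARVALID) != 0;
    return f;
}
```

For example, a CFSR value of 0x00008200 decodes to a precise bus fault with a valid BFAR, so BFAR then tells you which address was being accessed.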

80386 Paging and Segmentation

I'm trying to understand a few things regarding paging and segmentation...
Firstly,
In order to implement protected mode, is segmentation required? Could it be implemented with paging alone?
From what I understood, every code segment has some privilege level, and code that runs within it cannot execute instructions that require a higher privilege level, but that brings up a lot of problems in my opinion...
For example, what if an interrupt is raised while executing code that belongs to a low-privilege segment? The CPU would immediately move on and start executing the instructions of some ISR handler. When would CS be swapped? How would the CPU know that the code currently executing belongs to a restricted segment?
How is paging combined with segmentation, specifically within the 80386 processor's architecture?
I've read that with paging you also have page permissions, as in R W E, and that accessing an address you don't have permission for raises an interrupt, which makes segmentation kind of useless...
With memory segmentation, when context switching into kernel code, how could the CPU know that the code segment currently executing is within the highest privilege level?
This kind of makes things difficult and seems not really useful...
What is the 80386 actually using for memory management, and what is a flat memory model?
I find a lot of problems with the memory segmentation method.
For example, if I'm writing some instruction that attempts to fetch the value at the virtual memory address 0x1234FFFF, how would my processor know which segment I am referring to? Perhaps I am trying to execute from 0x1234FFFF and perhaps I am trying to read from it... How does it know when I am referring to DS and when I am referring to CS or SS?
Doesn't the 80386 have a user/kernel bit that is turned on once the CPU is executing in kernel mode and off when in user mode, or some similar mechanism to create protected execution?
I honestly find this super confusing and annoying and have a huge headache from trying to understand this... Hopefully someone could explain this to me.

How to call lua_pushstring and avoid an out-of-memory setjmp exception

Sometimes, I want to use lua_pushstring in places after I allocated some resources which I would need to cleanup in case of failure. However, as the documentation seems to imply, lua_push* functions can always end up with an out of memory exception. But that exception instant-quits my C scope and doesn't allow me to cleanup whatever I might have temporarily allocated that might have to be freed in case of error.
Example code to illustrate the situation:
void* blubb = malloc(20);
/* ...some other things happening here... */
lua_pushstring(L, "test"); /* how to do this call safely so I can still take care of blubb? */
/* ...possibly more things going on here... */
free(blubb);
Is there a way I can check beforehand if such an exception would happen and then avoid pushing and doing my own error triggering as soon as I safely cleaned up my own resources? Or can I somehow simply deactivate the setjmp, and then check some "magic variable" after doing the push to see if it actually worked or triggered an error?
I considered pcall'ing my own function, but even just pushing the function on the stack I want to call safely through pcall can possibly give me an out of memory, can't it?
To clear things up, I am specifically asking this for combined use with custom memory allocators that will prevent Lua from allocating too much memory, so assume this is not a case where the whole system has run out of memory.

Unless you have registered a user-defined memory handler with Lua when you created your Lua state, getting an out of memory error means that your entire application has run out of memory. Recovery from this state is generally not possible. Or at least, not feasible in a lot of cases. It could be depending on your application, but probably not.
In short, if it ever comes up, you've got bigger things to be concerned about ;)
The only kind of cleanup that should affect you is for things external to your application: for example, if you have some process-global memory that you need to free or set some state in, or you're doing interprocess communication and you have some memory-mapped file you're talking through. Or something like that.
Otherwise, it's probably better to just kill your process.
You could build Lua as a C++ library. When you do that, errors become actual exceptions, which you can either catch or just use RAII objects to handle.
If you're stuck with C... well, there's not much you can do.
I am specifically interested in a custom allocator that will out of memory much earlier to avoid Lua eating too much memory.
Then you should handle it another way. To signal an out-of-memory error is basically to say, "I want Lua to terminate right now."
The way to stop Lua from eating memory is to periodically check the Lua state's memory, and garbage collect it if it's using too much. And if that doesn't free up enough memory, then you should terminate the Lua state manually, but only when it is safe to do so.

lua_atpanic() may be one solution for you, depending on the kind of cleanup you need to do. It will never throw an error.
In your specific example you could also create blubb as a userdata. Then Lua would free it automatically once it was no longer referenced and the garbage collector ran.

I have recently gotten into some more Lua sandboxing again, and now I think the answer I accepted previously is a bad idea. I have given this some more thought:
Why periodic checking is not enough
Periodically checking for large memory consumption and terminating Lua "only when it is safe to do so" seems like a bad idea if you consider that a single huge table can eat up a lot of your memory with one single VM instruction about which you'll only find out after it happened - where your program might already be dying from it, and then you indeed have much bigger problems which you could have avoided entirely if you had stopped that allocation in time in the first place.
Since Lua has a nice out of memory exception already built-in, I would just like to use that one since this allows me to do the minimal required thing (preventing the script from allocating more stuff, while possibly allowing it to recover) without my C code breaking from it.
Therefore my current plan for Lua sandboxing with memory limit is:
Use custom allocator that returns NULL with limit
Design all C functions to be able to handle this without memory leak or other breakage
But how to design the C functions safely?
How to do that, since lua_pushstring and others can always setjmp away with an error without me knowing whether that is gonna happen in advance? (this was originally my question)
I think I found a working approach:
I added a facility to register pointers when I allocate them, and where I unregister them after I am done with them. This means if Lua suddenly setjmp's me out of my C code without me getting a chance to clean up, I have everything in a global list I need to clean up that mess later when I'm back in control.
Is that ugly or what?
Yes, it is quite the hack. But it will most likely work, and unlike 'periodic checking' it will actually allow me to have a true hard limit and avoid getting the application itself into trouble because of an aggressive attack.
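A rough sketch of what such a pointer registry could look like. This is only an illustration of the idea, not the code actually used: the names tracked_malloc, tracked_free, and tracked_cleanup are invented, and the registry is a fixed-size global array for simplicity.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_TRACKED 64   /* assumption: a small fixed capacity is enough */

/* Global so that it survives being jumped out of the C function by Lua. */
static void *tracked[MAX_TRACKED];

/* Allocate and register the pointer. Returns NULL if malloc fails
 * or if the registry is full. */
void *tracked_malloc(size_t size)
{
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i] == NULL) {
            void *p = malloc(size);
            tracked[i] = p;          /* slot stays NULL if malloc failed */
            return p;
        }
    }
    return NULL;
}

/* Normal path: unregister and free once the C function is done with p. */
void tracked_free(void *p)
{
    if (p == NULL)
        return;
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i] == p) {
            tracked[i] = NULL;
            break;
        }
    }
    free(p);
}

/* Error path: free everything still registered; returns how many
 * blocks were released. */
size_t tracked_cleanup(void)
{
    size_t released = 0;
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i] != NULL) {
            free(tracked[i]);
            tracked[i] = NULL;
            released++;
        }
    }
    return released;
}
```

The intended use would be to call tracked_cleanup() at the point where control comes back after the error, for example right after a protected call reports failure.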

How to get the root cause of a memory corruption in an embedded environment?

I have detected a memory corruption in my embedded environment (my program is running on a set-top box with a proprietary OS), but I couldn't find the root cause of it.
The memory corruption itself is detected after a stress test of launching and exiting an application multiple times. Given that I couldn't set a memory breakpoint because the corrupted variable changes its address every time the application is launched, is there any way to catch the root cause of this corruption?
(A memory breakpoint is a breakpoint triggered when the environment changes the value at a given memory address.)
note also that all my software is developed using C language.
Thanks for your help.

These are always difficult problems on embedded systems and there is no easy answer. Some tips:
Look at the value the memory gets corrupted with. This can give a clear hint.
Look at the data structures next to your memory corruption.
See if there is a pattern in the memory corruption. Is it always at a similar address?
See if you can set up the memory breakpoint at run-time.
Does the embedded system allow memory areas to be sandboxed? Set up sandboxes to safeguard your data memory.
Good luck!

Where is the data stored and how is it accessed by the two processes involved?
If the structure was allocated off the heap, try allocating a much larger block and putting large guard areas before and after the structure. This should give you an idea of whether it is one of the surrounding heap allocations which has overrun into the same allocation as your structure. If you find that the memory surrounding your structure is untouched, and only the structure itself is corrupted then this indicates that the corruption is being caused by something which has some knowledge of your structure's location rather than a random memory stomp.
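The guard-area idea for a heap-allocated structure could be sketched roughly like this. The guard size, the fill byte, and the function names are arbitrary choices for illustration; the check would be called at whatever points you suspect (after each task runs, before each access, and so on).

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 64     /* bytes of padding on each side (arbitrary) */
#define GUARD_BYTE 0xA5   /* known fill pattern (arbitrary) */

/* Allocate size bytes with a filled guard area before and after.
 * GUARD_SIZE is a multiple of 16 so the returned pointer keeps the
 * usual malloc alignment. */
void *guarded_alloc(size_t size)
{
    if (size > (size_t)-1 - 2 * GUARD_SIZE)
        return NULL;
    unsigned char *raw = malloc(size + 2 * GUARD_SIZE);
    if (raw == NULL)
        return NULL;
    memset(raw, GUARD_BYTE, GUARD_SIZE);
    memset(raw + GUARD_SIZE + size, GUARD_BYTE, GUARD_SIZE);
    return raw + GUARD_SIZE;
}

/* Returns 0 if both guards are intact, -1 if the front guard was
 * modified, 1 if the rear guard was modified (front is checked first). */
int guarded_check(const void *p, size_t size)
{
    const unsigned char *user = p;
    for (size_t i = 0; i < GUARD_SIZE; i++)
        if (user[-(ptrdiff_t)GUARD_SIZE + (ptrdiff_t)i] != GUARD_BYTE)
            return -1;
    for (size_t i = 0; i < GUARD_SIZE; i++)
        if (user[size + i] != GUARD_BYTE)
            return 1;
    return 0;
}

/* Release a block obtained from guarded_alloc. */
void guarded_free(void *p)
{
    if (p != NULL)
        free((unsigned char *)p - GUARD_SIZE);
}
```

If the guards get stomped while the structure stays intact, a neighbouring overrun is the likely suspect; if the structure changes while the guards stay intact, something is writing to it deliberately.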
If the structure is in a data section, check your linker map output to determine what other data exists in the vicinity of your structure. Check whether those have also been corrupted, introduce guard areas, and check whether the problem follows the structure if you force it to move to a different location. Again this indicates whether the corruption is caused by something with knowledge of your structure's location.
You can also test this by switching data from the heap into a data section or vice versa.
If you find that the structure is no longer corrupted after moving it elsewhere or introducing guard areas, you should check the linker map or track the heap to determine what other data is in the vicinity, and check accesses to those areas for buffer overflows.
You may find, though, that the problem does follow the structure wherever it is located. If this is the case then audit all of the code surrounding references to the structure. Check the contents before and after every access.
To check whether the corruption is being caused by another process or interrupt handler, add hooks to each task switch and before and after each ISR is called. The hook should check whether the contents have been corrupted. If they have, you will be able to identify which process or ISR was responsible.
If the structure is ever read onto a local process stack, try increasing the process stack and check that no array overruns etc have occurred. Even if not read onto the stack, it's likely that you will have a pointer to it on the stack at some point. Check all sub-functions called in the vicinity for stack issues or similar that could result in the pointer being used erroneously by unrelated blocks of code.
Also consider whether the compiler or RTOS may be at fault. Try turning off compiler optimisation, and failing that inspect the code generated. Similarly consider whether it could be due to a faulty context switch in your proprietary RTOS.
Finally, if you are sharing the memory with another hardware device or CPU and you have data cache enabled, make sure you take care of this through using uncached accesses or similar strategies.

Yes these problems can be tough to track down with a debugger.
A few ideas:
Do regular code reviews (not fast at tracking down a specific bug, but valuable for catching such problems in general)
Comment-out or #if 0 out sections of code, then run the cut-down application. Try commenting-out different sections to try to narrow down in which section of the code the bug occurs.
If your architecture allows you to easily disable certain processes/tasks from running, by the process of elimination perhaps you can narrow down which process is causing the bug.
If your OS uses cooperative multitasking, e.g. round robin (I think this would be too hard with preemptive multitasking): add code to the end of the task that "owns" the structure to save a "check" of the structure. That check could be a memcpy (if you have the time and space), or a CRC. Then, after every other task runs, add some code to verify the structure against the saved check. This will detect any changes.
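The CRC variant of that last idea might look something like the sketch below. It uses a plain bitwise CRC-32 (the common reflected polynomial 0xEDB88320); the names check_save and check_verify are invented, and a single global saved value is assumed for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 over n bytes. Slow but tiny, with no lookup table. */
uint32_t crc32_bytes(const void *data, size_t n)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

static uint32_t saved_crc;

/* Call at the end of the task that owns the structure. */
void check_save(const void *obj, size_t n)
{
    saved_crc = crc32_bytes(obj, n);
}

/* Call after every other task runs. Returns 1 if the structure is
 * unchanged since check_save, 0 if it was modified. */
int check_verify(const void *obj, size_t n)
{
    return crc32_bytes(obj, n) == saved_crc;
}
```

One caveat: if the structure has padding bytes, zero it with memset before first use, otherwise the CRC also covers indeterminate padding.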

I'm assuming by your question you mean that you suspect some part of the proprietary code is causing the problem.
I have dealt with a similar issue in the past using what a colleague so tastefully calls a "suicide note". I would allocate a buffer capable of storing a number of copies of the structure that is being corrupted. I would use this buffer like a circular list, storing a copy of the current state of the structure at regular intervals. If corruption was detected, the "suicide note" would be dumped to a file or to serial output. This would give me a good picture of what was changed and how, and by increasing the logging frequency I was able to narrow down the corrupting action.
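A stripped-down sketch of such a snapshot buffer is shown below. The watched_t type and its fields are placeholders for whatever structure is being corrupted, and the function names are made up; a real version would write the dump to a file or serial port instead of copying it out.

```c
#include <stddef.h>

#define SNAP_COUNT 8   /* how many past states to keep (arbitrary) */

/* Placeholder for the structure being corrupted. */
typedef struct {
    int id;
    int state;
} watched_t;

static watched_t snaps[SNAP_COUNT];
static size_t snap_total;   /* number of snapshots ever recorded */

/* Store a copy of the current state, overwriting the oldest slot. */
void snap_record(const watched_t *w)
{
    snaps[snap_total % SNAP_COUNT] = *w;
    snap_total++;
}

/* Copy up to max retained snapshots into out, oldest first.
 * Returns the number of snapshots copied. */
size_t snap_dump(watched_t *out, size_t max)
{
    size_t have = snap_total < SNAP_COUNT ? snap_total : SNAP_COUNT;
    size_t start = snap_total - have;
    size_t n = have < max ? have : max;
    for (size_t i = 0; i < n; i++)
        out[i] = snaps[(start + i) % SNAP_COUNT];
    return n;
}
```

Calling snap_record from a periodic timer or at task boundaries, and snap_dump when corruption is detected, gives the before-and-after history described above.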
Depending on your OS, you may be able to react to detected corruption by looking at all running processes and seeing which ones are currently holding a semaphore (you are using some kind of access control mechanism with shared memory, right?). By taking snapshots of this data too, you perhaps can log the culprit grabbing the lock before corrupting your data. Along the same lines, try holding the lock to the shared memory region for an absurd length of time and see if the offending program complains. Sometimes they will give an error message that has important information that can help your investigation (for example, line numbers, function names, or code offsets for the offending program).
If you feel up to doing a little linker kung fu, you can most likely specify the address of any statically-allocated data with respect to the program's starting address. This might give you a consistent-enough memory address to set a memory breakpoint.
Unfortunately, this sort of problem is not easy to debug, especially if you don't have the source for one or more of the programs involved. If you can get enough information to understand just how your data is being corrupted, you may be able to adjust your structure to anticipate and expect the corruption (sometimes needed when working with code that doesn't fully comply with a specification or a standard).

You detect memory corruption. Could you be more specific how? Is it a crash with a core dump, for example?
Normally the OS will completely free all resources and handles your program has when the program exits, gracefully or otherwise. Even proprietary OSes manage to get this right, although it's not a given.
So an intermittent problem could seem to be triggered after stress but just be chance, or could be in the initialisation of drivers or other processes the program communicates with, or could be bad error handling around say memory allocations that fail when the OS itself is under stress e.g. lazy tidying up of the closed programs.
Printfs in custom malloc/realloc/free proxy functions, or even an Electric Fence-style custom allocator, might help if it's as simple as a buffer overflow.

Use memory-allocation debugging tools like ElectricFence, dmalloc, etc - at minimum they can catch simple errors and most moderately-complex ones (overruns, underruns, even in some cases write (or read) after free), etc. My personal favorite is dmalloc.

A proprietary OS might limit your options a bit. One thing you might be able to do is run the problem code on a desktop machine (assuming you can stub out the hardware-specific code), and use the more-sophisticated tools available there (i.e. guardmalloc, electric fence).
The C library that you're using may include some routines for detecting heap corruption (glibc does, for instance). Turn those on, along with whatever tracing facilities you have, so you can see what was happening when the heap was corrupted.

First I am assuming you are on a baremetal chip that isn't running Linux or some other POSIX-capable OS (if you are there are much better techniques such as Valgrind and ASan).
Here are a couple of tips for tracking down embedded memory corruption:
Use JTAG or similar to set a memory watchpoint on the area of memory that is being corrupted. You might be able to catch the moment when memory is accidentally being written there, as opposed to a correct write. Many JTAG debuggers include plugins for IDEs that allow you to get stack traces as well.
In your hard fault handler, try to generate a call stack that you can print so you can get a rough idea of where the code is crashing. Note that since memory corruption can occur some time before the crash actually occurs, the stack traces you get are unlikely to be helpful at first, but combined with the techniques mentioned below they will help. Generating a backtrace on bare metal can be a very difficult task, though. If you happen to be using a Cortex-M line processor, check out https://github.com/armink/CmBacktrace or try searching the web for advice on generating a back/stack trace for your particular chip.
If your compiler supports it use stack canaries to detect and immediately crash if something writes over the stack, for details search the web for "Stack Protector" for GCC or Clang
If you are running on a chip that has an MPU such as an ARM Cortex-M3 then you can use the MPU to write-protect the region of memory that is being corrupted or a small region of memory right before the region being corrupted, this will cause the chip to crash at the moment of the corruption rather than much later

How to log mallocs

This is a bit hypothetical and grossly simplified but...
Assume a program that will be calling functions written by third parties. These parties can be assumed to be non-hostile but can't be assumed to be "competent". Each function will take some arguments, have side effects and return a value. They have no state while they are not running.
The objective is to ensure they can't cause memory leaks by logging all mallocs (and the like) and then freeing everything after the function exits.
Is this possible? Is this practical?
p.s. The important part to me is ensuring that no allocations persist so ways to remove memory leaks without doing that are not useful to me.

You don't specify the operating system or environment, this answer assumes Linux, glibc, and C.
You can set __malloc_hook, __free_hook, and __realloc_hook to point to functions which will be called from malloc(), free(), and realloc() respectively. There is a __malloc_hook manpage showing the prototypes. You can track allocations in these hooks, then return to let glibc handle the memory allocation/deallocation.
It sounds like you want to free any live allocations when the third-party function returns. There are ways to have gcc automatically insert calls at every function entrance and exit using -finstrument-functions, but I think that would be inelegant for what you are trying to do. Can you have your own code call a function in your memory-tracking library after calling one of these third-party functions? You could then check if there are any allocations which the third-party function did not already free.

First, you have to provide the entrypoints for malloc() and free() and friends. Because this code is compiled already (right?) you can't depend on #define to redirect.
Then you can implement these in the obvious way and log that they came from a certain module by linking those routines to those modules.
The fastest way involves no logging at all. If the amount of memory they use is bounded, why not pre-allocate all the "heap" they'll ever need and write an allocator out of that? Then when it's done, free the entire "heap" and you're done! You could extend this idea to multiple heaps if it's more complex than that.
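A minimal sketch of that pre-allocated "heap", assuming a simple bump allocator is good enough (no individual frees, everything released at once). The arena_t type, the function names, and the 16-byte alignment are choices made for this illustration.

```c
#include <stddef.h>
#include <stdlib.h>

#define ARENA_ALIGN ((size_t)16)   /* assumption: enough for ordinary types */

typedef struct {
    unsigned char *base;   /* start of the pre-allocated block */
    size_t size;           /* total bytes available */
    size_t used;           /* bytes handed out so far */
} arena_t;

/* Reserve the whole "heap" up front. Returns 0 on success, -1 on failure. */
int arena_init(arena_t *a, size_t size)
{
    a->base = malloc(size);
    a->size = a->base ? size : 0;
    a->used = 0;
    return a->base ? 0 : -1;
}

/* Hand out n bytes; offsets are rounded up to ARENA_ALIGN relative to
 * the base. Returns NULL when the arena is exhausted. */
void *arena_alloc(arena_t *a, size_t n)
{
    size_t start = (a->used + ARENA_ALIGN - 1) / ARENA_ALIGN * ARENA_ALIGN;
    if (start > a->size || n > a->size - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Free everything the third-party function allocated in one go. */
void arena_release(arena_t *a)
{
    free(a->base);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

The nice property is that a leak inside the third-party function cannot outlive arena_release, and exceeding the budget shows up as a NULL return rather than as unbounded growth.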
If you really do need to "log" and not make your own allocator, here's some ideas. One, use a hash table with pointers and internal chaining. Another would be to allocate extra space in front of every block and put your own structure there containing, say, an index into your "log table," then keep a free-list of log table entries (as a stack so getting a free one or putting a free one back is O(1)). This takes more memory but should be fast.
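Here is a simplified variant of the second idea, as a sketch only: instead of an index into a log table, the header placed in front of each block links it into a doubly linked list of live allocations, which also keeps insert and remove at O(1). The logged_* names are invented, and how the third-party code gets routed to these functions (for example by providing them as the malloc/free entrypoints at link time) is left out.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header stored in front of every block. The union members exist only
 * to force an alignment suitable for ordinary types. */
typedef union block_hdr {
    struct {
        union block_hdr *prev;
        union block_hdr *next;
        size_t size;
    } link;
    long double align_ld;
    long long align_ll;
} block_hdr;

static block_hdr *live_head;   /* list of blocks not yet freed */

/* Allocate size bytes and record the block in the live list. */
void *logged_malloc(size_t size)
{
    if (size > (size_t)-1 - sizeof(block_hdr))
        return NULL;
    block_hdr *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->link.size = size;
    h->link.prev = NULL;
    h->link.next = live_head;
    if (live_head != NULL)
        live_head->link.prev = h;
    live_head = h;
    return h + 1;              /* user memory starts after the header */
}

/* Unlink the block from the live list and free it. */
void logged_free(void *p)
{
    if (p == NULL)
        return;
    block_hdr *h = (block_hdr *)p - 1;
    if (h->link.prev != NULL)
        h->link.prev->link.next = h->link.next;
    else
        live_head = h->link.next;
    if (h->link.next != NULL)
        h->link.next->link.prev = h->link.prev;
    free(h);
}

/* Free every block still live; returns how many were leaked. */
size_t logged_free_all(void)
{
    size_t leaked = 0;
    while (live_head != NULL) {
        logged_free(live_head + 1);
        leaked++;
    }
    return leaked;
}
```

Calling logged_free_all() right after the third-party function returns both reclaims the memory and tells you how many allocations it forgot to free.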
Is it practical? I think it is, so long as the speed-hit is acceptable.

You could run the third party functions in a separate process and close the process when you are done using the library.

A better solution than attempting to log mallocs might be to sandbox the functions when you call them—give them access to a fixed segment of memory and then free that segment when the function is done running.
Unconfined, incompetent memory usage can be just as damaging as malicious code.

Can't you just force them to allocate all their memory on the stack? This way it would be guaranteed to be freed after the function exits.

In the past I wrote a software library in C that had a memory management subsystem that contained the ability to log allocations and frees, and to manually match each allocation and free. This was of some use when attempting to find memory leaks, but it was difficult and time consuming to use. The number of logs was overwhelming, and it took an extensive amount of time to understand the logs.
That being said, if your third party library has extensive allocations, it's more than likely impractical to track this via logging. If you're running in a Windows environment, I would suggest using a tool such as Purify[1] or BoundsChecker[2] that should be able to detect leaks in your third party libraries. The investment in the tool should pay for itself in time saved.
[1]: http://www-01.ibm.com/software/awdtools/purify/ Purify
[2]: http://www.compuware.com/products/devpartner/visualc.htm BoundsChecker

Since you're worried about memory leaks and talking about malloc/free, I assume you're in C. I'm also assuming based on your question that you do not have access to the source code of the third party library.
The only thing I can think of is to examine memory consumption of your app before & after the call, log error messages if they're different and convince the third party vendor to fix any leaks you find.

If you have money to spare, then consider using Purify to track issues. It works wonders, and does not require source code or recompilation. There are also other debugging malloc libraries available that are cheaper. Electric Fence is one name I recall. That said, the debugging hooks mentioned by Denton Gentry seem interesting too.

If you're too poor for Purify, try Valgrind. It is a lot better than it was 6 years ago and a lot easier to dive into than Purify.

Microsoft Windows (use SUA if you need POSIX) provides quite possibly the most advanced heap (plus other APIs known to use the heap) infrastructure of any shipping OS today.
The __malloc() debug hooks and the associated CRT debug interfaces are nice for cases where you have the source code to the tests; however, they can often miss allocations by standard libraries or other code which is linked. This is expected, as they are the Visual Studio heap debugging infrastructure.
gflags is a very comprehensive and detailed set of debugging capabilities which has been included with Windows for many years. It has advanced functionality for source and binary-only use cases (as it is the OS heap debugging infrastructure).
It can log full stack traces (repaginating symbolic information in a post-process operation) of all heap users, for all heap-modifying entrypoints, serially if needed. Also, it may modify the heap with pathological cases which may align the allocation of data such that the page protection offered by the VM system is optimally assigned (i.e. allocate your requested heap block at the end of a page, so even a single byte overflow is detected at the time of the overflow).
umdh is a tool which can help assess the status at various checkpoints; however, the data is continually accumulated during the execution of the target, so it is not a simple checkpointing debug stop in the traditional context. Also, WARNING: last I checked at least, the total size of the circular buffer which stores the stack information for each request is somewhat small (64k entries (entries+stack)), so you may need to dump rapidly for heavy heap users. There are other ways to access this data, but umdh is fairly simple.
NOTE there are 2 modes;
MODE 1, umdh {-p:Process-id|-pn:ProcessName} [-f:Filename] [-g]
MODE 2, umdh [-d] {File1} [File2] [-f:Filename]
I do not know what insanity gripped the developer who chose to alternate between -p:foo argument specifiers and naked ordering of arguments, but it can get a little confusing.
The debugging SDK works with a number of other tools. memsnap is a tool which apparently focuses on memory leaks and such, but I have not used it; your mileage may vary.
Execute gflags with no arguments for the UI mode; +args and /args are different "modes" of use also.

On Linux I've successfully used mtrace(3) to log allocations and frees. Its usage is as simple as:
Modify your program to call mtrace() when you need to begin tracing (e.g. at the top of main()),
Set environment variable MALLOC_TRACE to the file path where the trace should be saved and run the program.
After that the output file will contain something like this (excerpt from the middle to show a failed allocation):
# /usr/lib/tls/libnvidia-tls.so.390.116:[0xf44b795c] + 0x99e5e20 0x49
# /opt/gcc-7/lib/libstdc++.so.6:(_ZdlPv+0x18)[0xf6a80f78] - 0x99beba0
# /usr/lib/tls/libnvidia-tls.so.390.116:[0xf44b795c] + 0x9a23ec0 0x10
# /opt/gcc-7/lib/libstdc++.so.6:(_ZdlPv+0x18)[0xf6a80f78] - 0x9a23ec0
# /opt/Xorg/lib/video-libs/libGL.so.1:[0xf668ee49] + 0x99c67c0 0x8
# /opt/Xorg/lib/video-libs/libGL.so.1:[0xf668f14f] - 0x99c67c0
# /opt/Xorg/lib/video-libs/libGL.so.1:[0xf668ee49] + (nil) 0x30000000
# /lib/libc.so.6:[0xf677f8eb] + 0x99c21f0 0x158
# /lib/libc.so.6:(_IO_file_doallocate+0x91)[0xf677ee61] + 0xbfb00480 0x400
# /lib/libc.so.6:(_IO_setb+0x59)[0xf678d7f9] - 0xbfb00480
