Have a system build using C++ Builder 2010 that after running for about 20 hours it starts firing of assertion failures.
Assertion failed: xdrPtr && xdrPtr == *xdrLPP, file xx.cpp, line 2349
Tried google on it like crazy but not much info. Some people seem to refer a bunch of different assertions in xx.cpp to shortcomings in the exception handling in C++ Builder. But I haven't found anything referencing this particular line in the file.
We have integrated madExcept and it seems like somewhere along the way this catches an out of memory exception, but not sure if it's connected. No matter what an assertion triggering doesn't seem correct.
Edit:
I found an instance of a if-statement that as part of it's statement used a function that could throw an exception. I wonder if this could be the culprit somehow messing up the flow of the exception handling or something?
Consider
if(foo() == 0) {
...
}
wrapped in a try catch block.
If an exception is thrown from within foo() so that no int is returned here how will the if statement react? I'm thinking it still might try to finish executing that line and this performing the if check on the return of the function which will barf since no int was returned. Is this well defined or is this undefined behaviour?
Wouldn't
int fooStatus = foo();
if(fooStatus == 0) {
...
}
be better (or should I say safer)?
Edit 2:
I just managed to get the assertion on my dev machine (the application just standing idle) without any exception about memory popping up and the app only consuming around 100 mb. So they were probably not connected.
Will try to see if I can catch it again and see around where it barfs.
Edit 3:
Managed to catch it. First comes an assertion failure notice like explained. Then the debugger shows me this exception notification.
If I break it takes me here in the code
It actually highlights the first code line after
pConnection->Open();
But it seems I can change this to anything and that line is still highlighted. So my guess is that the error is in the code above it somehow. I have seen more reports about people getting this type of assertion failure when working with databases in RAD Studio... hmmmm.
Update:
I found a thread that recursively called it's own Execute function if it wasn't able to reach the DB server. I think this is at least part of the issue. This will just keep on trying and as more and more worker threads spawn and also keep trying it can only end in disaster.
If madExcept is hinting that you have an out of memory condition, the assert could fail if the pointers are NULL (i.e. the allocation failed). What are the values of xdrPtr and xdrLPP when the assert occurs? Can you trace back to where they are allocated?
I would start looking for memory leaks.
Related
To elaborate, I am currently writing a program that requires a function that is provided by the professor. When I run the program, I get a segmentation fault, and the debugger I use (gdb) says that the segmentation fault occurred at the definition of the function that, like I said, was provided by the professor.
So my question here is, is the definition itself causing the fault, or is it somewhere else in the program that called the function causing the fault?
I attempted to find a spot in the program that might have been leading to it, such as areas that might have incorrect parameters. I have not changed the function itself, as it is not supposed to be modified (as per instructions). This is my first time posting a question, so if there is any other information needed, please let me know.
The error thrown is as follows:
Program received signal SIGSEGV, Segmentation fault. .0x00401450 in Parser::GetNextToken (in=..., line=#0x63fef0: 1) at PA2.cpp:20 20 return GetNextToken(in, line);
The code itself that this is happening at is this:
static LexItem GetNextToken(istream& in, int& line) {
if( pushed_back ) {
pushed_back = false;
return pushed_token;
}
return GetNextToken(in, line);
}
Making many assumptions here, but maybe the lesson is to understand how the stack is affected by a function call and parameters. Create a main() function, that call the professor's provided function and trace the code using dbg, looking at the stack.
I've hit an assertion in code and wondering if there's a way to create a wrapper around the assert that would enable breaking out and continuing execution or some other function that would enable a way to suppress the assert through the lldb debugger.
assert(condition(), makeCriticalEvent().event.description, file: fileID, line: UInt(line))
This is the standard assertion in apples libraries here. When I hit this assertion I tried continuing execution but it stays stuck at the assertion. I'd like to silence the assertion (likely through the lldb debugger by typing some command). Anyone have an idea how to do this?
You have to do two things. First, you have to tell lldb to suppress the SIGABRT signal that the assert delivers. Do this by running:
(lldb) process handle SIGABRT -p 0
in lldb. Normally SIGABRT is not maskable, so I was a little surprised this worked. Maybe because this is a SIGABRT the process sends itself? I don't think there's any guarantee suppressing SIGABRT's has to work in the debugger so YMMV, but it seems to currently. Anyway, do this when you've hit the assert.
Then you need to forcibly unwind the assert part of the stack. You can do that using thread return, which returns to the thread above the currently selected one w/o executing the code in that frame or any of the others below it. So just select the frame that caused the assert, go down one frame on the stacks and do thread return.
Now when you continue you won't hit the abort and you'll be back in your code.
I'm new to lldb and trying to diagnose an error by using po [$eax class]
The error shown in the UI is:
Thread 1: EXC_BREAKPOINT (code=EXC_i386_BPT, subcode=0x0)
Here is the lldb console including what I entered and what was returned:
(lldb) po [$eax class]
error: Execution was interrupted, reason: EXC_BAD_ACCESS (code=1, address=0xb06b9940).
The process has been returned to the state before expression evaluation.
The global breakpoint state toggle is off.
You app is getting stopped because the code you are running threw an uncaught Mach exception. Mach exceptions are the equivalent of BSD Signals for the Mach kernel - which makes up the lowest levels of the macOS operating system.
In this case, the particular Mach exception is EXC_BREAKPOINT. EXC_BREAKPOINT is a common source of confusion... Because it has the word "breakpoint" in the name people think that it is a debugger breakpoint. That's not entirely wrong, but the exception is used more generally than that.
EXC_BREAKPOINT is in fact the exception that the lower layers of Mach reports when it executes a certain instruction (a trap instruction). That trap instruction is used by lldb to implement breakpoints, but it is also used as an alternative to assert in various bits of system software. For instance, swift uses this error if you access past the end of an array. It is a way to stop your program right at the point of the error. If you are running outside the debugger, this will lead to a crash. But if you are running in the debugger, then control will be returned to the debugger with this EXC_BREAKPOINT stop reason.
To avoid confusion, lldb will never show you EXC_BREAKPOINT as the stop reason if the trap was one that lldb inserted in the program you are debugging to implement a debugger breakpoint. It will always say breakpoint n.n instead.
So if you see a thread stopped with EXC_BREAKPOINT as its stop reason, that means you've hit some kind of fatal error, usually in some system library used by your program. A backtrace at this point will show you what component is raising that error.
Anyway, then having hit that error, you tried to figure out the class of the value in the eax register by calling the class method on it by running po [$eax class]. Calling that method (which will cause code to get run in the program you are debugging) lead to a crash. That's what the "error" message you cite was telling you.
That's almost surely because $eax doesn't point to a valid ObjC object, so you're just calling a method on some random value, and that's crashing.
Note, if you are debugging a 64 bit program, then $eax is actually the lower 32 bits of the real argument passing register - $rax. The bottom 32 bits of a 64 bit pointer is unlikely to be a valid pointer value, so it is not at all surprising that calling class on it led to a crash.
If you were trying to call class on the first passed argument (self in ObjC methods) on 64 bit Intel, you really wanted to do:
(lldb) po [$rax class]
Note, that was also unlikely to work, since $rax only holds self at the start of the function. Then it gets used as a scratch register. So if you are any ways into the function (which the fact that your code fatally failed some test makes seem likely) $rax would be unlikely to still hold self.
Note also, if this is a 32 bit program, then $eax is not in fact used for argument passing - 32 bit Intel code passes arguments on the stack, not in registers.
Anyway, the first thing to do to figure out what went wrong was to print the backtrace when you get this exception, and see what code was getting run at the time this error occurred.
Clean project and restart Xcode worked for me.
I'm adding my solution, as I've struggled with the same problem and I didn't find this solution anywhere.
In my case I had to run Product -> Clean Build Folder (Clean + Option key) and rebuild my project. Breakpoints and lldb commands started to work properly.
In my first few dummy apps(for practice while learning) I have come across a lot of EXC_BAD_ACCESS, that somehow taught me Bad-Access is : You are touching/Accessing a object that you shouldn't because either it is not allocated yet or deallocated or simply you are not authorized to access it.
Look at this sample code that has bad-access issue because I am trying to modify a const :
-(void)myStartMethod{
NSString *str = #"testing";
const char *charStr = [str UTF8String];
charStr[4] = '\0'; // bad access on this line.
NSLog(#"%s",charStr);
}
While Segmentation fault says : Segmentation fault is a specific kind of error caused by accessing memory that “does not belong to you.” It’s a helper mechanism that keeps you from corrupting the memory and introducing hard-to-debug memory bugs. Whenever you get a segfault you know you are doing something wrong with memory (more description here.
I wanna know two things.
One, Am I right about objective-C's EXC_BAD_ACCESS ? Do I get it right ?
Second, Are EXC_BAD_ACCESS and Segmentation fault same things and Apple has just improvised its name?
No, EXC_BAD_ACCESS is not the same as SIGSEGV.
EXC_BAD_ACCESS is a Mach exception (A combination of Mach and xnu compose the Mac OS X kernel), while SIGSEGV is a POSIX signal. When crashes occur with cause given as EXC_BAD_ACCESS, often the signal is reported in parentheses immediately after: For instance, EXC_BAD_ACCESS(SIGSEGV). However, there is one other POSIX signal that can be seen in conjunction with EXC_BAD_ACCESS: It is SIGBUS, reported as EXC_BAD_ACCESS(SIGBUS).
SIGSEGV is most often seen when reading from/writing to an address that is not at all mapped in the memory map, like the NULL pointer, or attempting to write to a read-only memory location (as in your example above). SIGBUS on the other hand can be seen even for addresses the process has legitimate access to. For instance, SIGBUS can smite a process that dares to load/store from/to an unaligned memory address with instructions that assume an aligned address, or a process that attempts to write to a page for which it has not the privilege level to do so.
Thus EXC_BAD_ACCESS can best be understood as the set of both SIGSEGV and SIGBUS, and refers to all ways of incorrectly accessing memory (whether because said memory does not exist, or does exist but is misaligned, privileged or whatnot), hence its name: exception – bad access.
To feast your eyes, here is the code, within the xnu-1504.15.3 (Mac OS X 10.6.8 build 10K459) kernel source code, file bsd/uxkern/ux_exception.c beginning at line 429, that translates EXC_BAD_ACCESS to either SIGSEGV or SIGBUS.
/*
* ux_exception translates a mach exception, code and subcode to
* a signal and u.u_code. Calls machine_exception (machine dependent)
* to attempt translation first.
*/
static
void ux_exception(
int exception,
mach_exception_code_t code,
mach_exception_subcode_t subcode,
int *ux_signal,
mach_exception_code_t *ux_code)
{
/*
* Try machine-dependent translation first.
*/
if (machine_exception(exception, code, subcode, ux_signal, ux_code))
return;
switch(exception) {
case EXC_BAD_ACCESS:
if (code == KERN_INVALID_ADDRESS)
*ux_signal = SIGSEGV;
else
*ux_signal = SIGBUS;
break;
case EXC_BAD_INSTRUCTION:
*ux_signal = SIGILL;
break;
...
Edit in relation to another of your questions
Please note that exception here does not refer to an exception at the level of the language, of the type one may catch with syntactical sugar like try{} catch{} blocks. Exception here refers to the actions of a CPU on encountering certain types of mistakes in your program (they may or may not be be fatal), like a null-pointer dereference, that require outside intervention.
When this happens, the CPU is said to raise what is commonly called either an exception or an interrupt. This means that the CPU saves what it was doing (the context) and deals with the exceptional situation.
To deal with such an exceptional situation, the CPU does not start executing any "exception-handling" code (catch-blocks or suchlike) in your application. It first gives the OS control, by starting to execute a kernel-provided piece of code called an interrupt service routine. This is a piece of code that figures out what happened to which process, and what to do about it. The OS thus has an opportunity to judge the situation, and take the action it wants.
The action it does for an invalid memory access (such as a null pointer dereference) is to signal the guilty process with EXC_BAD_ACCESS(SIGSEGV). The action it does for a misaligned memory access is to signal the guilty process with EXC_BAD_ACCESS(SIGBUS). There are many other exceptional situations and corresponding actions, not all of which involve signals.
We're now back in the context of your program. If your program receives the SIGSEGV or SIGBUS signals, it will invoke the signal handler that was installed for that signal, or the default one if none was. It is rare for people to install custom handlers for SIGSEGV and SIGBUS and the default handlers shut down your program, so what you usually get is your program being shut down.
This sort of exceptions is therefore completely unlike the sort one throws in try{}-blocks and catch{}es. Those exceptions are handled purely within the application, without involving the OS at all. Here what happens is that a throw statement is simply a glorified jump to the inner-most catch block that handles that exception. As the exception bubbles through the stack, it unwinds the stack behind it, running destructors and suchlike as needed.
Basically yes, indeed an EXC_BAD_ACCESS is usually paired with a SIGSEGV which is a signal that warns about the segmentation failure.
A segmentation failure is risen whenever you are working with a pointer that points to invalid data (maybe not belonging to the process, maybe read-only, maybe an invalid address in general).
Don't think about the segmentation fault in terms of "accessing an object", you are accessing a memory location, so an address. That address must be considered coherent by the OS memory protection system.
Not all errors which are related to accessing invalid data can be tracked by the memory manager, think about a pointer to a stack allocated variable, which is considered valid although its content is not valid anymore upon restoring the stack frame.
Assuming a thread successfully calls pthread_mutex_lock, is it still possible that a call to pthread_mutex_unlock in that same thread will fail? If so, can you actually do something about it besides abort the thread?
if(pthread_mutex_lock(&m) == 0)
{
// got the lock, let's do some work
if(pthread_mutex_unlock(&m) != 0) // can this really fail?
{
// ok, we have a lock but can't unlock it?
}
}
From this page, possible errors for pthread_mutex_unlock() are:
[EINVAL]
The value specified by mutex does not refer to an initialised
mutex object.
If the lock succeeded then this is unlikely to fail.
[EAGAIN]
The mutex could not be acquired because the maximum number of
recursive locks for mutex has been exceeded.
Really? For unlock?
The pthread_mutex_unlock() function may fail if:
[EPERM]
The current thread does not own the mutex.
Again, if lock succeeded then this also should not occur.
So, my thoughts are if there is a successful lock then in this situation unlock should never fail making the error check and subsequent handling code pointless.
Well before you cry "victory". I ended up on this page looking for a reason why one of my programs failed on a pthread_mutex_unlock (on HP-UX, not Linux).
if (pthread_mutex_unlock(&mutex) != 0)
throw YpException("unlock %s failed: %s", what.c_str(), strerror(errno));
This failed on me, after many million happy executions.
errno was EINTR, although I just now found out that I should not be checking errno, but rather the return value. But nevertheless, the return value was NOT 0. And I can mathematically prove that at that spot I do own a valid lock.
So let's just say your theory is under stress, although more research is required ;-)
From the man page for pthread_mutex_unlock:
The pthread_mutex_unlock() function may fail if:
EPERM
The current thread does not own the mutex.
These functions shall not return an error code of [EINTR].
If you believe the man page, it would seem that your error case cannot happen.