Triaging iOS kernel panic related to sockets [closed] - ios

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 6 years ago.
Improve this question
I have an iOS app that does alot of low-level socket work, and recently after adding IPv6 support I noticed there is a kernel panic which occurs sometimes when running my app. The entire device reboots, and I get a panic file which has alot of cryptic information (including no stack frames from my app), but a few critical things as shown here:
panic(cpu 0 caller 0xffffff800f15fba0): assertion failed: se->se_flags & SEF_ATTACHED, file: /SourceCache/xnu/xnu-2784.30.7/bsd/kern/uipc_socket.c, line: 6228
Debugger message: panic
Fortunately this module is open source, and I found the code for a close version here: http://opensource.apple.com//source/xnu/xnu-2782.1.97/bsd/kern/uipc_socket.c
The error seems to match up with this function:
void
sockaddrlist_remove(struct sockaddr_list *sl, struct sockaddr_entry *se)
{
VERIFY(se->se_flags & SEF_ATTACHED);
se->se_flags &= ~SEF_ATTACHED;
VERIFY(sl->sl_cnt != 0);
sl->sl_cnt--;
TAILQ_REMOVE(&sl->sl_head, se, se_link);
}
I'm pretty sure the first VERIFY(), which is basically an assert, is failing.
However, this just tells me that some memory was probably corrupted by my program some time before this code got to run. So, like most memory corruptions, it is very challenging to find the cause.
Based on my logging, I see this happen after roughly some networking calls, including socket(), connect(), read() and write(), though it wouldn't be feasible to give the code here.
Another piece of information is this only happens with IPv6. On IPv4, everything works without issue. But I have scrubbed the IPv6 code and have not found anything obviously wrong. Also I'm confused how any memory corruption issue in user space would make the kernel fail. Maybe understanding how this could happen would help me trace the issue.
The next step most people would say is to try the guard malloc, however unfortunately when I try to turn that on I run into another problem, so for now lets just make the assumption that I cannot use guard malloc at present.
I have also tried to attach to the program live while running and make it crash, but it doesn't stop in the debugger anywhere, it just reboots the entire device (iPad).
If anyone has any triaging ideas for this tricky bug, please let me know.
EDIT:
Based on the feedback from one of the answers, I've checked all the lengths for the relevant socket API calls and those seem to be correct. So it seems like there is some other issue here, possibly overwriting memory.
I was able to try using "Malloc Guard Edges", but then the problem stops happening. I can't use "Guard Malloc" since it will only work on the simulator, and my app doesn't run well on the simulator due to how it interacts with the hardware.
If anyone has any more ideas, please let me know.

I have seen this happen (improperly coded IPv6 changes in iOS app causing the entire phone to reboot).
In my case, it was caused by making the system call sendto() with the wrong dest_len, one that didn't match the size of the structure pointed to by dest_addr. This kind of issue is possible to come up when adding support for IPv6 because, when everything was IPv4, all sockaddr structures are sockaddr_in, with the same size which can be hard-coded, but when you can have IPv4 and IPv6 addresses, you can have different-sized structures, and you have to pass the correct length corresponding to the structure passed. Your particular issue might not be exactly in sendto(), but it might be a similar issue so I would check every system call where you need to pass a socklen_t, including bind() and connect().
I agree that no code in the app should be able to cause the phone to reboot, and that the fact that this is possible is an Apple bug.
Although it is not possible for the debugger to stop on a kernel panic (because the device disconnects, so the debugger stops), you can still debug it in a way, if you know approximately where it happens, you can step line-by-line in the debugger, and see which line it kernel panics at. The line it panics on will be a function call (the panic happens inside the call), so you can now step inside that function call, and repeat, until you've narrowed it down to the specific system call.

Related

FreeRtos how to store function address while context switching

I using freertos on my project. My code is stuck in hardfault handler, I want know last executed function address or last executed line address for debugging.How to debug code when PC is pointing Hardfault handler.
That information is 100% dependent on which microcontroller you are using, and also which tool chain you are using as some IDEs will do this for you. You failed to provide either piece of information, so are asking people to guess on your behalf. A good question is one that cannot only possibly be answered by another question.
I am going to guess you are using a Cortex-M microcontroller, in which case information on debugging a hard fault can be found on lots of links found by Google, including the following: http://www.freertos.org/Debugging-Hard-Faults-On-Cortex-M-Microcontrollers.html

How do I symbolicate from a list of libraries and addresses in them?

I'm trying to symbolicate a crash log on my device. I have the stack frames, the instruction pointer addresses for each frame, the module that the IP was in, and the offset into that module. My plan is to use dladdr() to get the function or method for each stack frame address.
I'm trying to do this on a fresh app launch, so I don't know that the libraries are currently loaded or what addresses they're loaded at. I can use dlopen() to ensure that the library is open, but I'm not sure what base address to add my previously calculated offset to.
Is there any way to determine where the library is loaded or make sense of the handle returned from dlopen()?
The problem with doing the symbolication on the next startup on the device is, that you need to make sure that the app version did not change (if you want to symbolicate those too, which you shouldn't since you won't get line numbers) and need to make sure that the iOS version is also the same. So trying to open them might give you more trouble unsuccessful results than you want.
The safest and most reliable way is to symbolicate on your Mac or a server where you can collect all symbols and are also able to get line numbers for your own apps code.
Why don't use just use PLCrashReporter to collect the crash logs? This does all you need in a very safe and reliable way, including catching exceptions, signal handlers etc. This article shows more about some of the problems working with crashes has: http://landonf.bikemonkey.org/code/objc/Reliable_Crash_Reporting.20110912.html
See https://code.google.com/p/plcrashreporter/ and our fork with some additions for Mac support and safe (!!) symbolication of system libs right when the crash happens, see https://github.com/bitstadium/PLCrashReporter and https://github.com/bitstadium/HockeySDK-iOS which uses that fork.
One important note I forgot to mention: since iOS 6 a lot of symbols result in "redacted" when symbolicated on the device. Another reason you may want to avoid this.

How to implement/set a data breakpoint? [duplicate]

This question already has an answer here:
How are data breakpoints created?
(1 answer)
Closed 1 year ago.
Requirements:
I need to generate an interrupt, when a memory location changes or is written to. From an ISR, I can trigger a blue screen which gives me a nice stack trace with method names.
Approaches:
Testing the value in the timer ISR. Obviously this doesn't give satisfying results.
I discovered the bochs virtual machine. It has a basic builtin debugger that can set data breakpoints and stop the program. But I can't seem to generate an interrupt at that point.
bochs allows one to connect a gdb to it. I haven't been able to build it with gdb support though.
Other thoughts:
A kind of "preview instruction" interrupt that triggers for every instruction before executing it. The set of used memory-writing instructions should be pretty manageable, but it would still be a PITA to extract the adress I think. And I think there is no such interrupt.
A kind of "preview memory access" interrupt. Again, I don't think its there.
Abuse paging. Mark the page of interest as not present and test the address in the page fault handler. One would still have to distinguish read and write operations and I think, the page fault handler doesn't get to know the exact address, just the page number.
See chapter 16 in Intel's Software Developer's Manual Volume 3A. It gives information about using the debug registers, which provide support for causing the debugger exception when accessing a certain address, among other things. The interrupt will be triggered after the instruction which caused it. Specifically, you will have to set one of dr0-dr3 to the address you want to watch, and dr7 with the proper values to tell the processor what types of accesses should cause the interrupt.

How do I debug an Access violation in the field?

An application in the field is getting this message intermittently:
I am not able to reproduce this on my machine. I have also traced what I believe is the relevant code and can't find any access to uninitialized objects.
I've never had to deal with this kind of problem.
I did a build with madExcept and unfortunately the program does not crash once it is bundled.
Any opinions on madExcept vs EurekaLog for finding this kind of thing? I've never used FastMM. Would it be useful in his situation? (Delphi 2010) Any suggested flags to set in FastMM? Any other recommendations?
Note the very low address you are attempting to read. This sort of error almost certainly means you attempted to dereference a nil pointer even if you can't find one.
Given your description of the behavior I would suspect you've got a memory stomp going on--something is blasting a zero on top of the pointer to an object. When you change things you move things around and the stomp moves to someplace harmless.
Turn on both range checking and overflow checking.
Note the offending object must be at least 3C0 bytes in size--this should help narrow it down, most objects will be smaller than this.
What I have done in the past with such errors that only show in the field is put logging checkpoints in--a bunch of lines that display something in an out of the way place--a simple sequence of numbers is fine. Find out what number is showing when it crashes and you know which of you checkpoints was the last to execute. If that doesn't narrow it down enough you can repeat the process now that you've narrowed it down.
With a full map file you can identify the exact point in the code where this occurs. I hope you have a full map file for this image! Subtract $00401000 from the address at which the exception is raised ($007ADE8B in your case) and that corresponds to the values in the map file.
Having done that you know which object is nil and from there it is usually not too hard to work out what is going on.
One of the most common ways for this to occur is when a constructor raises an exception. When this occurs the destructor runs. If you access, in a destructor, a field that has not been initialised, and do anything other than call Free on it, then you will get an exception like this.
Looks like a memory overwrite where changing memory layout (your machine vs field machine or adding madExcept) makes the overwrite change something harmless.
FastMM is great at of making this kind of problems happen more consistently (and finding their source). Download the full version of FastMM, add it as the first unit of your project, and turn on FullDebugMode on its settings.
It might cause the problem to be reproduceable in your machine right away. If not, don't forget to deploy FastMM_FullDebugMode.dll with your application for testing. Keep madExcept on and let it embed the .map file for call stacks.

Delphi programs blocked by antivirus programs [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions concerning problems with code you've written must describe the specific problem — and include valid code to reproduce it — in the question itself. See SSCCE.org for guidance.
Closed 9 years ago.
Improve this question
I have a piece of code that is trying to write to disk many files in one second. However, it fails wince I have installed Kaspersky Anrivirus 2011.
Stream:= TFileStream.Create(sName, fmCreate);
The code totally worked with Kaspersky 2010 and also works with Kaspersky 2011 if I disable its scanners (it cannot be totally unloaded from memory - unless it is uninstalled). The code also works if (Kaspersky 2011 is running and) I write to disk slooooowly. So it obviously is not fast enough to handle my disk requests.
The error I get is EFCreateError ('Cannot create xxx file blablabla'). Error is random. Most of the files are written to disk. About 10% fail.
I have tried to get support but is impossible to find a real person at Kaspersky to speak with. Their so called 'support' is actually a FAQ data base. Of course it speaks about how to install the product and related stuff. There is nothing about programing-related issues. Any ideas?
PS: this has repercussions for the entire Delphi community! All our customers will fail to use Delphi software if they are using KIS 2011 as antivirus. For the moment I recommend to my users to disable their antivirus but I need a real solution.
It will be nice if a person with KIS 2011 can confirm the problem. Just create a tiny program that write 200 small files to disk using TFileStream.
UPDATE:
The problem appears ONLY when the file does not exist and it is created (created as opposed to overwritten).
Similar report: https://forums.embarcadero.com/thread.jspa?threadID=32751&tstart=15
Similar report: http://forum.kaspersky.com/index.php?showtopic=120561
A possible solution that popped in my mind is to detect if KIS is running and if it is, to put a delay after each writing to disk. Or at leat, let the user know there may be problems. Anybody knows how to detect if a service is running?
I added a delay of 650ms (after each file creation) and the bug is still there). So is not about how fast you write to disk but about how many files you write.
Just uninstalled KIS 2011. The problem does not appear anymore.
Just reinstalled the good old KIS 2010. The bug is still there but it appear rarely (about every 300 files instead of about 30 as in KIS 2011).
The problem was confirmed on a second computer.
NEWS: The crash appears in TFileStream.Create however it may be caused by a function called earlier: TestWriteAccess. If I disable this function, the TFileStream.Create doesn't fail anymore. Well, this doesn't change things too much. No matter which line of code generates the error, the program still fails (randomly) to write files to disk while Kaspersky is running.
Still waiting a response from a real person from Kaspersky...
More automated responses received from Kaspersky support (I sent emails to support in several countries). All pointing to a FAQ database.
I change my status from Kaspersky fan (and customer) to Kaspersky hater because I finally receive an answer from a real person from Kaspersky support and it was plain and simple obnoxious.
To test the code, try to use the code in a loop, to create 1000 files. The program creates a bunch of files (random number) then it fails at StreamFile:= TFileStream.Create.
Update: The issue can be fixed by entering a small delay after creating each file.
https://docs.google.com/forms/d/1H3_O1z1iEqfh9ZT9u3B0R1tGEj-Hc9o7rAE0LKPr33Y
2013 Update
Starting with this afternoon (after an update) KIS conflicts with Delphi.
Every time I compile a project KIS spikes to 100% CPU utilization. I will have to uninstall it.
2017 Update
All false positive alarms disappeared magically for all my Delphi programs starting with 2017. It seems that it was enough for a program like Kaspersky remove Delphi-generated executables from its virus list; all other smaller antivirus programs followed.
Delphi 7, Win 7 (32), KIS 2011
You need to instruct your users, i.e. Kaspersky's customers, that Kaspersky is interfering with the operation of your software, and that THEY should report it. Express your frustration that you, as a developer, don't have access to a real human being. This is the only way that the anti-malware companies will ever react - bad PR with their paying customers.
Kaspersky = pirate company? Maybe yes, maybe no. Maybe just yet another company with a bad product and nonexistent support. Their "support" consists in a FAQ database and an automatic email answering program. Phones are hooked to answering machines also. Their automated answer keep explaining me how to add my program in KIS "exception" database. I keep replying to those stupid emails that I cannot personally go to all my customers at home and put my program in the "exception" database and that it will be better if they will fix the bug.
When I finally got a non-automatic answer (the only one), the support guy fella is as rude as possible.
Possible solutions for Delphi programmers:
* Don't check if the user has write permission to a file (in order not to trigger Kaspersky bug)
* Check if the user has write permission. If the bug appears inform the user that Kaspersky creates problem and it should be temporary disable (while the program is running). Use a TRY EXCEPT block to do this.
Advice (based on my past experience):
Don't always blame your code if you ever received strange bug reports from your users when your program was trying to write to disk. Check also external factors (like existence of Kaspersky antivirus).
UPDATE:
I just applied for a refund. I will go for a chargeback if they won't refund the money (I strongly feel they won't).
Conclusion
When I posted this on StackOverflow I didn't realized the magnitude of the problem and I didn't realized it will deviate so much from initial course. Still I think it is well within the purpose of StackOverflow. We have all learned that sometimes the problems in out programs may not be caused by our faulty code and neither we can control the source of these problems (21 persons voted this question up - which means a lot of other people encountered issues with KIS).
We can just hope that poor designed programs that interacts with user's system at a very low level (such as KIS antivirus program) will be soon fixed so our sales won't suffer (much).
It is just frustrating when your program is labeled "buggy" and you can't do much about it!
Not an answer to solve your problem, but you should inform Kaspersky, probably they don't know there is a virus signature associated with a Delphi library.
And if your program isn't too complex, you might want to try Lazarus/FPC. It's not as good as Delphi, but I've been using it for several years now, and have got good results in Windows/MacOS/Linux.
i had similar problems with kaspersky 2011 when i was trying to add my prog to windows startup using d2010's new TFile.Copy() as well as raw api function:
CopyFile(PChar('C:\chellenger.exe'), PChar('C:\Documents and Settings\Omair\Start Menu\Programs\Startup\chellenger.exe'), False);
my solution was to put my delphi app in vb.net app as a resource, the vb.net app extracted it and put it to startup without false positives . Mixing two languages for your problem might solve your problem too(1 possible solution but a very ugly and nonprofessional solution i admit)
When you create file, any antivirus checks it. There is probably some kind of collision between your application and KAV. Have you tried to combine fmCreate with share modes. You can see in help for TFileStream.Create for available modes.
If the problem is just with kapersky, then just have your program detect if it is running. If so, scale back your file creation / writes to whatever passes their detection. Make sure you have some little status message somewhere that tells the user why things are slow. Incidentally, virus writers already know this which is why those heuristics simply don't work.
After doing that, contact Kapersky and work with them directly to get this resolved.
This gets past your immediate issue and will give you and kapersky time to figure out a long term solution.
Alternatively, you could simply shut kapersky down.. Just make sure you grab all of their watch dogs in the process.. But that tends to be a little more combative.
Creating a huge amount of files sounds like something that isn't necessarily A Good Thing, but you probably have your reasons :)
When you get the error code in Delphi, does KAV pop up any heuristic warnings, or is it completely silent? It wouldn't be weird to get a heuristic "omg, that app is doing something bad!" from creating a ton of new file, but if KAV is silent I'd say it's a bug.
Can you post a delphi executable with the tiniest amount of code that reproduces the bug? And a version that does the same step but only creates one file, it might be interesting to trace with SysInternals' ProcMon.
First, do you really need to test for write permissions by creating a file? Can't you just check the permission directly? I feel that creating a file for that purpose only is a lame way of doing it in any case.
Second, like noted above, it's likely that after you create and then delete a file, there is some intervention by Kaspersky's security mechanisms. Probably a driver tries to check the contents of the file you deleted, and keeps it alive for a while. Like this:
You create the file and open it, incrementing the refcount.
Kaspersky driver notices that and opens the file too. Even if you set share mode deny, as a driver it probably has the power to open it anyway (if Kaspersky could not circumvent sharing denials, any virus could have used the same trick to hide its data!).
You close the file and delete it. When you delete the file, the system just marks it "FILE_FLAG_DELETE_ON_CLOSE", but the file is still there until all the handles to it are closed.
Kaspersky continues to scan file, still haven't released the handle.
Therefore the file is still there.
You try to create a new file and the call fails because the old file is still not deleted.
The reason for all this mess is, of course, partly Kaspersky's checking mechanics, but they did nothing especially wrong here. Kaspersky needs to scan the file anyway, hardly anything can be done about that - it's antivirus, for crying out loud. On the other hand, checking permissions by creating and then deleting a file is (probably) very, very wrong. So I guess, you're the one at fault here.
I had the same problem. KIS made all kind of troubles. Until I reinstalled it. So, it was just a faulty installation.

Resources