Fetch and execution cycle - memory

I'm trying to understand the instruction cycle of a CPU and how it executes instructions.
From my understanding, once a program is compiled into binary it is stored on the hard drive. When you run the program, the executable code is loaded into RAM and resides in the .text area of the process's memory space. However, I'm confused as to why the return address needs to be stored on the stack. If we are executing 10 instructions and, say, the 3rd instruction is a function call, the kernel begins allocating memory for the function and also for the return address. I have read that the return address is the next instruction from the EIP register. Here, would the return address be the 4th instruction located in memory, so that when the stack for the function is torn down again, instructions continue executing from the 4th? When the function executes, do the instructions jump from the 3rd to the 8th, for example?
This is the source if it helps:
https://www.csee.umbc.edu/~chang/cs313.s02/stack.shtml
Next, main pushes the arguments for foo one at a time, last argument first onto the stack. For example, if the function call is:
a = foo(12, 15, 18) ;
The assembly language instructions might be:
push dword 18
push dword 15
push dword 12
Finally, main can issue the subroutine call instruction:
call foo
When the call instruction is executed, the contents of the EIP register is pushed onto the stack. Since the EIP
register is pointing to the next instruction in main, the effect is
that the return address is now at the top of the stack. After the call
instruction, the next execution cycle begins at the label named foo.
Figure 2 shows the contents of the stack after the call instruction.
The red line in Figure 2 and in subsequent figures indicates the top
of the stack prior to the instructions that initiated the function
call process. We will see that after the entire function call has
finished, the top of the stack will be restored to this position.
Any help is appreciated, thank you.
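To make the quoted description concrete, here is a minimal sketch of a toy machine in Python. It is not real x86: the addresses are just list indices and the instruction names (nop, call, ret, halt) are made up for illustration. It only models the point from the quoted text, that the call instruction itself pushes the address of the next instruction and ret pops it back into the instruction pointer.

```python
# Toy model of CALL/RET. Addresses are list indices, not real x86 addresses,
# and the instruction set is hypothetical.

def run(program, entry=0):
    """Execute `program` and return the list of instruction addresses visited."""
    stack = []   # the stack; in this model it only holds return addresses
    ip = entry   # instruction pointer, playing the role of EIP
    trace = []
    while True:
        op, arg = program[ip]
        trace.append(ip)
        ip += 1                   # after the fetch, ip points at the NEXT instruction
        if op == "call":
            stack.append(ip)      # push the return address (the next instruction)
            ip = arg              # continue at the function's first instruction
        elif op == "ret":
            ip = stack.pop()      # resume right after the call
        elif op == "halt":
            return trace
        # "nop" does nothing


program = [
    ("nop", None),   # 0
    ("nop", None),   # 1
    ("call", 5),     # 2: the "3rd instruction" calls the function at address 5
    ("nop", None),   # 3: this is the return address
    ("halt", None),  # 4
    ("nop", None),   # 5: function body
    ("ret", None),   # 6
]

trace = run(program)
print(trace)  # [0, 1, 2, 5, 6, 3, 4]
```

In this model execution jumps from address 2 to the function at 5, and after ret it continues at 3, the instruction following the call.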

Related

lost in debug ... can't stop execution

I am trying to use WinDbg to understand why an installation file hangs, but I am at a point where I can't stop the execution.
As background, I had already been able to install this program on the same PC, but for some reason I then uninstalled it, and now I can't re-install it (I tried to clean up everything from the old installation, incl. registry). Now this setup.exe starts and stays idle among the running processes without doing anything.
But let's go to the actual question. I am trying to use Windbg for the first time (I only had some practice with the old 8086 debug at DOS-time :-), so please bear with me if I'm asking something straightforward).
I have tracked the code up to a point where I have a RET code. I am able to stop the debugger at the RET instruction, but as soon as I "step into" the RET, the execution starts and does not stop, while I was expecting it to just go to the instruction following the previous CALL. From how I see things, it seems that after the RET the execution goes somewhere else ... how is it possible? Also, just before the RET there is a SYSCALL that I don't fully understand ... can it have an impact?
This is the portion of the code I am examining at the moment:
ntdll!NtTerminateThread:
00007ff9`fc8b5b20 4c8bd1 mov r10,rcx
00007ff9`fc8b5b23 b853000000 mov eax,53h
00007ff9`fc8b5b28 f604250803fe7f01 test byte ptr [SharedUserData+0x308 (00000000`7ffe0308)],1
00007ff9`fc8b5b30 7503 jne ntdll!NtTerminateThread+0x15 (00007ff9`fc8b5b35)
00007ff9`fc8b5b32 0f05 syscall
00007ff9`fc8b5b34 c3 ret
00007ff9`fc8b5b35 cd2e int 2Eh
00007ff9`fc8b5b37 c3 ret
I am stuck at the first RET instruction, at address 5b34.
At this point, this is the call stack:
00000000`0203fc38 00007ff9`fc86c63e ntdll!NtTerminateThread+0x14
00000000`0203fc40 00007ff9`fc8d903a ntdll!RtlExitUserThread+0x4e
00000000`0203fc80 00007ff9`fc86c5c5 ntdll!DbgUiRemoteBreakin+0x5a
00000000`0203fcb0 00000000`00000000 ntdll!RtlUserThreadStart+0x45
So my understanding is that execution should continue at address 00007ff9`fc86c63e. However, even if I add a BP at this address, or if I just go for a trace, the execution continues and keeps running some idle loop until I hit the "pause" button in WinDbg, after which it resumes at a completely different address.
In case the registers are relevant, here are some of them:
rax: 353000
rbx: 0
rcx: 0
rdx: 0
rsp: 203fc38
rdi: 7ff9c8d8fe0
rip: 7ff9fc8b5b34
So, eventually, where am I wrong? How can I see where the code goes after this RET?
Thanks in advance for any help,
Bob
When a thread exits, execution won't return to the return address.
The system is free to schedule another thread in the process that is ready.
The stack shows NtTerminateThread on top; it is a function that does not return, as if it were declared:
__declspec(noreturn) void foo(...);
By the way, when you say it goes elsewhere, do you mean the app keeps running and is not terminated? If so, hit Ctrl+Break and check what the other threads are doing.
That is, in user mode, ~*kb should show the call stacks of all threads.
Answering the comment about where it goes:
A process is a collection of threads.
Each thread has its own stack, and each thread gets a slice of time from the scheduler in which to execute (the thread quantum).
A lower-priority thread can be preempted by higher-priority threads xxxx, yyy, zzz, as well as by interrupts, APCs, DPCs, etc.
When a thread has used up its quantum, or is preempted by some VIP cavalcade that happens to travel down the road this poor thread is walking on,
a trap frame (_KTRAP_FRAME) is created, the poor thread's position (EIP) is saved in it, and the thread is put behind the waiting-threads barricade.
When the VIP cavalcade's dust has settled, the police open the barricade and let the poor thread walk on from where it stopped.
For such gory details you may need a kernel debugging setup, controlling your process from a kernel debugger.
When you hit the return, the OS sees that the thread is dead and has no return address,
so it checks the ready threads (!ready), selects the highest-priority one, and gives it a quantum to enjoy.
So before stepping over the return, check what all the other threads in your app are doing and set appropriate breakpoints on the threads of interest. Then hit the return: when another thread gets its quantum and reaches one of them, your breakpoint will be hit.
You're looking at the wrong thread!
From the partial output you supplied, it seems like you're attaching to a running process (rather than starting it from the debugger). To break into a running process, the debugger injects a thread into the target process that basically contains a hardcoded int 3 instruction and not much more.
It does it by calling ntdll!RtlpCreateUserThreadEx (the internal undocumented native parallel of CreateRemoteThread) supplying ntdll!DbgUiRemoteBreakin as the start address for the new thread.
The sole purpose of this synthetic thread is to generate the breakpoint exception. This exception causes the operating system to stop running the target process and passes control to the debugger. After it does this it's not needed anymore and it commits suicide.
What you're supposed to do at this point is probably switch to your thread of interest using the ~s command, set breakpoints and then continue execution.
If you try to step through this synthetic thread it will just end, and then the process will continue doing whatever it was doing before you broke into it, which is pretty much the opposite of what you want.
That's what this stack means:
00000000`0203fc38 00007ff9`fc86c63e ntdll!NtTerminateThread+0x14
00000000`0203fc40 00007ff9`fc8d903a ntdll!RtlExitUserThread+0x4e
00000000`0203fc80 00007ff9`fc86c5c5 ntdll!DbgUiRemoteBreakin+0x5a
00000000`0203fcb0 00000000`00000000 ntdll!RtlUserThreadStart+0x45
ntdll!RtlUserThreadStart is the real user-mode entry point of all user-mode threads and you can see that it just called ntdll!DbgUiRemoteBreakin after which you continued a bit until the thread finally ends itself.

uC/OS-II memory management: OSMemPut() returns the memory block without clearing it

I am a newbie to uC/OS-II and confused by its memory management.
In OSMemGet(), we can see that the task takes the first block of the memory partition's free list (OSMemFreeList);
then OSMemPut() returns the used block to the head of OSMemFreeList without clearing its memory.
Suppose a task gets a block, stores an int (e.g. 250) in it, and then returns the block. If the task later calls OSMemGet() and receives the same block again, is the int 250 still in it? How can I read it again?
Aha, I know how to get the previously stored content now. Every memory block on OSMemFreeList stores the next block's address in its first 4 bytes. If we skip those bytes, we can read the rest of the data again, because uC/OS-II does not clear the memory block in OSMemPut().
You are not supposed to access blocks you have put back, so there's no guarantee this will work in the future. What you are seeing in those first 4 bytes is the address of the next block. The free blocks are stored as a linked list so as they are created / put back, they are relinked in the chain.
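The free-list behaviour described above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the uC/OS-II source: the class name BlockPool, its get/put methods, the 4-byte little-endian link and the end-of-list sentinel are all hypothetical stand-ins for OSMemGet(), OSMemPut() and OSMemFreeList. Block "addresses" are byte offsets into one bytearray.

```python
import struct


class BlockPool:
    """Fixed-size block pool whose free list is threaded through the blocks.

    Hypothetical model: the first LINK_SIZE bytes of each free block hold the
    offset of the next free block.
    """

    LINK_SIZE = 4          # assumed link size, matching "first 4 bytes"
    END = 0xFFFFFFFF       # end-of-list sentinel used by this model only

    def __init__(self, nblocks, blksize):
        self.mem = bytearray(nblocks * blksize)
        # Link every block to the one after it; the last block ends the list.
        for i in range(nblocks):
            nxt = (i + 1) * blksize if i + 1 < nblocks else self.END
            struct.pack_into("<I", self.mem, i * blksize, nxt)
        self.free_list = 0

    def get(self):
        """Take the block at the head of the free list."""
        blk = self.free_list
        if blk == self.END:
            raise MemoryError("pool exhausted")
        (self.free_list,) = struct.unpack_from("<I", self.mem, blk)
        return blk

    def put(self, blk):
        """Return a block: only the link field is written, nothing is cleared."""
        struct.pack_into("<I", self.mem, blk, self.free_list)
        self.free_list = blk


pool = BlockPool(nblocks=4, blksize=16)
a = pool.get()
struct.pack_into("<I", pool.mem, a, 250)        # value stored at offset 0
struct.pack_into("<I", pool.mem, a + 4, 250)    # value stored past the link field
pool.put(a)
b = pool.get()                                  # head of the list: same block again
first = struct.unpack_from("<I", pool.mem, b)[0]
second = struct.unpack_from("<I", pool.mem, b + 4)[0]
print(a == b, first, second)
```

In this model the value written at offset 0 is replaced by the link to the next free block, while the value written after the link field happens to survive. That survival is an accident of the implementation, which is why relying on it is unsafe.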

iOS: modify registers to call a function

I connect to the iPhone's debugserver and am able to send GDB Serial Protocol packets. I can set a breakpoint and wait until it is reached. When it is, I want to call objc_msgSend with known parameters, get its output and continue execution. For now I am simulating this process in Xcode and lldb, so I cannot just use 'call objc_msgSend(object, _cmd)'.
What I do:
set breakpoint to some code
register read pc // read next operation address
register write lr 0x0000253a // set return address to continue execution (pc value)
register write pc 0x30300c88 // my objc_msgSend address
register write r0 0x16ed30 // my object address
register write r1 0x3161 // my selector address
breakpoint set -a 0x0000253a
continue
So my method gets called, but then the app crashes and never reaches my 'return address' 0x0000253a. It also overwrites r0 with the return value, so my approach is clearly incomplete. I understand that I am crudely overwriting registers without storing and restoring their previous values, so please help. How can I store and restore the register state? What am I doing wrong, or what necessary steps am I missing?
It would also be very helpful to trace what Xcode's debugger does during 'call objc_msgSend'. I tried to use this code and fruitstrap with dtruss and then study the output, but it contained thousands of memory reads and breakpoint sets, which was useless to me.
Note: I can use only the GDB Serial Protocol.
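Since only the GDB Serial Protocol is available, one way to approach the save/restore part is the protocol's standard 'g' (read all general registers) and 'G' (write all general registers) packets. The Python sketch below only shows how such packets are framed; the helper names are hypothetical, and whether a particular debugserver build accepts 'g' and 'G' is an assumption to verify against the target. Escaping of special characters is left out, which is safe here because the payloads are plain hex.

```python
def rsp_frame(payload: str) -> str:
    """Frame a GDB Remote Serial Protocol packet as $<payload>#<checksum>.

    The checksum is the sum of the payload bytes modulo 256, written as two
    lowercase hex digits.
    """
    checksum = sum(payload.encode("ascii")) % 256
    return f"${payload}#{checksum:02x}"


def read_all_registers_packet() -> str:
    """Packet asking the stub for every general register (hypothetical helper)."""
    return rsp_frame("g")


def write_all_registers_packet(saved_hex: str) -> str:
    """Packet writing back a register dump previously returned by 'g'."""
    return rsp_frame("G" + saved_hex)


print(read_all_registers_packet())        # $g#67
print(write_all_registers_packet("00"))   # $G00#a7
```

A possible sequence under these assumptions: send 'g' at the breakpoint and keep the reply, change lr, pc, r0 and r1, continue until the return breakpoint is hit, read r0 for the result, then send 'G' with the saved reply to put every register back before continuing.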

how do the registers get saved when a process gets interrupted?

This has been bugging me all day. When a program sets itself up to call a function when it receives a certain interrupt, I know that the registers are pushed onto the stack when the program is interrupted, but what I can't figure out is: how do the registers get off the stack? I know that the compiler doesn't know if the function is an interrupt handler, and it can't know how many arguments the interrupt gave to the function. So how on earth does it get the registers off?
It depends on the compiler, the OS and the CPU.
For low level embedded stuff, where an ISR may be called directly in response to an interrupt, the compiler will typically have some extension to the language (usually C or C++) that flags a given routine as an ISR, and registers will be saved and restored at the beginning and end of such a routine. [1]
For common desktop/server OSs though there is normally a level of abstraction between interrupts and user code - interrupts are normally handled first by some kernel code before being passed to a user routine, in which case the kernel code takes care of saving and restoring registers, and there is nothing special about the user-supplied ISR.
[1] E.g. Keil 8051 C compiler:
void Some_ISR(void) interrupt 0 // this routine will get called in response to interrupt 0
{
    // compiler generates preamble to save registers
    // ISR code goes here
    // compiler generates code to restore registers and
    // do any other special end-of-ISR stuff
}
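The division of labour described in this answer can be illustrated with a small Python model. It is only a sketch: the register names and the dispatch_interrupt function are hypothetical, and the model stands in for whichever layer does the saving (the compiler-generated ISR preamble or the kernel's interrupt entry code). The point is that the handler body knows nothing about the saved registers; the wrapper around it pushes and pops them.

```python
def dispatch_interrupt(cpu, stack, handler):
    """Model of an interrupt wrapper: save registers, run handler, restore."""
    stack.append(dict(cpu))    # preamble: push a copy of every register
    handler(cpu)               # the handler may clobber any register
    cpu.clear()
    cpu.update(stack.pop())    # epilogue: pop the saved values back


def noisy_handler(cpu):
    """A handler that overwrites registers without saving anything itself."""
    cpu["r0"] = 0xDEAD
    cpu["r1"] = 0xBEEF


cpu = {"r0": 1, "r1": 2, "pc": 0x100}   # state of the interrupted code
stack = []
dispatch_interrupt(cpu, stack, noisy_handler)
print(cpu, stack)
```

After the dispatch, the interrupted code sees the same register values as before and the stack is back to its previous depth, even though the handler never touched the saved copy.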

Delphi SampleProfiler: How is this code calling into ntdll.dll?

i profiled a portion of my application using the Delphi Sampling Profiler. Like most people, i see a majority of the time spent inside ntdll.dll.
Note: i turned on the options to ignore Application.Idle time, and calls from System.pas. So it isn't inside ntdll because the application is idle:
After multiple runs, multiple times, the majority of the time seems to be spent inside ntdll.dll, but the odd thing is who the caller is:
The caller is from the Virtual Treeview's:
PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);
Note: The time in ntdll.dll is not due to the application being idle, because the caller isn't Application.Idle.
What confuses me is that this line itself (i.e. not something inside PrepareCell) is the caller into ntdll. Even more confusing is that:
not only is it not something inside PrepareCell()
it's not even the setup of PrepareCell (e.g. popping stack variables, setting up implicit exception frames, etc) that is the caller. Those things would show up in the profiler as a hotspot on the begin inside PrepareCell.
VirtualTrees.pas:
procedure TBaseVirtualTree.PrepareCell(var PaintInfo: TVTPaintInfo; WindowOrgX, MaxWidth: Integer);
begin
...
end;
So i'm trying to figure out how this line:
PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);
is calling ntdll.dll.
The only other ways in are the three parameters:
PaintInfo
Window.Left
NodeBitmap.Width
Maybe one of those is a function, or a property getter, that would call into ntdll. So i put a breakpoint on the line, and look at the CPU window at runtime:
There is a line in there that might be the culprit:
call dword ptr [edx+$2c]
But when i follow that jump, it doesn't end up in ntdll.dll, but TBitmap.GetWidth:
Which, as you can see, doesn't call anywhere; and certainly not into ntdll.dll.
So how is the line:
PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);
calling into ntdll.dll?
Note: i know full well it isn't really calling into ntdll.dll. So any valid answer will have to include the words "Sampling Profiler is misleading; that line is not calling into ntdll.dll." The answer will also have to either say that the majority of the time is not spent in ntdll.dll, or that the highlighted line is not the caller. Finally any answer will have to explain why Sampling Profiler is wrong, and how it can be fixed.
Update 2
What is ntdll.dll? Ntdll is Windows NT's native API set. The Win32 API is a wrapper around ntdll.dll that looks like the Windows API that existed in Windows 1/2/3/9x. In order to actually get into ntdll you have to call a function that uses ntdll directly or indirectly.
For example, when my Delphi application goes idle, it waits for a message by calling the user32.dll function:
WaitMessage;
When you actually look at it, it is:
USER32.WaitMessage
mov eax,$00001226
mov edx,$7ffe0300
call dword ptr [edx]
ret
Calling the function specified at $7ffe0300 is the way Windows transitions into Ring0, calling the FunctionID specified in EAX. In this case, the System Function being called is 0x1226. On my operating system, Windows Vista, 0x1226 corresponds to the system function NtUserWaitMessage.
This is how you get into ntdll.dll: you call it.
i was desperately trying to avoid a hand-waving non-answer when i worded the original question. By being very specific, carefully pointing out the reality of what i'm seeing, i was trying to prevent people from ignoring the facts, and trying to use a hand-waving argument.
Update Three
i converted the two parameters:
PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);
into stack variables:
_profiler_WindowLeft := Window.Left;
_profiler_NodeBitmapWidth := NodeBitmap.Width;
PrepareCell(PaintInfo, _profiler_WindowLeft, _profiler_NodeBitmapWidth);
To confirm that the bottleneck is not in the call to
Window.Left, or
NodeBitmap.Width
Profiler still indicates that the line
PrepareCell(PaintInfo, _profiler_WindowLeft, _profiler_NodeBitmapWidth);
itself is the bottleneck; not anything inside PrepareCell. This must mean that it's something inside the setup of the call to prepare cell, or at the start of PrepareCell:
VirtualTrees.pas.15746: PrepareCell(PaintInfo, _profiler_WindowLeft, _profiler_NodeBitmapWidth);
mov eax,[ebp-$54]
push eax
mov edx,esi
mov ecx,[ebp-$50]
mov eax,[ebp-$04]
call TBasevirtualTree.PrepareCell
Nothing in that calls into ntdll. Now the pre-amble in PrepareCell itself:
VirtualTrees.pas.15746: begin
push ebp
mov ebp,esp
add esp,-$44
push ebx
push esi
push edi
mov [ebp-$14],ecx
mov [ebp-$18],edx
mov [ebp-$1c],eax
lea esi,[ebp-$1c]
mov edi,[ebp-$18]
Nothing in there calls into ntdll.dll.
The questions still remain:
why is pushing of one variable onto the stack, and two others into registers the bottleneck?
why isn't anything inside PrepareCell itself the bottleneck?
Well, this problem was actually my main reason to make my own sampling profiler:
http://code.google.com/p/asmprofiler/wiki/AsmProfilerSamplingMode
Maybe not perfect, but you could give it a try. Let me know what you think about it.
Btw, I think it has to do with the fact that almost all calls end up in calls to the kernel (memory requests, paint events, etc). Only calculations do not need to call the kernel.
Most calls end up waiting for kernel results:
ntdll.dll!KiFastSystemCallRet
You can see this in Process Explorer with thread stack view, or in Delphi, or using StackWalk64 API in my "Live view" of AsmProfiler:
http://code.google.com/p/asmprofiler/wiki/ProcessStackViewer
There are probably two things happening there.
The first is that SamplingProfiler identifies the caller by walking up the stack, until it encounters what looks like a valid call point into Delphi from Delphi code.
The thing is, some procedures may reserve a large amount of stack at once, without reinitializing it. This could result in a false positive. The only clue then would be that your false positive was recently invoked.
The second thing is the ntdll location, which is known for certain. However, ntdll is your wait point in user space and, as user197220 said, ntdll is where you'll end up most of the time when you're calling system stuff and waiting for the result.
In your case, unless you reduced the sampling rate, you're looking at 247ms of CPU work time, which could probably pass as idle if those 247 samples were collected over many seconds of real time. Since the false positive points to VirtualTree paint preparations, my bet would be that the ntdll time is actually paint time (driver or OS software).
You can try commenting out the code that actually does the painting to be sure.
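The false positive described in this answer can be reproduced with a toy stack walker in Python. This is not SamplingProfiler's actual code: the address range, the call-site table and the guess_caller function are assumptions made up for the illustration. The heuristic scans from the top of the stack and reports the first word that looks like a return address into application code.

```python
APP_CODE = range(0x401000, 0x480000)   # assumed address range of the application module


def guess_caller(stack_words, call_sites):
    """Return the label of the first stack word that looks like a return
    address into application code, scanning from the top of the stack."""
    for word in stack_words:
        if word in APP_CODE and word in call_sites:
            return call_sites[word]
    return None


# Hypothetical table mapping return addresses to source lines.
call_sites = {
    0x401234: "PrepareCell(...) line",
    0x405678: "real caller",
}

# Top of stack first. The third word is a stale return address left behind
# by an earlier call inside a large, uninitialised stack reservation.
stack = [
    0x7FFE0304,   # return address into system code: skipped
    0x00000000,   # uninitialised local
    0x00401234,   # stale value that still looks like a valid call point
    0x0012FF00,   # saved frame pointer
    0x00405678,   # the genuine return address
]

guess = guess_caller(stack, call_sites)
clean_guess = guess_caller([w for w in stack if w != 0x00401234], call_sites)
print(guess, "|", clean_guess)
```

With the stale word present the walker blames the PrepareCell line; once it is removed, the same walker finds the real caller. A line can therefore be reported as the caller into ntdll without executing any call into it.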
