Executable Initialization - stack

When is it decided where the stack, global, and frame pointers are in memory? I'm trying to load an ELF executable into a simulator and I can't figure out what instructions load the global, stack and frame pointers into the regfile.

It depends on the architecture, but generally the kernel sets up the initial stack and frame pointers before any user-space instructions execute, and the global pointer (if any) is established by the dynamic linker. The "initial process state" section of your architecture's ABI supplement will explain a lot of this stuff, but for the rest you will probably need to read the source code for your dynamic linker.
If your simulator is user-space only, it has to do the kernel's job itself.
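As a rough illustration of what "doing the kernel's job" means here, the sketch below builds the initial argc/argv stack image described in the System V ABI's initial-process-state section and points the stack-pointer register at it. Everything in it (the Regfile struct, the write_u64/write_str helpers, the RISC-V-style register numbering) is a made-up stand-in for whatever your simulator actually provides; the global pointer is left to the guest's own startup code, as described above.

// Rough sketch only: Regfile, write_u64 and write_str are hypothetical
// stand-ins for your simulator's own memory/register model.
#include <cstdint>
#include <string>
#include <vector>

struct Regfile { uint64_t x[32]; uint64_t pc; };

// Build the initial argc/argv (and, in a real loader, envp/auxv) image at
// the top of the simulated stack, then point sp and fp at it.
void setup_initial_stack(Regfile& regs, uint64_t stack_top,
                         const std::vector<std::string>& argv,
                         void (*write_u64)(uint64_t addr, uint64_t val),
                         uint64_t (*write_str)(const std::string& s))
{
    std::vector<uint64_t> argv_ptrs;
    for (const auto& a : argv)
        argv_ptrs.push_back(write_str(a));        // copy strings into simulated memory

    uint64_t sp = stack_top & ~uint64_t(15);
    sp -= (argv_ptrs.size() + 3) * 8;             // argc + argv[] + NULL + empty envp
    sp &= ~uint64_t(15);                          // keep 16-byte alignment

    uint64_t p = sp;
    write_u64(p, argv_ptrs.size()); p += 8;       // argc
    for (uint64_t a : argv_ptrs) { write_u64(p, a); p += 8; }
    write_u64(p, 0); p += 8;                      // argv NULL terminator
    write_u64(p, 0);                              // empty environment

    regs.x[2] = sp;   // RISC-V-style numbering: x2 = sp
    regs.x[8] = sp;   // x8 = s0/fp; the global pointer (x3) is left to the
                      // guest's own startup code / dynamic linker
}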

Related

No symbols for valgrind massif dlclose()

massif doesn't show any function names for functions that live in a library which is closed with dlclose().
If I remove the dlclose() call, recompile, and run the program again, I can see the symbols. Is there a way to get the function names without changing the source code?
The new version of valgrind (3.14) has an option that instructs valgrind to keep the symbols of dlclose'd libraries:
--keep-debuginfo=no|yes Keep symbols etc for unloaded code [no]
This allows saved stack traces (e.g. memory leaks)
to include file/line info for code that has been
dlclose'd (or similar)
However, massif does not make use of this information.
You might obtain a usable heap reporting profile by doing:
valgrind --keep-debuginfo=yes --xtree-leak=yes
and then visualise the heap memory using e.g. kcachegrind.

Does JIT create an output containing native code?

In the context of the JIT compiler that acts on an assembly (containing metadata and intermediate language):
The assembly is generated on disk by the language-specific compiler, and the CLR then performs its own, independent compilation to convert the MSIL into native code. Is there any visible output created on disk after this second compilation, i.e. a file (or files) containing binary code or similar?
I found the answer in a fairly explicit article. Basically, there is no output file; the native code is stored in dynamically allocated memory during runtime.
When managed code calls a particular method, the compiling function wakes up, looks up the intermediate code (processor-agnostic object code that's similar to machine code), then compiles the intermediate code into instructions for the available processor. The runtime then saves those instructions in a dynamically allocated location in memory and links them back to the original method: when the method in the assembly executes, it executes the processor instructions stored in memory.

Nvwa leak detection library crashes on iOS

Nvwa crashes on free calls in its delete operator overrides, especially on the simulator, with this error:
malloc: *** error for object [hexadecimal address]: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
Your leak detection library apparently and ironically has a heap corruption bug, likely use-after-free. I suggest you use Instruments or malloc history to determine what that address previously corresponded to and then audit the lifecycle of that allocation and pointers to it.
It turned out to be all about the compilation settings of the target used to build the library. Nvwa uses platform-specific macros to detect the available threading libraries, and it so happens that _PTHREADS is not normally defined on iOS; on top of that I had not allowed C++11 mutexes, because I never defined NVWA_USE_CXX11_MUTEX to 1. So there are a few alternatives: define _PTHREADS, allow the use of C++11 mutexes, or change the Nvwa code to also check for the _POSIX_THREADS macro. One of the places that absolutely requires a proper mutex is the update of the allocation list used to report leaks (in debug_new.cpp). Without synchronisation there, the next-element pointers are bound to eventually point to released memory, and a use-after-free is just a matter of time.
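For concreteness, here is a sketch of the fixes at the compile/preprocessor level. This is illustrative only, not the actual debug_new.cpp logic:

// Illustrative sketch, not the actual Nvwa source: either force the C++11
// mutex path (e.g. compile with -DNVWA_USE_CXX11_MUTEX=1) or extend the
// platform check to also accept _POSIX_THREADS, which iOS does provide
// even though _PTHREADS is not predefined there.
#include <unistd.h>           // defines _POSIX_THREADS on POSIX systems

#if defined(NVWA_USE_CXX11_MUTEX) && NVWA_USE_CXX11_MUTEX
#  include <mutex>            // std::mutex will guard the allocation list
#elif defined(_PTHREADS) || defined(_POSIX_THREADS)
#  include <pthread.h>        // pthread mutexes will guard the allocation list
#else
#  error "No mutex support: allocation-list updates in debug_new.cpp would be unsynchronised"
#endif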

Trying to mix in OpenCL with CUDA in NVIDIA's SDK template

I have been having a tough time setting up an experiment where I allocate memory with CUDA on the device, take that pointer to device memory, use it in OpenCL, and return the results. I want to see if this is possible. I had a tough time getting a CUDA project to work, so I just used NVIDIA's template project from their SDK. In the makefile I added -lOpenCL to the libs section of common.mk. Everything is fine when I do that, but when I add #include <CL/cl.h> to template.cu so I can start making OpenCL calls, I get over 100 errors. They all look similar to this, but with different function names at the end:
/usr/lib/gcc/x86_64-linux-gnu/4.4.1/include/xmmintrin.h(334): error:
identifier "__builtin_ia32_cmpeqps" is undefined
I am having a hard time figuring out why. Please help if you can. Also, if there is an easier way to set up a project that can call both the CUDA and OpenCL APIs, let me know.
I haven't really worked with CUDA, so I don't know how helpful my answer is.
From what I understand, you are trying to use OpenCL directly from your CUDA host code, which (if I remember correctly) is compiled with NVIDIA's own compiler rather than the standard gcc. So the problem is probably that this compiler doesn't implement the builtins needed by the headers mentioned in the error.
Look here for a similar problem and its solution:
http://forums.nvidia.com/lofiversion/index.php?t88573.html
It seems you have to put everything that needs the OpenCL API into a different (non-CUDA) compilation unit so that it is compiled by the regular host compiler instead of NVIDIA's (a sketch of such a split is at the end of this answer).
However, I wouldn't count on the memory sharing itself working (OpenCL buffers aren't just raw pointers to device memory; they carry additional metadata), simply because there is no real reason it should work, and even if it does, there is no guarantee it will continue to.
What you could try, if you really want to, is using OpenGL for the interop, since both OpenCL and CUDA have extensions that allow creating buffers from OpenGL buffers.
But why do you need to do this in the first place? What's keeping you from using Apple's implementation in the short term? IIRC it's open source, and most of it (the OpenCL parts) should be platform independent anyway.
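To make the "separate compilation unit" suggestion concrete, here is a minimal sketch. The wrapper name run_opencl_stage is invented for this example; the only point is that <CL/cl.h> is included in a file compiled by plain g++, while the .cu file sees nothing but a C declaration.

// ocl_part.cpp -- compiled with plain g++, not nvcc, so <CL/cl.h> (and the
// SSE builtins its headers pull in) never reach NVIDIA's frontend.
// run_opencl_stage is an invented name for this sketch.
#include <CL/cl.h>
#include <cstdio>

extern "C" int run_opencl_stage(void)
{
    cl_uint num_platforms = 0;
    if (clGetPlatformIDs(0, NULL, &num_platforms) != CL_SUCCESS)
        return -1;
    std::printf("found %u OpenCL platform(s)\n", num_platforms);
    return 0;
}

// template.cu -- the CUDA side only needs the plain C declaration
//     extern "C" int run_opencl_stage(void);
// and calls it from host code; link the two object files together with both
// the CUDA runtime and -lOpenCL.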

Native code execution by JVM/CLR

How does the JVM/CLR execute JIT-compiled native code? Is it by some code injection or by copying the code to executable memory? What are the system calls that allow dynamic code execution?
I can explain how we do it in CACAO VM (a research JIT-only JVM). First, the machine code for a method is generated into a heap-allocated memory block. After compilation, the final code length is known, and a chunk of executable memory is allocated with mmap using the PROT_EXEC flag (see the relevant CACAO code). Then the machine code is copied into the mmapped area. After that, many architectures require some machine-specific cache-flushing mechanism; as an example, have a look at CACAO's cache-flushing function for PowerPC 64. Notably, on i386 and x86_64 there is nothing to do. After this step, the processor is ready to execute the newly generated code. Alternatively, already-allocated memory pages can be marked executable with mprotect. Note that mmap/mprotect are Unix facilities.
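For the x86_64/Linux case (where, as noted above, no explicit cache flush is needed) the whole mmap/copy/execute sequence fits in a few lines. This is a stand-alone illustration, not CACAO code; the hard-coded bytes are just a stub that returns 42.

// Minimal sketch of the mmap -> copy -> execute sequence described above.
#include <sys/mman.h>
#include <cstring>
#include <cstdio>

int main()
{
    // Machine code for:  mov eax, 42 ; ret
    unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    // 1. Allocate a page that is writable and executable.
    void* mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) { std::perror("mmap"); return 1; }

    // 2. Copy the generated machine code into the executable region.
    std::memcpy(mem, code, sizeof code);

    // (A W^X-friendly variant would mmap with PROT_READ|PROT_WRITE and flip
    //  the page to PROT_READ|PROT_EXEC with mprotect after the copy.)

    // 3. Jump into it through a function pointer.
    int (*fn)() = reinterpret_cast<int (*)()>(mem);
    std::printf("generated code returned %d\n", fn());   // prints 42

    munmap(mem, 4096);
    return 0;
}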
I don't know specifically how Java does it, but in general you'd insert "trap" opcodes into the interpreter's instruction stream. There are two opcodes in the JVM spec that seem tailor-made for this purpose: the reserved impdep1 and impdep2 opcodes are explicitly set aside for implementation-specific use.
If you want to know for sure, there's no better answer than the source: http://download.java.net/jdk6/source/
The Common Language Runtime keeps a method table for each type, with entries pointing either to native code or to a native stub that JIT-compiles the managed method and then fixes up the method table entry with a pointer to the newly created native code.
MSDN has a more in-depth explanation in the MethodDesc section.
This blog entry by Dave Notario explains how the CLR JIT compiler works.
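The method-table/stub fix-up described above can be illustrated with an ordinary function-pointer table. This is a conceptual analogy in C++, not CLR code, and the names are invented.

// Conceptual analogy of the method-table fix-up: each slot initially points
// at a stub; the first call "JITs" the method, patches the slot with the
// real code pointer, and later calls go straight to the compiled body.
#include <cstdio>

using Fn = int (*)();

static Fn method_table[1];            // one slot for one "method"

static int compiled_body() {          // stands in for JIT-generated native code
    return 42;
}

static int jit_stub() {
    std::puts("jit_stub: compiling method and patching the table slot");
    method_table[0] = &compiled_body; // fix up the slot with the new code pointer
    return method_table[0]();         // run the freshly "compiled" code
}

int main() {
    method_table[0] = &jit_stub;      // initial state: slot points at the stub
    std::printf("first call:  %d\n", method_table[0]());  // goes through the stub
    std::printf("second call: %d\n", method_table[0]());  // direct call, no stub
    return 0;
}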
