I am using Intel Pin as my primary DBI tool.
I would like to know how I can trace all the variables allocated in a program.
Suppose we have the following snippet in C:
int *ptr_one, *ptr_two, g;
ptr_one = (int *)malloc(sizeof(int));
ptr_two = (int *)malloc(sizeof(int));
*ptr_one = 25;
*ptr_two = 24;
g = 130;
free(ptr_two);
g = 210;
*ptr_two = 50;
I want to know how I can trace specific variables / memory references in my program. For example, in the code above I would like to trace the variable "g" with Intel Pin. How can this be done?
For dynamically allocated variables, I monitor malloc/free calls and follow their addresses, but for static ones I have no idea.
Another matter: I would like to trace dynamically allocated variables across the whole program. In the code above, say, I want to monitor changes to the ptr_two variable from start to finish.
If anyone has ideas about this, please share them here; sample code for Intel Pin would be appreciated.
Thank you all.
Simply stated, you can't associate a name from your source code (be it a variable or function name) with a memory location in the compiled binary: this information is (probably) lost in the final binary.
This is not true in two cases:
1) If your binary is exporting functions: in this case other binaries must have a means to call the function by name (minus some subtleties), in which case the information must be available somewhere; for example on Windows, binaries that export functions, variables or classes have an export table.
2) You have symbolic information: in your example, for both the global variable and any local variables, you have to use the symbolic information provided by the compiler.
On Linux you will need an external tool / library / program (e.g. libelf.so or libdwarf.so) to parse the symbolic information from the symbol tables (usually dynsym / symtab) if the binary is not stripped.
On Windows you have to rely on the program database (*.pdb files); the format is mostly undocumented (although MS is trying to document it) and you have to use either the DbgHelp API or the DIA SDK.
As stated by the PIN user guide (emphasis is mine):
Pin provides access to function names using the symbol object (SYM).
Symbol objects only provide information about the function symbols in
the application. Information about other types of symbols (e.g. data
symbols), must be obtained independently by the tool.
If you have symbolic information you can then associate a variable name - obtained from an external tool - with an address (relative to the module base for global vars or a stack location for local ones). At runtime it is then just a matter of converting the relative address to a virtual one.
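That last conversion is plain arithmetic. Here is a minimal sketch in Python, assuming the symbol's link-time address and the module's link-time base came from an external symbol parser, and the load base was reported by the instrumentation tool when the image was loaded. The function name and all the numbers are hypothetical, chosen only for illustration:

```python
def runtime_address(symbol_link_addr: int, link_base: int, load_base: int) -> int:
    """Rebase a link-time symbol address onto the address the module was loaded at."""
    offset = symbol_link_addr - link_base  # module-relative offset of the symbol
    return load_base + offset

# Hypothetical values: a global 'g' at 0x404028 in a module linked at 0x400000,
# which the loader actually mapped at 0x555555554000.
g_addr = runtime_address(0x404028, 0x400000, 0x555555554000)
print(hex(g_addr))  # 0x555555558028
```

Once the runtime address is known, a tool can compare it against the effective address of each memory read or write to see which instructions touch the variable.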
Programming languages like C can, as far as I know, execute system calls to make the OS give them direct memory access to file streams that can be read from/written to. Now, how do programming languages without raw memory access (Java, Python, etc.) even do something like open a file "under the hood"?
Obviously, I'm not just talking about opening files here - some languages have built-in file reading features that would make my question obsolete. This concerns anything that has anything to do with direct memory access - e.g. accessing other devices (for instance the keyboard, the mouse), and so on.
Here's an excerpt of the source code of the FileInputStream class from OpenJDK 8:
/**
* Opens the specified file for reading.
* @param name the name of the file
*/
private native void open0(String name) throws FileNotFoundException;
// ...
private native int read0() throws IOException;
/**
* Reads a subarray as a sequence of bytes.
* @param b the data to be written
* @param off the start offset in the data
* @param len the number of bytes that are written
* @exception IOException If an I/O error has occurred.
*/
private native int readBytes(byte b[], int off, int len) throws IOException;
The native keyword (see this other Q&A) means that these methods are not implemented in the Java source code for this class; they are provided by the implementation of the Java interpreter which executes the program (e.g. the java command-line utility, which executes Java bytecode). The Java interpreter itself is ultimately written in a language like C, which has the low-level features required to actually implement these native methods. When the interpreter has to execute a method marked native, it invokes the corresponding function written in C.
Similarly, other high-level languages like Python have some functions which are implemented "natively" in the same sense. These functions, along with the behaviours of basic arithmetic and comparison operations, are "intrinsic" as compared to the large bulk of a language's standard library, which is usually written in the language itself.
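A quick way to see this distinction is to ask the interpreter itself. The sketch below assumes CPython (other implementations such as PyPy may answer differently) and uses the standard inspect module to tell a natively implemented function from one written in Python:

```python
import inspect
import os.path

# In CPython, open() is implemented in C, while os.path.join is ordinary Python source.
open_is_native = inspect.isbuiltin(open)
join_is_python = inspect.isfunction(os.path.join)

print(open_is_native)  # True on CPython
print(join_is_python)  # True
```

So even in Python, opening a file ends up in native code that makes the system call on the program's behalf.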
I'm trying to understand what happens behind the Rcpp::sourceCpp() call in a parallelized environment. Recently, this was partially addressed in the question: Using Rcpp function in parLapply on Windows.
Within this post, Dirk said,
"You need to run the sourceCpp() call in each spawned process, or else get them your code."
This was in response to the questioner's attempt to distribute the Rcpp function to the worker processes. The questioner was sending the Rcpp function via:
clusterExport(cl = cl, varlist = "payoff")
I'm confused as to why this doesn't work. My understanding is that this is exactly what clusterExport() is for.
The issue here is that the compiled code is not "exportable" to the spawned processes without being embedded in a package due to how binaries are linked into R's processes.
Traditionally, the clusterExport() statement allows R-specific code to be distributed to workers.
By using clusterExport() on an Rcpp function, you are only receiving the R declaration and not the underlying shared library. That is to say, the R CMD SHLIB given in Attributes.R is not shared with / exported to the workers. As a result, when a call is then made to an Rcpp function on the worker, R cannot find the correct shared library.
Take the previous question's function:
Rcpp::cppFunction("NumericVector payoff( double strike, NumericVector data) {
return pmax(data - strike, 0);
}")
Note: I'm using cppFunction() instead of sourceCpp() but the results are equivalent since cppFunction() calls sourceCpp() to create the function.
Typing the function name:
payoff
Yields the R declaration with a shared library pointer.
function (strike, data)
.Primitive(".Call")(<pointer: 0x1015ec130>, strike, data)
This shared library is only available in the process that compiled the function.
Hence, it is always ideal to embed compiled code within a package and then distribute the package.
I want my application to always run on the real (discrete) GPU on NVIDIA Optimus laptops.
From "Enabling High Performance Graphics Rendering on Optimus Systems", (http://developer.download.nvidia.com/devzone/devcenter/gamegraphics/files/OptimusRenderingPolicies.pdf):
Global Variable NvOptimusEnablement (new in Driver Release 302)
Starting with the Release 302 drivers, application developers can
direct the Optimus driver at runtime to use the High Performance
Graphics to render any application–even those applications for which
there is no existing application profile. They can do this by
exporting a global variable named NvOptimusEnablement. The Optimus
driver looks for the existence and value of the export. Only the LSB
of the DWORD matters at this time. A value of 0x00000001 indicates
that rendering should be performed using High Performance Graphics. A
value of 0x00000000 indicates that this method should be ignored.
Example Usage:
extern "C" { _declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001; }
The problem is that I want to do this using Delphi. From what I've read Delphi does not support export of variables, even though some hacks exist. I did try a few of them but couldn't make them work.
In the same NVIDIA document I read that forcing the proper GPU can be accomplished by linking statically to one of a handful of listed DLLs. But I don't want to link to DLLs I'm not using. (Why the opengl.dll is not one of them is beyond me.) A simple exported variable seems much cleaner.
From what I've read Delphi does not support export of variables.
That statement is incorrect. Here's the simplest example that shows how to export a global variable from a Delphi DLL:
library GlobalVarExport;
uses
Windows;
var
NvOptimusEnablement: DWORD;
exports
NvOptimusEnablement;
begin
NvOptimusEnablement := 1;
end.
I think your problem is that you wrote it like this:
library GlobalVarExport;
uses
Windows;
var
NvOptimusEnablement: DWORD=1;
exports
NvOptimusEnablement;
begin
end.
And that fails to compile with this error:
E2276 Identifier 'NvOptimusEnablement' cannot be exported
I don't understand why the compiler doesn't like the second version. It's probably a bug. But the workaround in the first version is just fine.
I'm not a Delphi expert, but AFAIK it is possible to link to static libraries implemented in C from Delphi. So I'd simply create a small stub library, just providing this export, which is statically linked into your Delphi program. This adds the very export you need.
This should be a very simple, very quick question. These are the first 3 lines of the disassembly of a C program I wrote:
Dump of assembler code for function main:
0x0804844d <+0>: push ebp
0x0804844e <+1>: mov ebp,esp
0x08048450 <+3>: and esp,0xfffffff0
... ... ... ... ... ... ...
What are 0x0804844d, 0x0804844e and 0x08048450? They are not affected by ASLR. Are they still memory addresses, or positions relative to the file?
If you look at the Intel Developer Manual instruction-set reference you can see that 0x0804846d <+32>: eb 15 jmp 0x8048484 encodes a relative address. i.e. it's the jmp rel8 short encoding. This works even in position-independent code, i.e. code which can run when mapped / loaded at any address.
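To make the relative encoding concrete, here is a small Python sketch that decodes the jump quoted above. The helper name is made up for illustration; the rule it applies (an 8-bit signed displacement added to the address of the next instruction) is how jmp rel8 is defined:

```python
def rel8_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Compute the target of an x86 'jmp rel8' (opcode 0xEB) in 32-bit mode."""
    if len(insn_bytes) != 2 or insn_bytes[0] != 0xEB:
        raise ValueError("not a jmp rel8 instruction")
    # The second byte is a signed 8-bit displacement.
    disp = int.from_bytes(insn_bytes[1:2], "little", signed=True)
    # The displacement is relative to the address of the NEXT instruction.
    next_insn = insn_addr + len(insn_bytes)
    return (next_insn + disp) & 0xFFFFFFFF

target = rel8_target(0x0804846D, bytes([0xEB, 0x15]))
print(hex(target))  # 0x8048484
```

The bytes eb 15 never mention 0x8048484. The disassembler shows the computed absolute target for convenience, which is why the same bytes still work if the code is mapped somewhere else.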
ASLR means that the address of the stack (and optionally code+data) in the executable can change every time you load the file into memory. Obviously, once the program is loaded, the addresses won't change anymore, until it is loaded again. So if you know the address at runtime, you can target it, but you can't write an exploit assuming a fixed address.
GDB is showing you addresses of code in the virtual-memory space of your process, after any ASLR. (BTW, GDB disables ASLR by default: set disable-randomization on|off to toggle.)
For executables, it's common that only the stack pointer is ASLRed, while the code is position-dependent and loaded at a fixed address. Code and static data addresses are then link-time constants, so code like push OFFSET .LC0 / call puts can work, hard-coding the address of the string constant into a push imm32.
Libraries usually need to be position-independent anyway, so ASLR can load them at a randomized address.
But ASLR for executables is possible and becoming more common, either by making position-independent executables (Linux), or by having the OS fix-up every hard-coded address when it loads the executable at a different address than it was compiled for (Windows).
Addresses have a 1:1 relation to the position within the file only in a relative sense within the same segment, i.e. the next byte of code is the next byte of the file. The headers of the executable describe which regions of the file are what (and where they should be mapped by the OS's program loader).
The meaning of the addresses shown differs in three cases:
For executable files
For DLLs (Windows) or shared objects (.so, Linux and Un*x-like)
For object files
For executables:
Executable files typically cannot be loaded to any address in memory. In Windows there is the possibility to add a "relocation table" to an executable file (required for very old Windows versions); if this is not present (typically the case when using GCC) then it is not possible to load the file to another memory location. In Linux, a traditional (non-PIE) executable can never be loaded to another location; only position-independent executables can.
You may try something like this:
static int a;
printf("%p\n", (void *)&a);
When you execute the program 100 times you will see that the address of a is always the same, so no ASLR is done for the executable file itself.
The addresses dumped by objdump are absolute addresses.
For DLLs / .so files:
The addresses are relative to the base address of the shared object (under Linux), or they are absolute addresses (under Windows) that will change when the DLL is loaded into another memory area.
For object files:
When dumping an object file the addresses are relative to the currently displayed section. If there are multiple ".text" sections in a file the addresses will start at 0 for each section.
We have a Delphi COM component being called from an ISAPI web app. The COM component is hanging the app because it is trying to display a MessageBox(). We have no MessageBox() call in our user code so it must be located in the Delphi runtime source, probably in exception handler code.
We have an IIS debug diagnostics report that shows our module name + an offset address as the offending code.
We have a .MAP file for our module and we also have produced a .dbg file using MAP2DBG.
Our question is how do we locate the source file line of code using the IIS debug diag hang report containing the offset address, using the .MAP or .DBG file?
We've tried to use WinDbg but have not been able to figure out what we need to do to locate the source line.
First you need to find the base address at which the COM module was loaded into the IIS process; this may be evident from the IIS debug log. Let's call this BASE.
Then you calculate the MAPoffset = offset - BASE - $1000 and you have an address that can be searched for in the Delphi MAP file.
In the MAP file (which should be a detailed one, to get line number mapping) you will find a section for each source module which lists records of "linenumber segment:offset". Then you check whether MAPoffset is either equal to one of those offsets or falls between two of the line number offsets. This should point you to the offending line.
The segment is usually 1, indicating a text segment with generated code (there is a segment map at the top of the MAP file).
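The lookup described above can be sketched in a few lines of Python. This assumes you have already extracted the (offset, line number) pairs for segment 1 from the MAP file into a sorted list. The function names, the sample table and the addresses are all hypothetical, and the $1000 term is taken straight from the formula above:

```python
import bisect

def map_offset(address: int, base: int, code_start: int = 0x1000) -> int:
    """MAPoffset = offset - BASE - $1000, as in the formula above."""
    return address - base - code_start

def find_line(line_table, offset):
    """line_table: list of (offset, line_number) tuples sorted by offset.
    Returns the line whose offset equals the given one, or the closest one below it."""
    offsets = [entry_offset for entry_offset, _ in line_table]
    i = bisect.bisect_right(offsets, offset) - 1
    return line_table[i][1] if i >= 0 else None

# Hypothetical line-number records from one source module in the MAP file.
table = [(0x2A10, 120), (0x2A1C, 121), (0x2A30, 123)]

off = map_offset(0x10003A24, 0x10000000)  # hypothetical address and BASE
line = find_line(table, off)
print(hex(off), line)  # 0x2a24 121
```

An offset that falls between two records belongs to the earlier one, because each record marks where the code generated for that source line begins.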
Hope this helps!