About exporting program variables - C++Builder

The graphics card manufacturers provide an optimization scheme: when the following variables are exported from the program, the program is executed on the discrete (high-performance) graphics card. For a program compiled with the new version of the bcc compiler, however, the exported variables are prefixed with an underscore, and the -vu parameter is not supported. I don't know how to solve this.
// http://developer.download.nvidia.com/devzone/devcenter/gamegraphics/files/OptimusRenderingPolicies.pdf
// The following line is to favor the high performance NVIDIA GPU if there are multiple GPUs
// Has to be .exe module to be correctly detected.
extern "C" __declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001;
// And the AMD equivalent
// Also has to be .exe module to be correctly detected.
extern "C" __declspec(dllexport) int AmdPowerXpressRequestHighPerformance = 0x00000001;
bcc32 - old (classic) compiler
bcc32c - new (Clang-based) compiler
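Not a fix for the underscore itself, but a quick way to see exactly which name ended up in the executable's export table (the NVIDIA document only mentions the unprefixed name). Below is a minimal, hypothetical self-check that exports the two variables and then walks its own PE export table; it uses only standard Win32 structures, nothing C++Builder-specific:

// Hypothetical self-check: an exe that exports the two variables and then dumps its
// own PE export table, so you can see whether the compiler emitted
// "NvOptimusEnablement" or "_NvOptimusEnablement".
#include <windows.h>
#include <cstdio>

extern "C" __declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001;
extern "C" __declspec(dllexport) int AmdPowerXpressRequestHighPerformance = 0x00000001;

int main()
{
    const BYTE* base = reinterpret_cast<const BYTE*>(GetModuleHandle(NULL));
    const IMAGE_DOS_HEADER* dos = reinterpret_cast<const IMAGE_DOS_HEADER*>(base);
    const IMAGE_NT_HEADERS* nt =
        reinterpret_cast<const IMAGE_NT_HEADERS*>(base + dos->e_lfanew);
    const IMAGE_DATA_DIRECTORY& dir =
        nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];
    if (dir.VirtualAddress == 0)
    {
        std::puts("no export table at all");
        return 0;
    }
    const IMAGE_EXPORT_DIRECTORY* exp =
        reinterpret_cast<const IMAGE_EXPORT_DIRECTORY*>(base + dir.VirtualAddress);
    const DWORD* names = reinterpret_cast<const DWORD*>(base + exp->AddressOfNames);
    for (DWORD i = 0; i < exp->NumberOfNames; ++i)
        std::printf("export: %s\n", reinterpret_cast<const char*>(base + names[i]));
    return 0;
}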

Related

[LLVM][RISCV] Easy way to promote byte loads/stores to word/dword loads/stores?

I'm trying to hack the Clang/LLVM compiler to work around a specific RISC-V hardware issue on the processor I am working on.
It appears that loading addresses that are not 16/32-bit aligned does not work as expected. To work around this, I'd like to modify the compiler to change all lb/lbu and sb instructions to instead work on a full word or dword.
For example, I have the following struct:
union Flags
{
    struct
    {
        uint32_t enabled : 1;
        uint32_t reserved : 7;
        uint32_t foo : 1;
        uint32_t reserved2 : 23;
    };
    uint32_t u32All;
};
Right now, when the compiled program tries to access foo through a Flags pointer, it does so with an lbu instruction at address X+1 (one byte past the union's base address X). This offset is truncated by the hardware, so it reads address X instead, which returns the enabled flag rather than the foo flag.
I need to promote the load into a word/dword and then operate on it properly.
I'm not knowledgeable about LLVM in the slightest. I've tried poking around RISCVISelLowering.cpp, but the only thing I've managed to do is cause lld to crash when it attempts to legalize the load ops.
Looking for any help or guidance you might be able to provide. Thank you!
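(Not the compiler-side change being asked about, but for illustration, here is a source-level sketch of the behaviour requested: force a full-word load through u32All and extract the bit by hand, assuming the usual LSB-first bit-field layout on RISC-V.)

#include <cstdint>

// Flags union from the question (bit 0 = enabled, bits 1-7 reserved, bit 8 = foo).
union Flags
{
    struct
    {
        uint32_t enabled   : 1;
        uint32_t reserved  : 7;
        uint32_t foo       : 1;
        uint32_t reserved2 : 23;
    };
    uint32_t u32All;
};

// Read the whole aligned 32-bit word (an lw at address X) instead of letting the
// compiler emit a byte-sized lbu at X+1, then pick out the 'foo' bit manually.
static inline uint32_t read_foo(const volatile Flags* f)
{
    uint32_t word = f->u32All;   // full-word load from the aligned base address
    return (word >> 8) & 0x1u;   // bit 8 is 'foo'
}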

How do programming languages without raw memory access do things like reading files?

Programming languages like C can, as far as I know, execute system calls to make the OS give them direct memory access to file streams that can be read from/written to. Now, how do programming languages without raw memory access (Java, Python, etc.) even do something like open a file "under the hood"?
Obviously, I'm not just talking about opening files here - some languages have built-in file reading features that would make my question obsolete. This concerns anything that has anything to do with direct memory access - e.g. accessing other devices (for instance the keyboard, the mouse), and so on.
Here's an excerpt of the source code of the FileInputStream class from OpenJDK 8 (link):
/**
 * Opens the specified file for reading.
 * @param name the name of the file
 */
private native void open0(String name) throws FileNotFoundException;

// ...

private native int read0() throws IOException;

/**
 * Reads a subarray as a sequence of bytes.
 * @param b the data to be written
 * @param off the start offset in the data
 * @param len the number of bytes that are written
 * @exception IOException If an I/O error has occurred.
 */
private native int readBytes(byte b[], int off, int len) throws IOException;
The native keyword (see this other Q&A) means that these methods are not implemented in the Java source code for this class; they are provided by the implementation of the Java interpreter which executes the program (e.g. the java command-line utility, which executes Java bytecode). The Java interpreter itself is ultimately written in a language like C, which has the low-level features required to actually implement these native methods. When the interpreter has to execute a method marked native, it invokes the corresponding function written in C.
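To make that hand-off concrete, here is a minimal, hypothetical sketch (not OpenJDK's actual implementation) of what such a native method can look like on the C++ side, for an invented class demo.RawFile declaring private native void open0(String name):

// Hypothetical JNI counterpart of: private native void open0(String name);
// The class name demo.RawFile and the way the descriptor is handled are assumptions.
#include <jni.h>
#include <fcntl.h>
#include <unistd.h>

extern "C" JNIEXPORT void JNICALL
Java_demo_RawFile_open0(JNIEnv* env, jobject self, jstring name)
{
    const char* path = env->GetStringUTFChars(name, nullptr);
    int fd = ::open(path, O_RDONLY);              // the actual OS-level system call
    env->ReleaseStringUTFChars(name, path);

    if (fd < 0) {
        jclass ex = env->FindClass("java/io/FileNotFoundException");
        env->ThrowNew(ex, "cannot open file");
        return;
    }
    // A real implementation would store fd in a field of 'self' for later read calls.
    ::close(fd);   // this sketch simply closes it again
}

The runtime matches the mangled symbol name Java_demo_RawFile_open0 against the declaring class and method when the native method is first invoked.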
Similarly, other high-level languages like Python have some functions which are implemented "natively" in the same sense. These functions, along with the behaviours of basic arithmetic and comparison operations, are "intrinsic" as compared to the large bulk of a language's standard library, which is usually written in the language itself.

Syntax/Functions used in the OpenCL implementation of OpenCV

I am trying to understand the use of OpenCL within OpenCV, but I don't get it:
This is an example code part from orb.cpp where a kernel with the name ORB_HarrisResponses, located in orb.cl, is (probably) created:
ocl::Kernel hr_ker("ORB_HarrisResponses", ocl::features2d::orb_oclsrc,
    format("-D ORB_RESPONSES -D blockSize=%d -D scale_sq_sq=%.12ef -D HARRIS_K=%.12ff",
           blockSize, scale_sq_sq, harris_k));
return hr_ker.args(ocl::KernelArg::ReadOnlyNoSize(imgbuf),
                   ocl::KernelArg::PtrReadOnly(layerinfo),
                   ocl::KernelArg::PtrReadOnly(keypoints),
                   ocl::KernelArg::PtrWriteOnly(responses),
                   nkeypoints).run(1, globalSize, 0, true);
But this isn't the regular OpenCL syntax (functions like clCreateKernel, ...). Does someone know where I can get a basic understanding of OpenCV's OpenCL implementation, to answer questions like:
Where is the connection between the "normal" OpenCL and the OpenCV OpenCL?
Where is the program built from the kernel source files?
Where is the function which creates the kernel explained?
etc.
I couldn't find a document or related questions on the web.
Thanks
Edit: Thanks for answering, it helped me understand a few things:
ocl::Kernel hr_ker("ORB_HarrisResponses", ocl::features2d::orb_oclsrc,
format("-D ORB_RESPONSES -D blockSize=%d -D scale_sq_sq=%.12ef -D HARRIS_K=%.12ff", blockSize, scale_sq_sq, harris_k));
In this part the kernel ORB_HarrisResponses, located in orb.cl and embedded in the string ocl::features2d::orb_oclsrc, is created as hr_ker (right?).
But what does the format(...) thing do?
if (hr_ker.empty()) return false;
return hr_ker.args(ocl::KernelArg::ReadOnlyNoSize(imgbuf),
                   ocl::KernelArg::PtrReadOnly(layerinfo),
                   ocl::KernelArg::PtrReadOnly(keypoints),
                   ocl::KernelArg::PtrWriteOnly(responses),
                   nkeypoints).run(1, globalSize, 0, true);
In this part the kernel arguments imgbuf, layerinfo, keypoints are set, and the output of the kernel is stored in responses.
What is going on with nkeypoints?
Why is there no ocl::KernelArg in front of this parameter?
The kernel in orb.cl has 7 arguments but only 5 are set, why?
What exactly is returned from return hr_ker.args(...)?
This syntax is a kind of internal OpenCV "sugar" to avoid repeating some common code blocks. Unfortunately there is no good documentation, so the only way to learn it is to look through the source code and examples.
Some tips for you:
The connection between the OpenCL API and OpenCV is in modules\core\src\ocl.cpp (see the Kernel, Kernel::Impl, Program, ProgramSource and KernelArg classes).
The source code of the kernels is stored in *.cl files (for example, the ORB kernels are in the modules\features2d\src\opencl\orb.cl file). When the module is built, the kernel code is copied into an auto-generated cpp file (for example opencl_kernels_features2d.cpp) and can be accessed via ocl::features2d::orb_oclsrc.
To use the OpenCL implementation in OpenCV you need to pass a cv::UMat to the function instead of a regular cv::Mat (see the CV_OCL_RUN_ macro and the cv::OutputArray::isUMat() method).
Basically, every OpenCL implementation inside OpenCV does the following (a minimal sketch of the pattern is shown after this list):
Defines kernel parameters, like global size, block size, etc.
Creates a cv::ocl::Kernel from the string with the source code and the defined parameters. (If the kernel is not created, or there is no OpenCL implementation for the specified input parameters, processing falls back to the regular CPU code.)
Passes kernel arguments via cv::ocl::KernelArg. There are several types of parameters to optimize processing: read-only, write-only, constant, etc.
Runs the kernel.
So for the end user, using the OpenCL implementation is transparent. If something goes wrong, processing is switched to the CPU implementation.
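As an illustration of that pattern, here is a hypothetical, stripped-down function (not actual OpenCV source): the kernel name my_add and its source string are invented, and it assumes 8-bit single-channel inputs.

#include <opencv2/core.hpp>
#include <opencv2/core/ocl.hpp>

// Invented element-wise add kernel; in OpenCV proper this string would come from an
// auto-generated ocl::<module>::..._oclsrc constant.
static const char* myAddSrc =
    "__kernel void my_add(__global const uchar* a, int a_step, int a_offset,\n"
    "                     __global const uchar* b, int b_step, int b_offset,\n"
    "                     __global uchar* d, int d_step, int d_offset, int d_rows, int d_cols)\n"
    "{\n"
    "    int x = get_global_id(0), y = get_global_id(1);\n"
    "    if (x < d_cols && y < d_rows)\n"
    "        d[y * d_step + x + d_offset] =\n"
    "            a[y * a_step + x + a_offset] + b[y * b_step + x + b_offset];\n"
    "}\n";

static bool ocl_add(cv::InputArray a, cv::InputArray b, cv::OutputArray dst)
{
    cv::UMat ua = a.getUMat(), ub = b.getUMat();
    dst.create(ua.size(), ua.type());
    cv::UMat ud = dst.getUMat();

    // 1-2. Build the kernel from the source string, passing -D options via cv::format.
    cv::ocl::Kernel k("my_add", cv::ocl::ProgramSource(myAddSrc),
                      cv::format("-D DUMMY=%d", 1));
    if (k.empty())
        return false;                 // caller falls back to the CPU implementation

    // 3-4. Bind the arguments and run over the whole image.
    size_t globalsize[2] = { (size_t)ua.cols, (size_t)ua.rows };
    return k.args(cv::ocl::KernelArg::ReadOnlyNoSize(ua),
                  cv::ocl::KernelArg::ReadOnlyNoSize(ub),
                  cv::ocl::KernelArg::WriteOnly(ud))
            .run(2, globalsize, NULL, false);
}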
Let's discuss the following code snippet:
return hr_ker.args(ocl::KernelArg::ReadOnlyNoSize(imgbuf),
                   ocl::KernelArg::PtrReadOnly(layerinfo),
                   ocl::KernelArg::PtrReadOnly(keypoints),
                   ocl::KernelArg::PtrWriteOnly(responses),
                   nkeypoints).run(1, globalSize, 0, true);
and the OpenCL kernel declaration:
__kernel void ORB_HarrisResponses(__global const uchar* imgbuf, int imgstep, int imgoffset0,
                                  __global const int* layerinfo, __global const int* keypoints,
                                  __global float* responses, int nkeypoints )
nkeypoints is an integer, so there is no need to wrap it in an ocl::KernelArg. It will be passed directly to the kernel.
ocl::KernelArg::ReadOnlyNoSize actually expands to three kernel parameters: imgbuf, imgstep, imgoffset0.
The other kernel arguments don't expand, so each of them represents a single parameter.
hr_ker.args returns a reference to the cv::ocl::Kernel, so you may use the following construction: kernel.args(...).run(...).
Some useful links:
cv::format documentation. It works like sprintf and returns the formatted string.
Hope it helps.

Memory trace of all variables in a program with a DBI tool

I am using Intel Pin as my primary DBI tool.
I am interested in knowing how I can trace all variables allocated in a program.
Suppose we have the following snippet in C:
int *ptr_one, *ptr_two, g;
ptr_one = (int *)malloc(sizeof(int));
ptr_two = (int *)malloc(sizeof(int));
*ptr_one = 25;
*ptr_two = 24;
g = 130;
free(ptr_two);
g = 210;
*ptr_two = 50;   /* write through ptr_two after it has been freed */
I want to know how I can trace specific variables / memory references in my program. For example, in the above code, I would like to trace the variable "g" with Intel Pin. How can that be done?
For dynamically allocated variables, I'm monitoring malloc/free calls and following their addresses, but for static ones I have no idea.
Another matter: for dynamically allocated variables, I would like to trace them across the whole program. Suppose, in the above code, I want to monitor changes and modifications of the ptr_two variable during my program, from start to finish.
If anyone has some ideas about that, it would be nice to share them here; sample code for Intel Pin is appreciated.
Thank you all.
Simply stated, you can't associate a name from your source code (be it a variable or function name) with a memory location in the compiled binary: this information is (probably) lost in the final binary.
This is not true in two cases:
1) If your binary is exporting functions: in this case other binaries must have a means to call the function by name (minus some subtleties), in which case the information must be available somewhere; for example on Windows, binaries that export functions, variables or classes have an export table.
2) You have symbolic information: in your example, whether for the global variable or other local variables, you have to use the symbolic information provided by the compiler.
On Linux you will need an external tool / library / program (e.g. libelf.so or libdwarf.so) to parse the symbolic information from the symbol tables (usually dynsym / symtab) if the binary is not stripped.
On windows you have to rely on the program database (*.pdb files); the format is mostly undocumented (although MS is trying to document it) and you have to use either the DbgHelp API or the DIA SDK.
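For the Windows/PDB case, here is a minimal sketch (assuming the target was built with debug information and that this code runs inside the process whose symbols it loads) of resolving a global such as g to an address with DbgHelp:

#include <windows.h>
#include <dbghelp.h>
#include <cstdio>

#pragma comment(lib, "dbghelp.lib")

int main()
{
    HANDLE process = GetCurrentProcess();
    if (!SymInitialize(process, NULL, TRUE))      // TRUE: load symbols for loaded modules
        return 1;

    // SYMBOL_INFO is a variable-length structure: reserve room for the name behind it.
    char buffer[sizeof(SYMBOL_INFO) + MAX_SYM_NAME] = { 0 };
    SYMBOL_INFO* sym = reinterpret_cast<SYMBOL_INFO*>(buffer);
    sym->SizeOfStruct = sizeof(SYMBOL_INFO);
    sym->MaxNameLen   = MAX_SYM_NAME;

    if (SymFromName(process, "g", sym))           // "g" is the global from the C snippet above
        std::printf("g lives at address 0x%llx\n", (unsigned long long)sym->Address);

    SymCleanup(process);
    return 0;
}

The address obtained this way is what you would then hand over to your Pin tool.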
As stated by the PIN user guide (emphasis is mine):
Pin provides access to function names using the symbol object (SYM).
Symbol objects only provide information about the function symbols in
the application. Information about other types of symbols (e.g. data
symbols), must be obtained independently by the tool.
If you have symbolic information you can then associate a variable name - obtained from an external tool - with an address (relative to the module base for global vars or a stack location for local ones). At runtime it is then just a matter of converting the relative address to a virtual one.
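Once you have such an address, watching it from a Pin tool boils down to instrumenting memory writes (and/or reads) and comparing the effective address. Here is a minimal, pinatrace-style sketch; the tool name, output file and -watch knob are assumptions of this sketch:

#include <cstdio>
#include <cstdlib>
#include <string>
#include "pin.H"

// Address to watch, passed as: pin -t watchaddr.so -watch 0x601040 -- ./a.out
static KNOB<std::string> KnobWatchAddr(KNOB_MODE_WRITEONCE, "pintool",
                                       "watch", "0", "address to watch (hex)");
static ADDRINT watchAddr;
static FILE* out;

static VOID RecordWrite(ADDRINT ip, ADDRINT ea)
{
    if (ea == watchAddr)
        std::fprintf(out, "write to watched address by instruction at %p\n", (void*)ip);
}

static VOID Instruction(INS ins, VOID*)
{
    // Instrument every instruction that may write memory and pass its effective address.
    if (INS_IsMemoryWrite(ins))
        INS_InsertPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR)RecordWrite,
                                 IARG_INST_PTR, IARG_MEMORYWRITE_EA, IARG_END);
}

static VOID Fini(INT32, VOID*) { std::fclose(out); }

int main(int argc, char* argv[])
{
    if (PIN_Init(argc, argv)) return 1;

    watchAddr = (ADDRINT)std::strtoull(KnobWatchAddr.Value().c_str(), NULL, 0);
    out = std::fopen("watch.out", "w");

    INS_AddInstrumentFunction(Instruction, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();   // never returns
    return 0;
}

For heap objects such as ptr_two, the same callback can be combined with the malloc/free interception the question already uses, so that the watched address (or range) is updated as allocations come and go.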

How can I force my program to use the strongest graphics card [duplicate]

I want my application to always run using the real (discrete) GPU on nVidia Optimus laptops.
From "Enabling High Performance Graphics Rendering on Optimus Systems", (http://developer.download.nvidia.com/devzone/devcenter/gamegraphics/files/OptimusRenderingPolicies.pdf):
Global Variable NvOptimusEnablement (new in Driver Release 302)
Starting with the Release 302 drivers, application developers can direct the Optimus driver at runtime to use the High Performance Graphics to render any application–even those applications for which there is no existing application profile. They can do this by exporting a global variable named NvOptimusEnablement. The Optimus driver looks for the existence and value of the export. Only the LSB of the DWORD matters at this time. A value of 0x00000001 indicates that rendering should be performed using High Performance Graphics. A value of 0x00000000 indicates that this method should be ignored.
Example Usage:
extern "C" { _declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001; }
The problem is that I want to do this using Delphi. From what I've read, Delphi does not support the export of variables, even though some hacks exist. I did try a few of them but couldn't make them work.
In the same nvidia document I read that forcing the proper GPU can be accomplished by linking statically against one of a handful of listed dlls. But I don't want to link to dlls I'm not using. (Why opengl.dll is not one of them is beyond me.) A simple exported variable seems much cleaner.
From what I've read Delphi does not support export of variables.
That statement is incorrect. Here's the simplest example that shows how to export a global variable from a Delphi DLL:
library GlobalVarExport;

uses
  Windows;

var
  NvOptimusEnablement: DWORD;

exports
  NvOptimusEnablement;

begin
  NvOptimusEnablement := 1;
end.
I think your problem is that you wrote it like this:
library GlobalVarExport;

uses
  Windows;

var
  NvOptimusEnablement: DWORD = 1;

exports
  NvOptimusEnablement;

begin
end.
And that fails to compile with this error:
E2276 Identifier 'NvOptimusEnablement' cannot be exported
I don't understand why the compiler doesn't like the second version. It's probably a bug. But the workaround in the first version is just fine.
I'm not a Delphi expert, but AFAIK it is possible to link to static libraries implemented in C from Delphi. So I'd simply create a small stub library, just providing this export, which is statically linked into your Delphi program. This adds the very export you need.
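A minimal sketch of such a stub (the file name and build details are assumptions; what matters is that the object ends up statically linked into the final exe and that the names are exported unmangled):

// nvstub.cpp - hypothetical stub translation unit providing the driver-visible exports.
// extern "C" keeps the names unmangled; __declspec(dllexport) puts them into the
// executable's export table, which is where the Optimus/PowerXpress drivers look.
typedef unsigned long DWORD;

extern "C" __declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001;
extern "C" __declspec(dllexport) int AmdPowerXpressRequestHighPerformance = 0x00000001;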
