Trying to mix in OpenCL with CUDA in NVIDIA's SDK template - sdk

I have been having a tough time setting up an experiment where I allocate memory with CUDA on the device, take that device pointer, use it in OpenCL, and return the results. I want to see if this is possible. I had a tough time getting a CUDA project to work, so I just used NVIDIA's template project from their SDK. In the makefile I added -lOpenCL to the libs section of common.mk. Everything is fine when I do that, but when I add #include <CL/cl.h> to template.cu so I can start making OpenCL calls, I get over 100 errors. They all look similar to this, but with different function names at the end:
/usr/lib/gcc/x86_64-linux-gnu/4.4.1/include/xmmintrin.h(334): error:
identifier "__builtin_ia32_cmpeqps" is undefined
I am having a hard time figuring out why. Please help if you can. Also, if there is an easier way to set up a project that'll be able to call the CUDA and OpenCL APIs let me know.

I haven't really worked with cuda, so I don't know how helpful my answer is.
From what I understand, you are trying to use OpenCL directly from your CUDA host code, which, if I remember correctly, is compiled using NVIDIA's own compiler instead of the standard gcc. So the problem is probably that this compiler doesn't implement the builtins that the mentioned headers rely on.
Look here for a similar problem and its solution:
http://forums.nvidia.com/lofiversion/index.php?t88573.html
It seems you have to put everything that needs the OpenCL API into a different (non-CUDA) compilation unit, so that it is compiled by the regular host compiler rather than NVIDIA's.
However, I wouldn't count on the pointer sharing itself working: OpenCL buffers aren't just pointers to memory but carry some meta-information too. There is no real reason it should work, and if it does, there is no guarantee that it will continue to do so.
What you could try, if you really want to, is using OpenGL for the interop, since both OpenCL and CUDA have extensions that allow creating buffers from OpenGL buffers.
However, why do you need to do this? What's keeping you from using Apple's implementation in the short term, since IIRC it's open source and most of it (the OpenCL parts) should be platform independent anyway?

Related

Armadillo Calls Internal Accelerate Libraries on iOS

I recently tried to use Armadillo on iOS to do some matrix computing. The app worked on my development iPhone, but Apple gave me the error message below when I tried to publish it in the App Store. It seems that Armadillo calls some BLAS functions which are internal. I searched the web for the message but did not find anything useful. I also found that calling BLAS functions with the "cblas_" prefix, e.g. cblas_dgemv, directly from my code does not cause the error. However, that makes using Armadillo pointless.
I wonder if anyone has encountered the same problem, and what the solution is. I suspect it's related to some macro in config.hpp. Thank you very much for your help.
Error message:
Non-public API usage:
• The app references non-public symbols in ***: _sgemm_, _sgemv_, _ssyrk_
The Accelerate BLAS implementation supports a bunch of redundant symbols to facilitate divergent function naming schemes of various fortran compilers. Strictly speaking these are intended to be used (by your fortran compiler) so you probably have some arguing ground that they are not private interfaces. If the AppStore is still giving you trouble, file a bug against Apple and ask them to fix the bookkeeping on the interfaces so they can be used.
It would be simpler to just start using the cblas_ interfaces in the headers though.

what is the equivalent of gcc's __attribute__((constructor)) in clang?

I have just finished porting a decent amount of c-sources to the iOS platform and packaged them as a universal static framework. I, then, added the framework (not the project) to a sample iOS app in order to test linkage and proper function. That's when I ran into a humbling problem.
In my attempt to solve the problem described here, I also came across some symbols that are composed through the heavy use of macros (I HATE those). Some of those macros use function attributes that are really gcc extensions rather than standard C.
Of course I can always add -std=gnu89, but even then, I am not sure it will resolve the original problem of undefined symbols in the static library.
Not only that, I am now worried that my port of those sources to iOS may not be accurate and may result in the kind of bugs/issues that may be related to the compiler's code generation and/or optimization policies.
If you can share some of your experience/advice in how best to go about that port, I would really appreciate it.
Thanks!
From manual testing with clang 8.0, it seems that both __attribute__((constructor)) and __attribute__((__constructor__)) work for your purpose.
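As a minimal sketch of what that looks like in practice, the snippet below marks a function with the attribute so that it runs before main() under both gcc and clang. The names library_ready, init_library and library_is_ready are hypothetical and only serve the illustration.

```c
/* Hypothetical flag, set by the constructor below. */
static int library_ready = 0;

/* The attribute asks the toolchain to call this function at load time,
 * before main() is entered. The same spelling is accepted by gcc and clang. */
__attribute__((constructor))
static void init_library(void)
{
    library_ready = 1;
}

/* Hypothetical accessor: reports whether the constructor has already run. */
int library_is_ready(void)
{
    return library_ready;
}
```

Note that when such a constructor lives in a static library, it only runs if the object file containing it is actually pulled into the final link; an archive member that nothing references is normally left out.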

OS X: convert .dylib to .a/.o (dynamic to static)?

Suppose I've read this caveat, and I still want to use TBB as a statically-linked library. (Pretend I'm working in an environment where users aren't allowed to create their own dylibs.) But I don't really want to rewrite the TBB makefile to generate libtbb.a instead of libtbb.dylib.
Is there a simple command-line way to convert libtbb.dylib into libtbb.o with the same entry points?
I have heard a good argument for not being able to go the other way, from static to dynamic. Namely: dynamic libraries need to be PIC, and converting a non-PIC static library to PIC isn't feasible. But that argument doesn't apply in the other direction, as far as I know.
Here's someone saying it's impossible to convert .dll to .a on Windows, but I think they're just talking about the impossibility of breaking a .dll or .exe back up into its original .o files, not necessarily saying it would be impossible to create a linkable .o file with the same contents. Also, the situation on Windows is slightly odder than "real" PIC, although I don't think that's relevant.
Intel Threading Building Blocks (TBB) is available as binary for Windows, Mac and Linux. If you expect to use libtbb.dylib from the Mac distribution on iOS then you are out of luck. The Mac distribution is targeted for Intel (32 and 64 bits). Since iOS runs on ARM processors, you could not use it, even if you found a way to convert a dynamic library to a static library.
If you found a libtbb.dylib file somewhere else targeted for ARM, then you could probably use it on iOS. It's actually possible to load dynamic libraries on iOS. Have a look at the dlopen(3) man page.
Finally, you should read about Grand Central Dispatch (GCD) instead, which is built-in support for concurrent code execution on multicore hardware in iOS and OS X.

AIX dynamic linking

I'm working on porting a library onto AIX. It works on Solaris, Windows and Linux but AIX is giving me headaches. I'm at a point where it builds and runs but I have an issue with some of the libraries it's linking in. Ideally I want to be able to ship a library that just requires the c runtime to be available with no other dependencies. At the moment I'm having a problem with libpthread which I can see is a symlink to an AIX specific threading library.
My issue is this:
If I don't link pthread (I don't seem to need to on Solaris for the same code base) then I get undefined symbols. That's fine, since I am using pthreads. If I link it in then it works fine, except that any calling application also has to link to pthreads. What I don't really understand is why my calling app, which has no dependency on pthread, needs to link against it just because it's calling a library which links to the shared object.
I'm on AIX 6.1 using gcc 4.2.4.
I'd be OK with shipping a library that requires pthreads to be present on the library path (ideally we'd get a static version) but I'm a bit unhappy about shipping a library that places linker requirements on the client.
Any ideas on what I might be doing wrong?
I definitely seem to be going in circles. I removed the -shared flag on the linker to resolve an earlier problem and that, of course, makes the library static. So what I saw is just normal behaviour: if a static library depends on a dynamic one, you have to link both into your app. So I've put the -shared flag back, and now half of my functions are no longer accessible. It does explain the problem I was seeing, though.

Native code execution by JVM/CLR

How do the JVM and CLR execute JIT-compiled native code? Is it by some kind of code injection, or by copying code to executable memory? What are the system calls that allow dynamic code execution?
I can explain how we do it in CACAO VM (a research JIT-only JVM). First, the machine code for a method is generated into some heap-allocated memory block. After compilation, the final code length is known, and a chunk of executable memory is allocated using mmap and the PROT_EXEC flag (relevant CACAO code here). Then, the machine code is copied into the mmapped area. After that, many architectures require some machine-specific cache flushing mechanism. As an example, have a look at the cache-flushing function for PowerPC 64. Notably, on i386 and x86_64, there is nothing to do. After this step, the processor is ready to execute the newly-generated code. Alternatively, already allocated memory pages can be marked executable with mprotect. Note that mmap/mprotect are Unix facilities.
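The steps above can be sketched in a few lines of C. This is not CACAO's code, only an illustration under stated assumptions: the function name run_generated_code is hypothetical, the "generated" machine code is a hard-coded "return 42" routine for x86_64 or AArch64, and the sketch uses the mprotect variant (map writable, copy, then mark executable). The cache flush uses the gcc/clang builtin __builtin___clear_cache, which has nothing to do on x86.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Hand-assembled stand-in for JIT output: a function that returns 42. */
#if defined(__x86_64__)
static const unsigned char code[] = {
    0xB8, 0x2A, 0x00, 0x00, 0x00,   /* mov eax, 42 */
    0xC3                            /* ret         */
};
#define HAVE_CODE 1
#elif defined(__aarch64__)
static const unsigned char code[] = {
    0x40, 0x05, 0x80, 0x52,         /* mov w0, #42 */
    0xC0, 0x03, 0x5F, 0xD6          /* ret         */
};
#define HAVE_CODE 1
#else
#define HAVE_CODE 0
#endif

/* Hypothetical helper: returns 42 if the generated code ran,
 * or -1 if the platform is unsupported or refuses executable pages. */
int run_generated_code(void)
{
#if HAVE_CODE
    size_t len = sizeof code;
    int (*fn)(void);
    int result;

    /* 1. Get a writable, anonymous memory region. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    /* 2. Copy the machine code into it. */
    memcpy(mem, code, len);

    /* 3. Make the pages executable (and no longer writable). */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* 4. Flush the instruction cache where the architecture needs it. */
    __builtin___clear_cache((char *)mem, (char *)mem + len);

    /* 5. Call the code through a function pointer. */
    fn = (int (*)(void))mem;
    result = fn();

    munmap(mem, len);
    return result;
#else
    return -1;
#endif
}
```

Some systems restrict executable anonymous mappings, in which case the mprotect call fails and the helper reports -1 instead of running the code.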
I don't know specifically how Java does it, but in general you'd insert "trap" opcodes into the interpreter's instruction stream. There are two opcodes in the JVM spec that seem tailor-made for this purpose.
If you want to know for sure, there's no better answer than the source: http://download.java.net/jdk6/source/
The Common Language Runtime keeps a method table for each type, with entries pointing either to native code or to a native stub that JIT-compiles the managed code and then fixes up the method table with a pointer to the newly created native code.
MSDN has a more in-depth explanation in the MethodDesc section.
This blog entry by Dave Notario explains how the CLR JIT compiler works.
