setting up OpenCL with DirectX 9 - directx

I'm completely new to OpenCL and GPU programming in general. Right now I am working on a project where I'm trying to see the performance saves that making use of the GPU in a game has. With this, however, I have ran into a snag; how do I set up my Directx project to speak to the OpenCL code base?
I've been googling this for about a week and haven't been able to find anything. If someone could point me in the right direction, I would be greatful.

OpenCL does not have anything to do with DirectX, it's simply another library.
For OpenCL you'll need an implementation ('SDK'), as Khronos don't provide those (they only provide the specifications).
Intel, AMD and Nvidia all provide one, but they have different requirements and limitations. See here for some of the existing implementations
After installing one of these, you'll have the necessary headers and libraries to code against the OpenCL API and link with OpenCL.dll
There are lots of sample sources in the SDKs or online, you have to write the kernel, the rest is mostly boilerplate code for initialization and kernel compilation.

The specific OpenCL extension that allows sharing of OpenCL buffers as textures and vice versa is cl_khr_d3d10_sharing.txt. http://www.khronos.org/registry/cl/extensions/khr/cl_khr_d3d10_sharing.txt

OpenCL has extensions for sharing memory between DirectX and OpenCL (and also between OpenGL and OpenCL.) This allows you to read or write DirectX buffers, including textures from within OpenCL. Ani's answer mentioned the extension for DirectX 10, but since the question is about DirectX 9, the extension you'll actually be using is cl_khr_dx9_media_sharing.
This extension has just 4 functions:
clGetDeviceIDsFromDX9MediaAdapterKHR
This function allows you to get the OpenCL device IDs of the OpenCL device(s) that can share memory with a given Direct3D 9 device.
clCreateFromDX9MediaSurfaceKHR
This function gets an OpenCL cl_mem memory object for a given Direct3D 9 memory object.
clEnqueueAcquireDX9MediaSurfacesKHR
This function locks the specified shared memory object so that you can read and/or write to it from OpenCL.
clEnqueueReleaseDX9MediaSurfacesKHR
This function unlocks the specified memory object from OpenCL, so that Direct3D can read/write it again.
Once you've used the above functions to share and synchronize access to the memory buffers, everything else on both the Direct3D 9 side and the OpenCL side works as it would otherwise with those particular APIs.
Note that your GPU will need to support the cl_khr_dx9_media_sharing extension in order for this to work. You can check the extensions property of the OpenCL platform and device in order to confirm that this extension is supported.
Some NVidia GPUs support a different extension instead, called cl_nv_d3d9_sharing. The basic idea of how it works is the same as with the cl_khr_dx9_media_sharing extension, but the exact details are a bit different. The biggest difference is just that it has different functions for getting cl_mem objects for different types of Direct3D 9 buffers, rather than just one function to cover all of them.

Related

Opencl migration internals

I am interested in how OpenCL memory transferring functions operate underneath (migration, reading/writing the buffer, mapping/unmapping). I could not find any open source implementation for OpenCL (for me Intel's one could be fine) and just explanations in the documentation don't give me any idea what is happening, for example, when I call clEnqueueMigrateMemObjects: what calls happen during this migration, what modules are active, how this migration happens, what mechanisms it uses underneath, does it use some cache mechanisms.
Is there a good source to read about it?
I am now exploring how OpenCL passes data to FPGAs. Xilinx currently uses native OpenCL implementation, present on a machine, plus some extensions.
If you're looking for low-level information (how a particular implementation implements those calls), probably the only source is the implementation.
There are a few opensource OpenCL on GPU implementations:
Raspberry Pi 3 (beta): https://github.com/doe300/VC4CL
OpenCL on Vulkan (beta): https://github.com/kpet/clvk
Mesa Clover (supports only 1.1): https://cgit.freedesktop.org/mesa/mesa/log/?qt=grep&q=clover
AMD ROCm: https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime
Intel sources of NEO (their new OpenCL implementation) here: https://github.com/intel/compute-runtime
I'm not aware of Xilinx providing sources for their implementation, so if you want to know what exactly happens on Xilinx, your best chance is probably to ask on Xilinx forums or via some official support.

General GPU programming on iPhone [duplicate]

With the push towards multimedia enabled mobile devices this seems like a logical way to boost performance on these platforms, while keeping general purpose software power efficient. I've been interested in the IPad hardware as a developement platform for UI and data display / entry usage. But am curious of how much processing capability the device itself is capable of. OpenCL would make it a JUICY hardware platform to develop on, even though the licensing seems like it kinda stinks.
OpenCL is not yet part of iOS.
However, the newer iPhones, iPod touches, and the iPad all have GPUs that support OpenGL ES 2.0. 2.0 lets you create your own programmable shaders to run on the GPU, which would let you do high-performance parallel calculations. While not as elegant as OpenCL, you might be able to solve many of the same problems.
Additionally, iOS 4.0 brought with it the Accelerate framework which gives you access to many common vector-based operations for high-performance computing on the CPU. See Session 202 - The Accelerate framework for iPhone OS in the WWDC 2010 videos for more on this.
Caution! This question is ranked as 2nd result by google. However most answers here (including mine) are out-of-date. People interested in OpenCL on iOS should visit more update-to-date entries like this -- https://stackoverflow.com/a/18847804/443016.
http://www.macrumors.com/2011/01/14/ios-4-3-beta-hints-at-opencl-capable-sgx543-gpu-in-future-devices/
iPad2's GPU, PowerVR SGX543 is capable of OpenCL.
Let's wait and see which iOS release will bring OpenCL APIs to us.:)
Following from nacho4d:
There is indeed an OpenCL.framework in iOS5s private frameworks directory, so I would suppose iOS6 is the one to watch for OpenCL.
Actually, I've seen it in OpenGL-related crash logs for my iPad 1, although that could just be CPU (implementing parts of the graphics stack perhaps, like on OSX).
You can compile and run OpenCL code on iOS using the private OpenCL framework, but you probably don't get a project into the App Store (Apple doesn't want you to use private frameworks).
Here is how to do it:
https://github.com/linusyang/opencl-test-ios
OpenCL ? No yet.
A good way of guessing next Public Frameworks in iOSs is by looking at Private Frameworks Directory.
If you see there what you are looking for, then there are chances.
If not, then wait for the next release and look again in the Private stuff.
I guess CoreImage is coming first because OpenCL is too low level ;)
Anyway, this is just a guess

How do I detect the DirectX shader model above v3 supported by a graphics card?

I am writing a small utility that reports system capabilities. One is the highest shader model supported by the installed graphics card, and I am currently detecting this using Direct3D 9.0c's device capabilities and checking the VertexShaderVersion and PixelShaderVersion fields of the D3DCAPS9 structure.
HRESULT hrDCaps = poD3D9->GetDeviceCaps(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, &oCaps);
if (!FAILED(hrDCaps)) {
// Pixel and vertex shader model versions. Use the minimum number of each for "the" shader model version
const int iVertexShaderModel = D3DSHADER_VERSION_MAJOR(oCaps.VertexShaderVersion);
const int iPixelShaderModel = D3DSHADER_VERSION_MAJOR(oCaps.PixelShaderVersion);
However, both these values return shader model 3 even for cards that support higher models. Here is what GPU-Z returns for the same card, for example:
This question indicates that DX9 will never report more than SM3 even on cards that support a higher model, but doesn't actually mention how to solve it.
How do I accurately get the shader model supported by the installed card? That is, the card capabilities, not the installed DirectX driver capabilities.
The utility has to run on Windows 2000 and above, and work on systems where a graphics card and even DirectX are not installed. I am currently dynamically loading DX9, so on those systems the check gracefully fails (which is ok.) But I am seeking a similar solution: something that will still run on all systems, and work correctly (detect the SM version) on most systems.
Edit - purpose: I am not using this code to dynamically change features of a program, ie select shaders. I am using it to report hardware capabilities as a 'ping' to a server, which is used to we have a good idea of typical hardware that our customers use, which can inform future product decisions. (For example: how many customers have SM4 or above? How many are using a 64-bit OS? Etc.) This is why either (a) gracefully failing, so we know it failed, or (b) getting an accurate shader model number are the two preferred modes.
Edit - answers so far: The answer below by SigTerm suggests instantiating DirectX 11, 10.1, 10, and 9.0c in order, and basing the reported shader model on which version instantiated without failures (shader model 5, 4.1, 4, and DXCAPS in that order.) If possible, I'd appreciate a code example of the DX11 and 10 ways to do this.
This may not be a reliable solution. For example, I am running Windows on a VMWare Fusion virtual machine on OSX. The Fusion drivers report DX11 in DxDiag, yet I know from the Fusion tech specs that it only supports DX9.0c and shader model 3. Still, with this exception, this method seems the best way so far.
version 4 is only supported on Direct3D10. Therefore, D3D9 api won't report it. Use D3D10/D3D11 api to detect higher version.
something that will still run on all systems, and work correctly (detect the SM version) on most systems.
Attempt to initialize D3D10/D3D11 to check functionality, if it fails init D3D9. Use LoadLibrary + GetProcAddress to load D3D10 functions, because if you link with D3D10 using .lib file, your application will fail to start if d3d10 is missing.
Or use OpenGL and try to map capabilities reported by OpenGL to D3D capabilities (probably a very bad idea).
Or build GPU database and use that.
where a graphics card and even DirectX are not installed.
I think you're asking for the impossible, because shaders are provided by DirectX, and the driver/GPU might not even have a concept of a "shader model" under the hood. In this case the only way to detect capabilites will be to make GPU database of some sort, detect installed devices, and return answer from database. This won't be relabile, of course.
Here is a link about DirectX versions and supported shader models.

What's the difference between AMD's APP SDK and (AMD) ATI's Stream Technology?

I'm working on a project that will use an AMD GPU for processing data. I noticed AMD has two different SDKs available on their website for using the GPU: ATI Stream Technology and
OpenCLâ„¢ and the AMD APP SDK. It looks like both support OpenCL but I haven't found anything on the site explicitly pointing out why one would use one over the other. What's the difference between these two?
The AMD APP SDK is here: http://developer.amd.com/sdks/AMDAPPSDK/Pages/default.aspx
The website should also answer your question about the difference between Stream and APP:
AMD Accelerated Parallel Processing (APP) SDK (formerly ATI Stream)
It used to be called AMD Stream SDK, they probably renamed it after adding support for non-Firestream hardware (namely OpenCL)
stream is the higher level amd-specific project (hardware and software) that includes opencl as the current software implementation. stream originally used the "brook" language, but switched to opencl in 2011. since then opencl became more popular (because it is a cross-platform standard that has been particularly well supported by apple) and these days amd doesn't seem to mention stream much. you can see this in a link like http://www.amd.com/us/products/technologies/stream-technology/opencl/pages/opencl.aspx where opencl is a "child" of stream (or the menu on the left of that page, where the higher level group is stream; other children are related to hardware).
in short, you want opencl. and despite the confusing mess that is amd's site, their opencl implementation is pretty solid.
hmmm. re-reading your question you seem to say there are two separate sdks. do you actually drill down to two different packages? my understanding is that opencl is the stream sdk. if you have found two different sdks (that are both current) can you link to them?

What language and compiler to write ATI GPU code for?

I know Nvidia has CUDA, but what does ATI have? I dont want to use OpenCL because I want to keep as low level to the hardware as possible.
Is it brook, or stream?
The documentation available is pretty pathetic! CUDA seems easy to get programming, but I want to use ATI specifically because of their hardware.
OpenCL is AMD's currently preferred GPU/compute language.
Brook is deprecated.
However, you can write code at a very low level, using AMD's
shader and kernel analyzer
http://developer.amd.com/tools/shader/Pages/default.aspx.
http://developer.amd.com/tools/AMDAPPKernelAnalyzer/Pages/default.aspx
E.g. http://developer.amd.com/tools/shader/PublishingImages/GSA.png
shows OpenCL code, and the Radeon 5870 assembly produced.
You can actually code directly in several forms of "assembly".
Or at least you could - the webpages no longer mention this.
(I used to have this installed for tuning and testing, but do not at the moment.)
More usually, you can code in any of several forms of AMD IL, Intermediate Language,
which is closer to the machine than OpenCL. The kernel analyzer web page says
"If your kernel is an IL kernel Stream, KernelAnalyzer will automatically compile the IL..."
I would recommend that you use OpenCL, and then look at the disassembly and tweak the OpenCL code to be better tuned. But you can work in IL, and probably still can work at an even lower level.

Resources