Mechanism like CUDA streams in Xeon Phi? - stream

I am new to working with the Xeon Phi coprocessor, and my question is:
Does a mechanism like CUDA streams exist for Xeon Phi?

That's right, hStreams essentially covers the key features of CUDA Streams and OpenCL, in that several CUDA Streams and OpenCL apps have been ported to hStreams. Users of hStreams, like the OmpSs folks at the Barcelona Supercomputing Center, found that hStreams was easier to use than CUDA Streams, offered better support for synchronization, and required fewer unique APIs and fewer lines of code.
For more documentation, please see http://lotsofcores.com/hStreams, where you can also find a link to the MPSS download and a blog that offers a few highlights of MPSS features, including hStreams.
Once you've installed hStreams, look in /usr/share/doc/hStreams.

Yes. The Intel Manycore Platform Software Stack (MPSS) provides hStreams, which are designed to be similar to the CUDA streams model.
There is a chapter on hStreams in High Performance Parallelism Pearls, Volume Two, which you can preview in Google Books.
I can't find any detailed documentation on Intel's website, but the release notes say that you can find PDFs in the MPSS distribution, which should be on any Intel Xeon Phi coprocessor system.
BSC has detailed documentation of hStreams here.
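To make the "streams model" mentioned above concrete, here is a minimal sketch of what a stream is conceptually: an in-order queue of actions that runs asynchronously with respect to the code that enqueues them. This is plain C++ on the host, not the hStreams or CUDA API; the names `Stream`, `enqueue`, `synchronize` and `run_in_order` are made up for illustration.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical illustration of stream semantics (NOT the hStreams API):
// actions enqueued to one stream run in FIFO order on a worker thread,
// while the enqueuing thread continues immediately.
class Stream {
public:
    Stream() : worker_([this] { run(); }) {}

    ~Stream() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        work_cv_.notify_all();
        worker_.join();  // drains any remaining actions first
    }

    // Returns immediately; the action runs later, after all earlier ones.
    void enqueue(std::function<void()> action) {
        {
            std::lock_guard<std::mutex> lock(m_);
            pending_.push(std::move(action));
        }
        work_cv_.notify_all();
    }

    // Blocks until every action enqueued so far has finished.
    void synchronize() {
        std::unique_lock<std::mutex> lock(m_);
        idle_cv_.wait(lock, [this] { return pending_.empty() && !busy_; });
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            work_cv_.wait(lock, [this] { return stop_ || !pending_.empty(); });
            if (pending_.empty()) return;  // stop requested and queue drained
            std::function<void()> action = std::move(pending_.front());
            pending_.pop();
            busy_ = true;
            lock.unlock();
            action();  // run the action without holding the lock
            lock.lock();
            busy_ = false;
            idle_cv_.notify_all();
        }
    }

    std::mutex m_;
    std::condition_variable work_cv_, idle_cv_;
    std::queue<std::function<void()>> pending_;
    bool stop_ = false;
    bool busy_ = false;
    std::thread worker_;  // declared last so the members above exist first
};

// Enqueues n actions and returns the order in which they actually ran.
std::vector<int> run_in_order(int n) {
    std::vector<int> order;
    Stream stream;
    for (int i = 0; i < n; ++i) {
        stream.enqueue([&order, i] { order.push_back(i); });
    }
    stream.synchronize();
    return order;
}
```

Two `Stream` objects would run their queues independently of each other, which is the property that lets transfers and compute overlap in the real stream APIs. In hStreams or CUDA the actions would be data transfers and kernel launches on the device rather than host lambdas.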

Related

OpenCL migration internals

I am interested in how OpenCL's memory transfer functions operate underneath (migration, reading/writing a buffer, mapping/unmapping). I could not find any open-source OpenCL implementation (Intel's would be fine for me), and the explanations in the documentation don't tell me what is happening when, for example, I call clEnqueueMigrateMemObjects: which calls happen during the migration, which modules are active, what mechanisms it uses underneath, and whether it uses any caching.
Is there a good source to read about it?
I am now exploring how OpenCL passes data to FPGAs. Xilinx currently uses the native OpenCL implementation present on the machine, plus some extensions.
If you're looking for low-level information (how a particular implementation implements those calls), probably the only source is the implementation.
There are a few open-source implementations of OpenCL on GPUs:
Raspberry Pi 3 (beta): https://github.com/doe300/VC4CL
OpenCL on Vulkan (beta): https://github.com/kpet/clvk
Mesa Clover (supports only 1.1): https://cgit.freedesktop.org/mesa/mesa/log/?qt=grep&q=clover
AMD ROCm: https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime
Intel NEO (their new OpenCL implementation): https://github.com/intel/compute-runtime
I'm not aware of Xilinx providing sources for their implementation, so if you want to know what exactly happens on Xilinx, your best chance is probably to ask on Xilinx forums or via some official support.

Intel i5 memory consistency model?

How can I check which memory consistency model the Intel i5 has? I have been searching for Macs and Intel, and it seems impossible to find. Any tips on how to search for this information?
Memory ordering rules for the different Intel processors are now described in the Intel SDM, volume 3A, chapter 8, section 8.2 "Memory Ordering". There used to be an official whitepaper on the subject, which is now only available from unofficial sources.
Note that the information published in the SDM has changed between revisions from 2006 onward. An overview of what Intel and AMD have each stated about x86 memory ordering can be found here.
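If you want to see the model in action rather than read about it, the classic probe is the "store buffering" litmus test: each thread stores to one variable and then loads the other. The x86 rules in section 8.2 allow a load to be reordered with an earlier store to a different location, so both threads can read 0. Below is a rough sketch using C++ atomics instead of raw instructions; the function name `count_both_zero` is made up for this example.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs the store-buffering litmus test `iterations`
// times and returns how often BOTH threads read 0.
// Pass only orders that are valid for the operation: relaxed, release or
// seq_cst for the store; relaxed, acquire or seq_cst for the load.
int count_both_zero(int iterations,
                    std::memory_order store_order,
                    std::memory_order load_order) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r1 = -1;
        int r2 = -1;

        // Thread 1: store x, then load y.
        std::thread t1([&] {
            x.store(1, store_order);
            r1 = y.load(load_order);
        });
        // Thread 2: store y, then load x.
        std::thread t2([&] {
            y.store(1, store_order);
            r2 = x.load(load_order);
        });

        // join() makes the writes to r1 and r2 visible to this thread.
        t1.join();
        t2.join();

        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

With `std::memory_order_seq_cst` for both operations the result must be 0, because the compiler inserts whatever fencing the hardware needs. With `std::memory_order_relaxed` a non-zero count is allowed on x86, but it is not guaranteed to show up: starting a fresh pair of threads per iteration keeps the race window small, so seeing 0 there does not prove anything.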

What's the difference between AMD's APP SDK and (AMD) ATI's Stream Technology?

I'm working on a project that will use an AMD GPU for processing data. I noticed AMD has two different SDKs available on their website for using the GPU: ATI Stream Technology and OpenCL™ and the AMD APP SDK. It looks like both support OpenCL but I haven't found anything on the site explicitly pointing out why one would use one over the other. What's the difference between these two?
The AMD APP SDK is here: http://developer.amd.com/sdks/AMDAPPSDK/Pages/default.aspx
The website should also answer your question about the difference between Stream and APP:
AMD Accelerated Parallel Processing (APP) SDK (formerly ATI Stream)
It used to be called the AMD Stream SDK; they probably renamed it after adding support for non-FireStream hardware (namely OpenCL).
stream is the higher level amd-specific project (hardware and software) that includes opencl as the current software implementation. stream originally used the "brook" language, but switched to opencl in 2011. since then opencl became more popular (because it is a cross-platform standard that has been particularly well supported by apple) and these days amd doesn't seem to mention stream much. you can see this in a link like http://www.amd.com/us/products/technologies/stream-technology/opencl/pages/opencl.aspx where opencl is a "child" of stream (or the menu on the left of that page, where the higher level group is stream; other children are related to hardware).
in short, you want opencl. and despite the confusing mess that is amd's site, their opencl implementation is pretty solid.
hmmm. re-reading your question you seem to say there are two separate sdks. do you actually drill down to two different packages? my understanding is that opencl is the stream sdk. if you have found two different sdks (that are both current) can you link to them?

What language and compiler to write ATI GPU code for?

I know Nvidia has CUDA, but what does ATI have? I don't want to use OpenCL because I want to stay as close to the hardware as possible.
Is it brook, or stream?
The documentation available is pretty pathetic! CUDA seems easy to get programming, but I want to use ATI specifically because of their hardware.
OpenCL is AMD's currently preferred GPU/compute language.
Brook is deprecated.
However, you can write code at a very low level, using AMD's shader and kernel analyzers:
http://developer.amd.com/tools/shader/Pages/default.aspx
http://developer.amd.com/tools/AMDAPPKernelAnalyzer/Pages/default.aspx
E.g. http://developer.amd.com/tools/shader/PublishingImages/GSA.png shows OpenCL code and the Radeon 5870 assembly produced.
You can actually code directly in several forms of "assembly".
Or at least you could - the webpages no longer mention this.
(I used to have this installed for tuning and testing, but do not at the moment.)
More usually, you can code in any of several forms of AMD IL, Intermediate Language,
which is closer to the machine than OpenCL. The kernel analyzer web page says
"If your kernel is an IL kernel Stream, KernelAnalyzer will automatically compile the IL..."
I would recommend that you use OpenCL, and then look at the disassembly and tweak the OpenCL code to be better tuned. But you can work in IL, and probably still can work at an even lower level.

Using Delphi to take advantage of GPGPU technology?

GPGPU is the principle of using the parallel processors on video cards for massive increases in performance.
Does anyone have any ideas about using GPGPU in Delphi, using either OpenCL or CUDA? CUDA was/is NVidia only, but they have also adopted the OpenCL "standard".
I found a few Delphi samples from Google searches but they either crash or don't compile/run.
The ultimate instruction sample would be:
Download and install the OpenCL DLLs from here.
Download the OpenCL SDK from here.
Download this sample Delphi project from here.
Open and compile the Delphi project. If all goes to plan it will do "whatever it is supposed to do"
At that stage I can then start researching the OpenCL SDK and writing/compiling DLLs to call from any Delphi app.
This sort of stuff is really starting to take off. Embarcadero do not have to do anything themselves at this stage (unless they want to), but if there were a tutorial and samples for Delphi available it would be great. Many samples are available for other languages, but we do also need a good and simple Delphi example to show how easy it is to use Delphi for GPGPU apps.
CUDA is still nVidia only, and that won't change. OpenCL is a true standard in this case, not only limited to GPGPU.
As for using it in Delphi, all I know of is how to use it in Free Pascal. However, there's a good chance that the code will be portable. Here's a link to updated headers:
FreePascal Mantis RFE OpenCL
As for DLLs, if you use nVidia, they can be found here.
Here however we have a sample project in Delphi.
You could be interested in GPGPUonDelphi2007.
GPGPU example plus needed OpenGL and CG libraries for Delphi 2007 now available!
I created the necessary OpenGL and CG (delphi) packages yesterday and finished converting/translating/porting a C GPGPU OpenGL/CG example to Delphi today, and I would like to share it with you so that maybe some more (Delphi) people will look into GPGPU programming, especially with OpenGL 3.0 for (older) DX9 graphics cards.
You should use CUDA DELPHI.
It lets you run CUDA kernels from native Pascal code.
I made a floating point test, using OpenCL and Delphi, some time ago:
https://plus.google.com/110131086673878874356/posts/eWcipK16MV7
(contains link to demo and sources)

Resources