Intel Compiler Intrinsics - vectorization

I have decided to play around with a trial version of the Intel compiler.
Now I am trying to understand whether there are situations where I need to use intrinsics explicitly to improve code performance.
Auto-vectorization (or, shall I say, the way the compiler utilizes the SSE and AVX registers), where the compiler does all the work behind the scenes, seems to work fine.
Could you please show me a counter-example? Maybe some cases that involve OpenMP? What do you think?

See these slides; they will answer your question.
https://users.ece.cmu.edu/~franzf/teaching/slides-18-645-simd.pdf
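To make the slides' point concrete, here is a hedged sketch of one classic case where explicit intrinsics tend to beat auto-vectorization: saturating unsigned byte addition. The clamp-to-255 branch often blocks the vectorizer, while SSE2 has a single instruction for it (PADDUSB, exposed as `_mm_adds_epu8`). Note that modern compilers, including Intel's, can sometimes recognize this idiom, so check the vectorization report and profile before hand-coding:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>
#include <cstddef>

// Scalar version: the ternary clamp often prevents auto-vectorization.
void add_saturate_scalar(const uint8_t* a, const uint8_t* b, uint8_t* dst, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        unsigned s = a[i] + b[i];
        dst[i] = s > 255 ? 255 : (uint8_t)s;
    }
}

// Intrinsic version: one saturating add per 16 bytes, no branches.
// Assumes n is a multiple of 16 to keep the sketch short.
void add_saturate_sse2(const uint8_t* a, const uint8_t* b, uint8_t* dst, size_t n) {
    for (size_t i = 0; i < n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i*)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i*)(b + i));
        _mm_storeu_si128((__m128i*)(dst + i), _mm_adds_epu8(va, vb));
    }
}
```

Horizontal operations, in-register shuffles, and loops the compiler cannot prove alias-free are similar candidates for hand-written intrinsics.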

Related

What steps should be taken to make sure that OpenCV code running on a PC will run on a particular embedded device?

I want to port some working OpenCV code to an embedded platform. Such ports used to be very difficult, but now TI has come up with nice embedded platforms which are comparatively hassle-free, or so they say.
I want to know the following things:
Given that:
The OpenCV code is already running smoothly on a PC. (obviously)
I need to determine these things before purchasing the device.
I can't put the code here on Stack Overflow. :P
The device will be chosen from the Texas Instruments C6000 family.
Questions:
How do I make sure that the port is feasible at all?
What steps should be taken to make sure that, after porting, the code will run (at least)?
How do I determine whether the code might require some changes to run smoothly?
Point 3 above is optional.
I need info that will at least give me some starting point in this regard.
What I thought I should do:
List the built-in functions the code uses.
Then find available online benchmarks of those functions for the particular device, like the ones shown towards the end of this doc.
...
I need to know how to proceed from there.
However, the C6-Integra™ DSP+ARM processor seems the best.
The best you can do is to try a device simulator (if one is available), but what you'll see there is far from perfect.
Actually, nothing can tell you how fast and how well the app will run on the embedded device before you run your specific app on that specific device.
So:
Step 1: Buy it.
Step 2: Try it.
Things to consider:
Embedded CPU architecture: does your app need a big cache? How big is the embedded cache?
Algorithm: do you use a lot of floating-point operations? How good is the device at floating-point ops?
Do you have memory transfers? The data bus on a PC is way faster than on an embedded device.
Hardware support: do you use a lot of double-precision calculations? They are emulated on ARM, and they are going to kill your app (what takes milliseconds on a PC can take seconds on an ARM).
Acceleration: do your functions use SSE? (Many OpenCV functions are SSE-optimized, even if you don't know it.) Do you have the NEON counterpart? (OpenCV does not have much support for that.) The difference can be orders of magnitude between x86 with SSE and an embedded chip without NEON.
And many, many others.
So, again: no one can tell you how it will work. Only the combination of the specific app and the real device tells the truth.
Even a run on a similar device is not conclusive: the app can run smoothly on one processor, and on another with similar frequency or listed memory it will slow down far too much.
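As a first sanity check on the PC side, you can ask OpenCV itself which optimizations your build uses, and time a representative operation so you have a baseline to compare the device against later. A minimal sketch (the blur size and image dimensions are arbitrary placeholders; `useOptimized`, `checkHardwareSupport` and `getTickCount` are standard OpenCV core functions):

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <cstdio>

int main() {
    std::printf("Optimized code enabled: %d\n", cv::useOptimized());
    std::printf("SSE2 available: %d\n", cv::checkHardwareSupport(CV_CPU_SSE2));

    // Time a typical heavy call so you have a PC baseline to compare
    // against the embedded device later.
    cv::Mat src(1024, 1024, CV_8UC1), dst;
    cv::randu(src, 0, 255);
    double t = (double)cv::getTickCount();
    cv::GaussianBlur(src, dst, cv::Size(7, 7), 1.5);
    t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
    std::printf("GaussianBlur 1024x1024: %.3f ms\n", t * 1000.0);
    return 0;
}
```

Running the same snippet on the device (or its simulator) gives you a crude but honest first ratio between the two platforms.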
This is an interesting question, but "run" is a very generic word in this context, so I feel the need to break it down into two other questions:
Will it compile on an embedded device?
Will it run as fast/smoothly as on a PC?
I've used OpenCV on a lot of different devices, including ARM, SH4 and MIPS, and I found out that sometimes the manufacturer of the device itself provides a compiled version of OpenCV (to my surprise), which is great. That's something you can look into; maybe the manufacturer of your device provides OpenCV binaries.
There's no way to know for sure how smooth your OpenCV application will be on the target device unless you can find some benchmark of OpenCV running on it. PCs have far better processing power than embedded devices, so you can expect less performance from the target device.
There are third-party applications like opencv-performance that you can use to test/benchmark the environment once you get your hands on it. And if performance is such a big deal in this project, you might also be interested in this nice article, which explains some timing tests done on a couple of OpenCV features, comparing implementations using the C and C++ interfaces of OpenCV.

What's best for your Video Tracking? Why?

Best as in reliable, maintainable and fast.
Considering Processing, VVVV or openFrameworks?
I know Processing doesn't handle big video frames very well.
VVVV (its nodes use OpenCV) is Windows-only.
openFrameworks (OpenCV) is more complicated than the two above.
You can try to implement your app in Processing and see if it fits your needs and is fast enough. It should be a little easier and faster to write Java instead of C++.
Here you can find how to set it up with Processing, with examples: http://ubaa.net/shared/processing/opencv/
If you don't want to code anything you can try VVVV; it should be a little faster, but it is Windows-only, as you mentioned.
If your Processing app runs too slowly, you can try openFrameworks.
Download the new OF 007 from http://www.openframeworks.cc/ and check out the setup guide.
Once you have done the install, you can play around with the OpenCV examples in:
<your-OF-folder>/apps/addonsExamples/opencvExample
<your-OF-folder>/apps/addonsExamples/opencvHaarFinderExample/
Personally I prefer OF because you can do any custom thing with the best performance, but it's good to build your prototype with Processing to see if it works, and then implement it again in OF.
As far as I can see from your question, VVVV and OF are the options you're looking at; you prefer VVVV's node-based programming over OF, but aren't happy that VVVV is Windows-only.
Have you considered other alternatives like MaxMSP/Jitter or PureData?
Both are similar to VVVV, or the other way around :)
MaxMSP has a package for optimized matrix operations (3D/video) called Jitter.
For Jitter there is cv.jit, a free collection of external objects, and its samples/tutorials are great.
Similarly, PureData has an add-on called Gem, which is similar to Max's Jitter package.
I haven't tried with PureData, but there are OpenCV bindings for it, through Gem.
cv.jit
pdp OpenCV PureData Bindings - via Piksel.no
MaxMSP uses QuickTime on OS X and can use DirectX on Windows, but it's commercial.
PureData runs on Windows/OS X/Linux; it's free and open source.
HTH

DirectX 9 or DirectX 10 for starters?

I want to do projects to make my resume more appealing to game companies, so I am going to start buying books. But I don't know whether to read DirectX 9 or DirectX 10 API books to start off with. DirectX 10 is great, but it seems the industry is moving to 10 slowly. So should I use 9 or go with 10?
I would suggest learning the basics using DirectX 9 and then rapidly moving on to DX11. DirectX 11 is harder to get started in than DirectX 9, partly because it's slightly more complex, but also because a lot of the utility functions in D3DX are no longer there, or have been moved to source code, like the effects framework. This is no bad thing, but it does make it significantly more complex to learn, as you have to learn a lot more things at once.
Spend 2 or 3 weeks learning DX9, then move to DX11 for "real" work. :P
Learn basic DX9 using the fixed pipeline and D3DX for loading models etc. It's a lot simpler than DX11 and much better documented, and you'll get a triangle, and then a model, on screen much faster. Play with that until you completely understand the basic concepts and transformations.
Then rewrite it all using shaders only. You'll need to use them in DX10/11 anyway, but it's a lot easier to learn when you already have a working framework of code, and it's a lot simpler to get that working in DX9.
Once you have that working, learn DX11. You'll have to switch math libraries. You'll have to invent your own model formats and loaders. You'll have to either invent your own effects framework or use the example one. But all of this is much easier once you already know the basics of 3D and of programming shaders.
TBH, further to OneOfOne's comment: if you know how to do 3D development in GL, D3D9, D3D10 or D3D11, you can transfer those skills to any of the others with a little bit of work.
Personally I'd aim for D3D11, as that way you are learning the cutting edge. You'll find you'll be able to do GL, D3D9 or D3D10 with a little work. Do enough work on the theory and you'll discover that it's not even that hard to transfer the skills to a fully software engine.
If your intention is really to learn a skill that you would use in the game industry, stick with DirectX 9. Since DirectX 10 and 11 both require Vista or Windows 7, game developers are still mostly ignoring them and targeting DirectX 9 in order to have support for Windows XP.
That being said, it doesn't really matter which you start with. The differences are not that large. If you understand the concepts behind 3D APIs and how the GPU pipeline works, you can pick up any of the three or even OpenGL with minimal effort.
Fact is, you need to learn both.
As long as 50% of gamers are still on WinXP, you're going to need to be able to program in Direct3D9.
D3D9 isn't any easier to get started with than D3D10/11. It's the same principles: vertices to be placed, normals to be calculated, and meshes to be rendered. Whether you're creating an ID3D11BlendState object or calling IDirect3DDevice9::SetRenderState(), it's the same concept, just different ways of doing it.
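To illustrate the "same concept, different ways" point, here is a hedged sketch of enabling standard alpha blending in both APIs, assuming you already hold a valid D3D9 device or a D3D11 device/context pair (the function names are mine; the API calls themselves are real):

```cpp
#include <d3d9.h>
#include <d3d11.h>

// D3D9: alpha blending is a handful of render states set directly on the device.
void enableAlphaBlend9(IDirect3DDevice9* dev) {
    dev->SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);
    dev->SetRenderState(D3DRS_SRCBLEND, D3DBLEND_SRCALPHA);
    dev->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_INVSRCALPHA);
}

// D3D11: describe the same state up front, create an immutable state object,
// then bind it at the output-merger (OM) stage.
void enableAlphaBlend11(ID3D11Device* dev, ID3D11DeviceContext* ctx) {
    D3D11_BLEND_DESC desc = {};
    desc.RenderTarget[0].BlendEnable           = TRUE;
    desc.RenderTarget[0].SrcBlend              = D3D11_BLEND_SRC_ALPHA;
    desc.RenderTarget[0].DestBlend             = D3D11_BLEND_INV_SRC_ALPHA;
    desc.RenderTarget[0].BlendOp               = D3D11_BLEND_OP_ADD;
    desc.RenderTarget[0].SrcBlendAlpha         = D3D11_BLEND_ONE;
    desc.RenderTarget[0].DestBlendAlpha        = D3D11_BLEND_ZERO;
    desc.RenderTarget[0].BlendOpAlpha          = D3D11_BLEND_OP_ADD;
    desc.RenderTarget[0].RenderTargetWriteMask = D3D11_COLOR_WRITE_ENABLE_ALL;

    ID3D11BlendState* state = nullptr;
    if (SUCCEEDED(dev->CreateBlendState(&desc, &state))) {
        ctx->OMSetBlendState(state, nullptr, 0xFFFFFFFF);
        state->Release();  // the context holds its own reference
    }
}
```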
After working with D3D11 for a couple of days, I've come to think of it as better than DX9 in a lot of ways. For one, you're able to use the full capabilities of the GPU, including geometry shaders. Second, it forces you to fully understand the graphics pipeline before you can draw anything at all (note how functions are named after the stage of the pipeline they affect: IA* functions for the input-assembler stage, OM* functions for the output-merger stage, etc.). This may result in a slightly steeper initial learning curve, but once you get it, it's not any harder than D3D9 and is better, since the very naming of the functions helps the concepts stick.
So get going on both; learning them in tandem may help reduce the amount of effort you spend learning deprecated APIs and ways of doing things from DX9 (i.e. you really want to spend more time using shaders, and shouldn't use the fixed-function pipeline section of DX9 too much).
You can check Luna's books for DX9/DX11 (I suggest you start with 11). You can also check out http://www.rastertek.com/tutdx11.html, but he doesn't explain everything, so you can go to Luna's book to see what those functions or properties are about.
With a few small exceptions, DX10 is just a legacy-free DX9. For example, DX9 had built-in options for rendering flat-shaded, textured, or using a shader. In DX10 these options are gone; you always have to use a real shader. If you want to do flat shading, write an HLSL shader that does flat shading.
So I would suggest you learn DX10 (or DX11). You will be able to adapt to DX9 quickly, but with a more modern coding style, because you won't be using legacy functions. Those can be quite confusing, so DX10 will keep you focused on the relevant things.
If you are a real beginner, and setting up a vertex buffer to create a single triangle confuses you (as a real 3D programmer you are no longer interested in single triangles), I would even suggest starting with OpenGL. You will have faster success, but in reality this can be a little bit distracting, much like legacy DX9, if you want to focus on modern 3D coding.
Yes, do not waste your time with DX10: it was never really adopted as the industry standard for any period of time. There weren't any changes big enough to warrant people upgrading from DX9, but for DX11 there were.
I suggest DirectX 11; in my opinion there's no reason to waste time on deprecated functions or techniques.
Learning shaders from the start will make things much clearer.
Try doing the samples from the sample folders of both 9 and 10 and, if your computer can support it, 11. This is what I am doing.

Is DirectSound the best audio abstraction layer for Windows?

I'm switching my app from a very bad sound implementation, built for a specific chipset, to an abstraction layer.
The app is a native WinForms app on .NET 3.5. DirectX/DirectSound is the likely choice, but I'm a little concerned about the overhead. Any other options? Or is it silly to even THINK about anything else?
DirectSound is not getting the same love from Microsoft today as it got in the past. As far as DirectX is concerned, you may try XAudio2 or XACT instead. Some people love those, others hate them. XAudio2 is more low-level, while XACT is rather high-level. Both are accessible from Microsoft XNA, which is like Managed DirectX, but is actively developed.
But you are not restricted to what DirectX comes with. Try FMOD if you want something great. They still have their shareware/hobbyist license model and a freeware license model, in case you don't want to pay big bucks.
Your choice depends on what exactly you want to do with sound.
See if SDL looks better.
Well, you can try OpenAL instead. What OpenGL is to Direct3D, OpenAL is to DirectSound(3D). The interface is pretty similar to OpenGL's, so if you don't like that, you'll probably dislike OpenAL too. Also, I'm not sure whether the Windows version of this lib is its own native implementation or just calls DirectSound, and thus might just be a (thin?) wrapper on top of it.
DirectSound is pretty good.
If you need low latency or good support for sound input and output via multiple soundcards at the same time you may also want to have a look at ASIO:
http://de.wikipedia.org/wiki/Audio_Stream_Input/Output
The waveOut... API is still an option. It's tricky to work with from managed code, but you can play multiple sounds at once this way (in XP and Vista, at least).
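For reference, here is a minimal native sketch of the waveOut... route: generate one second of PCM and push it through a single buffer. Error handling and the proper callback-based wait are elided to keep it short; all the calls are standard winmm functions:

```cpp
#include <windows.h>
#include <mmsystem.h>
#include <vector>
#include <cmath>
#pragma comment(lib, "winmm.lib")

int main() {
    const int sampleRate = 44100;
    std::vector<short> samples(sampleRate);      // one second of mono audio
    for (size_t i = 0; i < samples.size(); ++i)  // a 440 Hz sine tone
        samples[i] = (short)(3000 * std::sin(2.0 * 3.14159265 * 440.0 * i / sampleRate));

    WAVEFORMATEX fmt = {};
    fmt.wFormatTag      = WAVE_FORMAT_PCM;
    fmt.nChannels       = 1;
    fmt.nSamplesPerSec  = sampleRate;
    fmt.wBitsPerSample  = 16;
    fmt.nBlockAlign     = fmt.nChannels * fmt.wBitsPerSample / 8;
    fmt.nAvgBytesPerSec = fmt.nSamplesPerSec * fmt.nBlockAlign;

    HWAVEOUT hwo;
    waveOutOpen(&hwo, WAVE_MAPPER, &fmt, 0, 0, CALLBACK_NULL);

    WAVEHDR hdr = {};
    hdr.lpData         = (LPSTR)samples.data();
    hdr.dwBufferLength = (DWORD)(samples.size() * sizeof(short));
    waveOutPrepareHeader(hwo, &hdr, sizeof(hdr));
    waveOutWrite(hwo, &hdr, sizeof(hdr));

    Sleep(1100);  // crude wait for playback; real code would use a callback
    waveOutUnprepareHeader(hwo, &hdr, sizeof(hdr));
    waveOutClose(hwo);
    return 0;
}
```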
If you just need to play sounds occasionally, System.Media.SoundPlayer is very easy to use. However, you can't play more than one sound at a time with this component.
DirectSound is your only other major alternative. It has a built-in software synthesizer, if that's something you need.
EDIT: SDL looks interesting. Thanks, Sijin.
SharpDX looks interesting. I'm planning on trying it as a replacement for Managed DirectX because of the x86 limitations of the latter.

Automatic image rotation based on a logo

We're looking for a package to help identify and automatically rotate faxed TIFF images based on a watermark or logo.
We currently use libtiff for rotation, but I don't know of any other libraries or packages we could use for detecting this logo and determining how to rotate the images.
I have done some basic work with OpenCV but I'm not sure that it is the right tool for this job. I would prefer to use C/C++ but Java, Perl or PHP would be acceptable too.
You are in the right place using OpenCV; it is an excellent utility. For example, this guy used it for template matching, which is fairly similar to what you need to do. Also, the link Roddy specified looks similar to what you want to do.
I feel that OpenCV is the best library out there for this kind of development.
@Brian: OpenCV and the Intel IPP are closely linked and very similar (both are Intel libraries). As far as I know, if OpenCV finds the Intel IPP on your computer, it will automatically use it under the hood for improved speed.
The Intel Performance Primitives (IPP) library has a lot of very efficient algorithms that help with this kind of a task. The library is callable from C/C++ and we have found it to be very fast. I should also note that it's not limited to just Intel hardware.
That's quite a complex and specialized algorithm that you need.
Have a look at http://en.wikipedia.org/wiki/Template_matching. There's also a demo program (but no source) at http://www.lps.usp.br/~hae/software/cirateg/index.html
Obviously these require you to know the logo you are looking for in advance...
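For the template-matching route, here is a hedged OpenCV sketch (modern OpenCV API; the function name is mine): match the known, upright logo against the fax page at each of the four possible scan orientations and keep the best score. The winning rotation tells you how to correct the image:

```cpp
#include <opencv2/opencv.hpp>

// Returns the number of 90-degree clockwise turns that best aligns the logo.
// Assumes the logo image is smaller than the page in every orientation.
int detectRotation(const cv::Mat& page, const cv::Mat& logo) {
    int bestQuarterTurns = 0;
    double bestScore = -1.0;
    for (int k = 0; k < 4; ++k) {
        cv::Mat rotated = page.clone();
        for (int r = 0; r < k; ++r)
            cv::rotate(rotated, rotated, cv::ROTATE_90_CLOCKWISE);
        cv::Mat result;
        cv::matchTemplate(rotated, logo, result, cv::TM_CCOEFF_NORMED);
        double maxVal;
        cv::minMaxLoc(result, nullptr, &maxVal);  // best match score for this turn
        if (maxVal > bestScore) {
            bestScore = maxVal;
            bestQuarterTurns = k;
        }
    }
    return bestQuarterTurns;
}
```

For faxed TIFFs you would likely downscale and binarize both images first, since fax noise and resolution differences hurt the normalized correlation score.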
