Vertex shaders vs. vertices - iOS

I have a question - I am learning OpenGL ES 2.0 from this tutorial, and moving across this website I have built a nice app with a spinning polygon.
I found another guide where the author uses vertex shaders. What is the difference between the two approaches? What else can I make with shaders?

The difference is that the first tutorial uses OpenGL ES 1.1, and the second uses OpenGL ES 2.0. 1.1 used the fixed-function pipeline to do all of its rendering, while 2.0 exclusively uses shaders.
All of those matrix functions? glLoadIdentity, glFrustum, glRotate? They're gone in 2.0. Instead, you write a program (shader) that executes on the GPU itself. The shader responsible for transforming vertex positions is called the "vertex shader".
So the vertex shader replaces all of the automatic matrix transforms with a much more flexible, user-driven, computation system.
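For a concrete picture, here is roughly what the smallest useful ES 2.0 vertex shader looks like, with the app computing its own model-view-projection matrix instead of calling glRotatef and friends. This is only a sketch; the attribute and uniform names are conventions, not anything required by the API:

    #include <OpenGLES/ES2/gl.h>

    // Minimal GL ES 2.0 vertex shader: the app computes its own
    // model-view-projection matrix and uploads it as a uniform.
    static const char *kVertexShaderSrc =
        "attribute vec4 a_position;\n"
        "uniform mat4 u_mvpMatrix;\n"
        "void main() {\n"
        "    gl_Position = u_mvpMatrix * a_position;\n"
        "}\n";

    GLuint compileVertexShader()
    {
        GLuint shader = glCreateShader(GL_VERTEX_SHADER);
        glShaderSource(shader, 1, &kVertexShaderSrc, NULL);
        glCompileShader(shader);
        GLint ok = 0;
        glGetShaderiv(shader, GL_COMPILE_STATUS, &ok); // check for errors
        return ok ? shader : 0;
    }

Because the transform is your own code, you are free to do things the fixed-function pipeline never could: skinning, displacement, per-vertex animation, and so on.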

In a nutshell, OpenGL ES 1.1 is (much) easier to get into, while OpenGL ES 2.0 is much more flexible and potentially a lot faster. There are some things you just can't do in 1.1.
OpenGL ES 1.1 and 2.0 are completely mutually incompatible, so choose wisely.
There is much more material out there to learn 1.1 than there is for 2.0.

From my understanding, vertices are representations of points on the 3D things you render, while vertex shaders are a means to modify each vertex on the fly before rendering. Vertex shaders run on your video card (GPU), so they can perform many actions in parallel (e.g. apply the same function to all of the vertices in your scene), which takes a lot of burden off of your CPU.

Related

OpenGL rendering to framebuffer, transfer to OpenCV UMat for OpenCL-accelerated processing

In an older version of OpenCV I could render with OpenGL to the back buffer and use glReadPixels to "copy" the pixels into an OpenCV image (IplImage?) for some processing (blurring, template matching against another OpenCV-loaded image). However, that cost me a transfer from GPU to CPU, and then, if I wanted to display the result, another transfer back to the GPU.
Now I can do something similar with just OpenGL and OpenCL by using clEnqueueAcquireGLObjects, and I do not have to transfer at all: I render with OpenGL to a framebuffer and let OpenCL take control of it.
However, this forces me to write my own OpenCL kernels for whatever processing I would like to do (nobody has time for that... it's actually terribly hard to debug OpenCL on NVIDIA). Now that OpenCV has some great OpenCL-accelerated routines, I would like to try them out.
So my question: is it possible to render to the framebuffer (or another GL context on the GPU), then give control (or a copy) to an OpenCV context (UMat?) for OpenCL-accelerated processing? If so, how (big picture, key components)?
I have a feeling I can use cv::ogl::Buffer to wrap the buffer and then copy it with ogl::Buffer::copyTo, but the documentation is not exactly clear on this.
Similar: Is it possible to bind a OpenCV GpuMat as an OpenGL texture?
other ref: Transfer data from Mat/oclMat to cl_mem (OpenCV + OpenCL)
Yes, it is possible - now. I wrote demos that demonstrate OpenGL/OpenCL/VAAPI interop. tetra-demo renders a rotating tetrahedron using OpenGL, passes the framebuffer (actually a texture attached to a framebuffer) on to OpenCV/CL (as a cv::UMat), and encodes it to VP9, all on the GPU. The only catch is that the required fixes reside in my OpenCV 4.x fork, so you'll have to build it yourself.
It also requires two OpenCL extensions: cl_khr_gl_sharing and cl_intel_va_api_media_sharing.
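For the big picture, the cv::ogl interop path in OpenCV's headers looks roughly like the sketch below. Everything here is hedged: it assumes a current GL context, a pixel buffer object `pbo` that the framebuffer contents were read into, and the render dimensions; the function and variable names are illustrative.

    #include <opencv2/core.hpp>
    #include <opencv2/core/opengl.hpp>   // cv::ogl::Buffer, mapGLBuffer
    #include <opencv2/imgproc.hpp>

    void processFrame(unsigned int pbo, int width, int height)
    {
        // One-time: create an OpenCL context that shares with the GL
        // context, so no host round trip happens (needs cl_khr_gl_sharing).
        static bool clReady = false;
        if (!clReady) {
            cv::ogl::ocl::initializeContextFromGL();
            clReady = true;
        }

        // Wrap the existing GL buffer object; no copy yet.
        cv::ogl::Buffer glBuf(height, width, CV_8UC4, pbo, /*autoRelease=*/false);

        // Map it as a UMat so OpenCV's OpenCL paths can run on it.
        cv::UMat frame = cv::ogl::mapGLBuffer(glBuf, cv::ACCESS_READ_WRITE);
        cv::UMat blurred;
        cv::GaussianBlur(frame, blurred, cv::Size(9, 9), 2.0); // stays on GPU
        blurred.copyTo(frame);
        cv::ogl::unmapGLBuffer(frame);   // hand the buffer back to OpenGL
    }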
There are two open GitHub issues addressing my efforts:
OpenGL/OpenCL and VAAPI/OpenCL interop at the same time
Capture MJPEG from V4l2 using VAAPI hardware acceleration to OpenCL using VAAPI/OpenCL interop

Is it possible to use the Metal API for compute simultaneously with OpenGL ES 3.0 for graphics?

I want to port an OpenCL sample using Metal as the compute API (as iOS still doesn't support ES 3.1 compute shaders) and OpenGL ES as the graphics API; since the sample uses OCL/OGL interop, this seems the easiest way to port it.
The question is: can an app make use of the Metal and OpenGL ES APIs simultaneously, and if so, how is interop achieved, i.e. OpenGL mapping a buffer object of data computed by Metal?
Thanks.
Yes, you can use OpenGL and the Metal API simultaneously, but keep in mind that any intercommunication between the two layers adds host/GPU memory-traffic overhead: you have to copy textures and buffers between their Metal and OpenGL representations. I think the best way to utilize the GPU is to use one of these technology stacks exclusively. Moreover, everything you can do with OpenGL shaders you can do with Metal kernels. In general, as far as I can judge after a year of practice with Metal, it is a more comfortable and convenient API than OpenGL. Enjoy Metal, join us :)

Does OpenGL ES 2.0 have a steeper learning curve than Metal?

I'm very interested in 3D graphics and have heard many developers raving about Metal.
Can someone who has worked with both Metal and OpenGL ES 2.0 comment on how their learning curves compare?
As a beginner who aims to stay loyal to iOS, is Metal easier to learn and master than OpenGL ES 2.0, or is it harder because it is more advanced?
I hope this question will be useful to many as I am trying to figure out where to start.
As a beginner, you might be better served by starting with 3D graphics at a higher level. SceneKit for OS X and iOS lets you describe a 3D scene in terms of its content -- geometry, materials (textures/shading), lights, and cameras -- and load assets created with 3D modeling tools. SceneKit is built on OpenGL (ES), so it uses a lot of the same concepts. As you become familiar with those concepts, you can use SceneKit to work your way into the OpenGL world a bit at a time:
use shader modifiers to write GPU shader code that extends SceneKit's built-in shading
use custom programs to write complete shaders that replace SceneKit's, or techniques to write shaders that postprocess SceneKit's rendering
create custom geometry from your own vertex data with geometry sources & elements
use a node renderer delegate to write your own OpenGL client code that works within a scene
You'll find more info about all of these by watching the SceneKit videos from WWDC: What's New in SceneKit and Building a Game with SceneKit.
Otherwise... OpenGL (ES) and Metal don't have very different learning curves in and of themselves. In fact, I'd consider Metal more approachable than OpenGL in some ways -- for example, many things you can do in GL have implicit and hard-to-predict performance costs, while the Metal analogues of those tasks are much clearer about what their impact on CPU or GPU time is, and allow you to decide when expensive work gets done.
On the other hand, Metal is brand new -- there aren't yet a lot of third-party resources to help you learn it. And a lot of the hard things about learning 3D graphics are very similar whether you're working in Metal, OpenGL, DirectX, or another platform/API. Once you learn the important stuff -- there are plenty of books and online tutorials for that, but StackOverflow isn't the best way to go looking for them -- getting up to speed with Metal or with OpenGL ES on a specific platform is pretty easy.
Coming from an OpenGL ES background, I had a good look at the Metal APIs. I believe that the learning curve for Metal is steeper, not because it's a new API, but because it introduces low-level constructs which developers previously didn't need to worry about.
If you compare fixed-pipeline OpenGL with the shader-oriented OpenGL flavours (on mobile: ES 1.x compared with ES 2.x and 3.x), and finally with Metal, what you have is increasingly powerful, increasingly generic APIs detached from the intuitive constructs (triangles, vertices, lamps) which constitute OpenGL's historical foundation.
Bear in mind that creating a more usable API isn't the main goal of Metal. The goal of this framework is to help developers get rid of driver overhead.

iOS - GPU Accelerated Matrix Transpose, Multiplication and Eigen-Decomposition Dilemma

I'm working on a library that requires the use of vectors and matrices on the iOS platform. I decided to look into OpenGL ES because the matrix and vector manipulations I plan on doing (mainly transposing, matrix multiplication, and eigendecomposition) could definitely benefit from GPU acceleration.
The issue is that I'm not that familiar with OpenGL ES, and honestly it might not be the best option. If I were to utilize OpenGL ES, would I have to manually write the algorithms that do the matrix transposition, multiplication, and eigendecomposition? Or is there another Apple or third-party framework that can help me with these tasks?
The main dividing issue, however, is that I want these operations to be GPU accelerated.
I'm going to implement my program using the Accelerate framework and vectorized arithmetic, then test to see if it's fast enough for my purposes and, if it isn't, try the GPU implementation.
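For what it's worth, the CPU path is only a few lines, since Accelerate ships BLAS and vDSP (and LAPACK routines such as ssyev for symmetric eigendecomposition). A sketch of the multiply and transpose pieces:

    #include <Accelerate/Accelerate.h>

    // C = A * B, single precision, row-major.
    // A is m x k, B is k x n, C is m x n.
    void matmul(const float *A, const float *B, float *C, int m, int k, int n)
    {
        cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    m, n, k,
                    1.0f, A, k,
                          B, n,
                    0.0f, C, n);
    }

    // T = transpose of A, where A is rows x cols.
    void transpose(const float *A, float *T, int rows, int cols)
    {
        // vDSP_mtrans takes the dimensions of the *output* matrix:
        // T has cols rows and rows columns.
        vDSP_mtrans(A, 1, T, 1, (vDSP_Length)cols, (vDSP_Length)rows);
    }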
As combinatorial states, Accelerate uses SIMD to accelerate many of its functions, but it is CPU-based. For smaller data sets, it's definitely the way to go, but operating on the GPU can significantly outclass it for large enough data sets with easily parallelized operations.
To avoid having to write all of the OpenGL ES interaction code yourself, you could take a look at my GPUImage framework, which encapsulates fragment shader operations within Objective-C. In particular, you can use the GPUImageRawDataInput and GPUImageRawDataOutput classes to feed raw byte data into the GPU, then operate over that using a custom fragment shader.
A matrix transpose operation would be quick to implement, since all of the matrix elements are independent of one another. Matrix multiplication by a constant or a small matrix would also be reasonably easy to do, but I'm not sure how to scale the multiplication of two large matrices properly. Likewise, I don't have a good implementation of eigendecomposition that I can point to off the top of my head.
The downside to fragment shader processing is that by default OpenGL ES takes in and outputs 4-byte RGBA values at each pixel. You can change that to half floats on newer devices, and I know others have done this with this framework, but I haven't attempted it myself. As another approach to getting this data in and out of the GPU, you can pack individual float values into RGBA bytes and unpack them at the end.
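For reference, the packing trick usually looks something like the fragment-shader helpers below. This is a widely used community recipe rather than anything GPUImage-specific, and precision is limited, so it suits values in [0, 1):

    // GLSL ES helpers, stored as a C++ string for upload with the shader.
    // pack() spreads a float in [0,1) across the four 8-bit RGBA channels;
    // unpack() reverses it after glReadPixels.
    static const char *kPackingGLSL =
        "vec4 pack(float v) {\n"
        "    vec4 enc = vec4(1.0, 255.0, 65025.0, 16581375.0) * v;\n"
        "    enc = fract(enc);\n"
        "    enc -= enc.yzww * vec4(1.0/255.0, 1.0/255.0, 1.0/255.0, 0.0);\n"
        "    return enc;\n"
        "}\n"
        "float unpack(vec4 rgba) {\n"
        "    return dot(rgba, vec4(1.0, 1.0/255.0, 1.0/65025.0, 1.0/16581375.0));\n"
        "}\n";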
The OpenGL ES 3.0 support on the very latest A7 devices provides some other opportunities for working with float data. You can use vertex data instead of texture input, which lets you supply four floats per vertex and extract those floats in the end. Bartosz Ciechanowski has a very detailed writeup of this on his blog. That might be a better general approach for GPGPU operations, but if you can get your operations to run against texture data in a fragment shader, you'll see huge speedups on the latest hardware (the iPhone 5S can be ~100-1000X faster than the iPhone 4 in this regard, where vertex processing and CPU speeds haven't advanced nearly as rapidly).
The Accelerate framework is not GPU accelerated, but it is very well optimized and uses SIMD via NEON where appropriate.

How important is it to send interleaved vertex data on iOS?

I am using Assimp to import some 3d models.
Assimp is great, but it stores everything in a non-interleaved vertex format.
According to the Apple OpenGL ES Programming Guide, interleaved vertex data is preferred on ios: https://developer.apple.com/library/ios/#documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/TechniquesforWorkingwithVertexData/TechniquesforWorkingwithVertexData.html#//apple_ref/doc/uid/TP40008793-CH107-SW8
I am using vertex arrays to consolidate all the buffer-related state changes - is it still worth the effort to interleave all the vertex data?
Because interleaving increases the locality of vertex data, it allows the GPU to cache much more efficiently and generally be a lot lighter on memory bandwidth at that stage in the pipeline.
How much difference it makes obviously depends on a bunch of other factors: whether memory access is a bottleneck (though it usually is, since texturing is read intensive), how spread out your vertex data is if not interleaved, and the specifics of how that particular GPU does fetching and caching.
Uploading multiple vertex buffers and bundling them into a vertex array would in theory allow the driver to perform this optimisation behind your back (either by duplicating memory, or once it becomes reasonably confident that the buffers in the array aren't generally in use elsewhere), but I'm not confident that it will. The other way to look at it is that you should be able to make the optimisation yourself at the very end of your data pipeline, so you needn't plan for it in advance or change your toolset. It's an optimisation, so if it's significant work to implement then the general rule against premature optimisation applies: wait until you have hard performance data. A concrete sketch of what the interleaved layout looks like follows below.
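To make the optimisation concrete: interleaving just means packing the attributes into one struct per vertex and pointing every attribute at the same buffer with a stride, along these lines (the attribute locations 0/1/2 are assumptions and must match your shader):

    #include <OpenGLES/ES2/gl.h>
    #include <cstddef>   // offsetof

    // One record per vertex: position, normal and UV sit next to each
    // other, so a vertex fetch touches one contiguous region of memory.
    struct Vertex {
        GLfloat position[3];
        GLfloat normal[3];
        GLfloat texCoord[2];
    };

    void setupInterleavedAttribs(GLuint vbo)
    {
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        const GLsizei stride = sizeof(Vertex);
        glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, stride,
                              (const void *)offsetof(Vertex, position));
        glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, stride,
                              (const void *)offsetof(Vertex, normal));
        glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, stride,
                              (const void *)offsetof(Vertex, texCoord));
        glEnableVertexAttribArray(0);
        glEnableVertexAttribArray(1);
        glEnableVertexAttribArray(2);
    }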
