GPUImage Kuwahara filter on iPhone 4S - ios

I'm using Brad Larson's GPUImage framework. However, when I apply the Kuwahara filter with a filter radius of 5.0f, I get artifacts on an iPhone 4S (it works fine on higher-performance devices).
The source image size is 2048x2048 px.
From the original developer's comments I understood that there is a kind of watchdog timer that fires when something takes too long to run on the GPU.
So my question is: what is the maximum resolution at which I can apply the Kuwahara filter with a radius of 5.0f on an iPhone 4S without getting artifacts?

The Kuwahara filter produces square artefacts and is computationally very expensive.
You can use the generalised Kuwahara filter instead (e.g. with 8 segments).
You can also manually generate a shader without loops for a selected radius. To reduce the number of texture reads, you can use a trick:
Generate the shader for one constant radius.
Scale the pixel offsets by the ratio of the current radius to that constant radius.
You get some artefacts, but they look artistic (like canvas), and the Kuwahara filter will be faster. A sketch of this offset-scaling idea follows below.
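For illustration only, here is a minimal sketch of that offset-scaling trick as it might appear in a GPUImage-style fragment shader. This is not the actual GPUImage Kuwahara shader: the uniform names, the fixed radius of 3.0, and the single unrolled tap shown are all assumptions.

    // Minimal sketch (not the actual GPUImage shader): the loop is unrolled for
    // a fixed radius, and every hard-coded offset is scaled by the ratio of the
    // requested radius to that fixed radius. Uniform names are illustrative.
    static const char *kScaledOffsetFragmentShader =
        "precision highp float;                                             \n"
        "varying vec2 textureCoordinate;                                    \n"
        "uniform sampler2D inputImageTexture;                               \n"
        "uniform vec2 texelSize;        // 1.0 / texture dimensions         \n"
        "uniform float radius;          // requested radius, e.g. 5.0       \n"
        "const float fixedRadius = 3.0; // radius the shader was unrolled for\n"
        "void main()                                                        \n"
        "{                                                                  \n"
        "    float scale = radius / fixedRadius;                            \n"
        "    // One unrolled tap; the generated shader repeats this pattern \n"
        "    // for every (dx, dy) in the fixed-radius window and then      \n"
        "    // accumulates the per-region means and variances.             \n"
        "    vec3 tap = texture2D(inputImageTexture,                        \n"
        "        textureCoordinate + vec2(-3.0, -3.0) * scale * texelSize).rgb;\n"
        "    gl_FragColor = vec4(tap, 1.0);                                 \n"
        "}";

In practice you would generate a shader string like this per supported radius (offline or at runtime) and compile it once, so the fragment shader itself contains no loops.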

There really isn't a hard limit. The tiling artifacts you are seeing are due to the OpenGL ES watchdog timer aborting the scene rendering after it takes too long. If you have a single frame that takes longer than approximately 2 seconds to render, your frame rendering will be killed in this manner.
The exact time it takes is a function of hardware capabilities, system load, shader complexity, and iOS version. In GPUImage, you pretty much only see this with the Kuwahara filter because of the ridiculously unoptimized shader I use for that. It's drawn from a publication that was doing this using desktop GPUs, and is about the worst-case operation for a mobile GPU like these. Someone contributed a fixed-radius version of this which is significantly faster, but you'll need to create your own optimized version if you want to use this with large images on anything but the latest devices.
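As a practical workaround, a minimal sketch of the two mitigations mentioned here (downsampling before filtering and using the contributed fixed-radius filter) might look like the following. The class names come from GPUImage (GPUImageKuwaharaRadius3Filter, GPUImageLanczosResamplingFilter), but the image name and target size are placeholders, and the capture methods follow the later GPUImage API, so treat the details as assumptions.

    #import <UIKit/UIKit.h>
    #import <GPUImage/GPUImage.h>

    // Sketch: shrink the 2048x2048 source before filtering, then run the faster
    // fixed-radius Kuwahara variant instead of the radius-5 loop-based shader.
    UIImage *sourceImage = [UIImage imageNamed:@"source2048"]; // placeholder asset name
    GPUImagePicture *picture = [[GPUImagePicture alloc] initWithImage:sourceImage];

    GPUImageLanczosResamplingFilter *resample = [[GPUImageLanczosResamplingFilter alloc] init];
    [resample forceProcessingAtSize:CGSizeMake(1024.0, 1024.0)]; // placeholder target size

    GPUImageKuwaharaRadius3Filter *kuwahara = [[GPUImageKuwaharaRadius3Filter alloc] init];

    [picture addTarget:resample];
    [resample addTarget:kuwahara];

    // Newer GPUImage API; older releases used -imageFromCurrentlyProcessedOutput.
    [kuwahara useNextFrameForImageCapture];
    [picture processImage];
    UIImage *result = [kuwahara imageFromCurrentFramebuffer];

Halving the linear resolution cuts the per-pixel shader cost roughly fourfold, which is usually enough to stay under the watchdog limit on older GPUs.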

Related

Blurry images during object detection from iOS app

I've written an app with an object detection model and process images when an object is detected. The problem I'm running into is when an object is detected with 99% confidence but the frame I'm processing is very blurry.
I've considered analyzing the frame and attempting to detect blurriness or detecting device movement and not analyzing frames when the device is moving a lot.
Do you have any other suggestions for processing only non-blurry photos, or solutions other than the ones I've proposed? Thanks
You might have issues detecting "movement" when, for instance, driving in a car. In that case, looking at something inside the car is not considered movement, while looking at something outside is (unless it's far away). There can be many other cases like this.
I would start by checking whether the camera is in focus. That is not the same as checking whether the frame is blurry, but it might be very close.
The other option I can think of is simply checking two or more sequential frames and seeing whether they are relatively the same. To do that, it is best to define a grid, for instance 16x16, on which you evaluate similar values. You would need to mipmap your photos, which done manually means resizing by half until you get to a 16x16 image (roughly: 2000x1500 -> 1024x1024 -> 512x512 -> 256x256 -> ... -> 16x16). Then grab those 16x16 pixels and store them. Once you have enough frames (at least 2), you can start comparing the values. The GPU is perfect for the resizing, but the 16x16 values are probably best evaluated on the CPU. What you need to do is basically find the average pixel difference between two sequential 16x16 buffers, then use that to decide whether detection should be enabled (see the sketch after this answer).
This procedure may still not be perfect, but it should be relatively feasible from a performance perspective. There may be some shortcuts, as some tools may already do the resizing so that you don't need to "halve" the images manually. From a theoretical perspective, you are creating sectors and computing their average color. If all the sectors have almost the same color across two or more frames, there is a high chance the camera did not move much in that time and the image should not be blurry from movement. Still, if the camera is not in focus, you can have multiple sequential frames that are exactly the same but are all blurry. The same happens if you only detect phone movement.
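A minimal sketch of the comparison step, assuming both frames have already been downsampled to 16x16 RGBA buffers; the buffer layout, function names, and threshold value are assumptions to be tuned, not part of any existing API.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>

    // Average absolute per-channel difference between two 16x16 RGBA buffers.
    // A small value suggests the camera was mostly still between the two frames.
    static double AverageDifference16x16(const uint8_t *previous, const uint8_t *current)
    {
        const size_t byteCount = 16 * 16 * 4;
        double total = 0.0;
        for (size_t i = 0; i < byteCount; i++) {
            total += abs((int)previous[i] - (int)current[i]);
        }
        return total / (double)byteCount;
    }

    // The caller decides whether to run object detection on the current frame;
    // the threshold is hypothetical and would need tuning on-device.
    static bool ShouldRunDetection(const uint8_t *previous, const uint8_t *current)
    {
        return AverageDifference16x16(previous, current) < 8.0;
    }
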

iOS: Render a purely pixel-based fractal effect using OpenGL ES?

I am new to Objective-C and OpenGL, so please be patient.
I'm building an app that is mainly based on a full-screen 2D pixelbuffer that is filled and animated using mathematical formulas (similar to fractals), mostly using sin, cos, atan etc.
I have already optimized sin and cos by using lookup tables, which gave quite an FPS boost. However, while the framerate is fine in the Simulator on a Mac mini (around 30 fps), I get a totally ridiculous 5 fps on an actual device (a non-Retina iPad mini).
As I see no further ways to optimize the pixel loops, would it be possible to implement the effects using, say, an OpenGL shader, and then just draw a fullscreen quad with a texture on it?
As I said, the effects are really simple and just iterate over all pixels in a nested x/y loop and use basic math and trig functions. The way I blit to the screen is already optimal for the device while staying in non-OpenGL, and gives like a million FPS if I leave out the actual math.
Thanks!
If you implement this as an OpenGL shader, you will get a ridiculously massive increase in performance. The shader runs on the graphics chip, which is designed to be massively parallel and is optimized for exactly this kind of math.
You don't make a texture so much as define a shader for the surface. Your shader code would be invoked for every rendered pixel on that surface.
I would start by trying to see if you can hack a shader here: http://glsl.heroku.com/
Once you have something working, you can research how to get an OpenGL context working with your shader on iOS, and you shouldn't have to change the actual shader much to get it working.
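To give a feel for what such a shader looks like, here is a minimal sketch of a trig-based full-screen fragment shader, embedded as a C string the way it would be compiled from an iOS app. The formula, uniform names, and color mapping are illustrative placeholders, not your particular effect.

    // Sketch: a per-pixel "plasma"-style effect evaluated entirely on the GPU.
    // Attach this fragment shader to a full-screen quad; the uniforms need
    // matching glUniform* calls in the app code.
    static const char *kPlasmaFragmentShader =
        "precision mediump float;                                        \n"
        "varying vec2 v_texCoord;       // 0..1 across the quad          \n"
        "uniform float u_time;          // seconds since start           \n"
        "void main()                                                     \n"
        "{                                                               \n"
        "    vec2 p = v_texCoord * 8.0;                                  \n"
        "    float v = sin(p.x + u_time)                                 \n"
        "            + sin((p.y + u_time) * 0.5)                         \n"
        "            + sin((p.x + p.y + u_time) * 0.5);                  \n"
        "    float c = 0.5 + 0.5 * sin(v * 3.14159);                     \n"
        "    gl_FragColor = vec4(c, c * 0.5, 1.0 - c, 1.0);              \n"
        "}                                                               \n";

The app side only has to compile this once, draw a full-screen quad, and update u_time each frame; the per-pixel math that currently runs in your nested x/y loops then runs in parallel on the GPU.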

iOS OpenGL ES 2.0 VBO vertex count limit: Once exceeded, CPU bound

I am testing the rendering of extremely large 3d meshes, and I am currently testing on an iPhone 5 (I also have an iPad 3).
I have here two screenshots of Instruments with a profiling run. The first one is rendering a 1.3M vertex mesh, and the second is rendering a 2.1M vertex mesh.
The blue histogram bar at the top shows CPU load, and it can be seen that for the first mesh it hovers at around ~10%, so the GPU is doing most of the heavy lifting. The mesh is very detailed, and my point-light-with-specular shader makes it look quite impressive if I say so myself, as it is able to render consistently above 20 frames per second. Oh, and 4x MSAA is enabled as well!
However, once I step up to a 2 million+ vertex mesh, everything goes to crap as we see here a massive CPU bound situation, and all instruments report 1 frame per second performance.
So it's pretty clear that somewhere between these two assets (and I will admit that they are both tremendously large meshes to be loading into a single VBO), some limit is being surpassed by the 2-million-vertex (462K-triangle) mesh, whether it is the vertex buffer size or the index buffer size that is over the limit.
So, the question is, what is this limit, and how can I query it? It would really be very preferable if I can have some reasonable assurance that my app will function well without exhaustively testing every device.
I also see an alternative approach to this problem, which is to stick to a known good VBO size limit (I have read about 4MB being a good limit), and basically just have the CPU work a little bit harder if the mesh being rendered is monstrous. With a 100MB VBO, having it in 4MB chunks (segmenting the mesh into 25 draw calls) does not really sound that bad.
But, I'm still curious. How can I check the max size, in order to work around the CPU fallback? Could I be running into an out of memory condition, and Apple is simply applying a CPU based workaround (oh LORD have mercy, 2 million vertices in immediate mode...)?
In pure OpenGL, there are two implementation-defined attributes: GL_MAX_ELEMENTS_VERTICES and GL_MAX_ELEMENTS_INDICES. When they are exceeded, performance can drop off a cliff in some implementations.
I spent a while looking through the OpenGL ES specification for the equivalent and could not find it. Chances are it's buried in one of the OES or vendor-specific extensions to OpenGL ES. Nevertheless, there is a very real hardware limit to the number of elements and vertices you can draw per call. Past a certain number of indices, you can also exceed the capacity of the post-T&L cache. 2 million is a lot for a single draw call; in the absence of a way to query the OpenGL ES implementation for this information, I'd try successively lower powers of two until you dial it back to the sweet spot.
65,536 used to be a sweet spot on DX9 hardware; that was the limit for 16-bit indices and was always guaranteed to be below the maximum hardware vertex count. Chances are it'll work for OpenGL ES-class hardware too... A sketch of drawing the mesh in such chunks is below.
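A minimal sketch of the chunking approach, assuming the mesh has already been split so that each chunk references at most 65,536 vertices and can use 16-bit indices; the struct layout, interleaved stride, and single position attribute are illustrative assumptions.

    #import <stddef.h>
    #import <OpenGLES/ES2/gl.h>

    // One drawable chunk of a large mesh: its own VBO/IBO pair, kept small
    // enough that 16-bit (GLushort) indices suffice.
    typedef struct {
        GLuint vertexBuffer;   // GL_ARRAY_BUFFER with interleaved vertex data
        GLuint indexBuffer;    // GL_ELEMENT_ARRAY_BUFFER with GLushort indices
        GLsizei indexCount;    // number of indices in this chunk
    } MeshChunk;

    // Issue one glDrawElements call per chunk instead of one giant call.
    // positionAttrib comes from glGetAttribLocation on the bound program.
    static void DrawChunkedMesh(const MeshChunk *chunks, size_t chunkCount, GLuint positionAttrib)
    {
        for (size_t i = 0; i < chunkCount; i++) {
            glBindBuffer(GL_ARRAY_BUFFER, chunks[i].vertexBuffer);
            glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, chunks[i].indexBuffer);

            // Position only, for brevity; a real mesh would also set up normals
            // etc., and the 6-float stride here is just an example layout.
            glEnableVertexAttribArray(positionAttrib);
            glVertexAttribPointer(positionAttrib, 3, GL_FLOAT, GL_FALSE,
                                  6 * sizeof(GLfloat), (const GLvoid *)0);

            glDrawElements(GL_TRIANGLES, chunks[i].indexCount, GL_UNSIGNED_SHORT, 0);
        }
    }
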

glDrawElements with GL_LINES forces gleRunVertexSubmitARM? (or: why drawing wireframes is slow on iOS?)

While doing some tests for a small project for iPhone/iPad that I'm working on, I observed that there is a big CPU performance penalty when drawing wireframes using glDrawElements with GL_LINES.
This is the scenario:
a model with 640 vertices (4 floats for position, 3 floats for normals, no alignment problems… everything on 4-byte boundaries)
3840 indices (unsigned short)
both vertices and indices use VBOs (no VAO)
the above model drawn with glDrawElements with GL_TRIANGLES works fine
Then:
the same model with 640 vertices
2560 indices
VBOs and no VAO
drawn with glDrawElements with GL_LINES triggers continuous calls to gleRunVertexSubmitARM, and CPU usage skyrockets...
In both cases the models look as expected and no glErrors around...
It seems that the issue is device dependent. I experience it on an iPhone 3GS and iPhone 4, NOT on an iPad 2 nor the simulators.
On an iPad 2 frame-time CPU = 1ms and no calls to gleRunVertexSubmitARM, on an iPhone 4 frame-time CPU = 12ms and continuous calls to gleRunVertexSubmitARM.
Can anyone explain this behaviour or point out what mistakes I might be making?
Any insight is highly appreciated.
Thanks in advance,
Francesco
Not an easy answer to a not-so-easy question, I would say.
Anyway, the reason why two devices of the same "family" behave in different ways can depend on many factors.
First of all, they mount different GPUs (I am sure you already know this, so sorry for stating the obvious), which brings the following differences:
the iPhone 4 and iPhone 3GS mount the same GPU, the PowerVR SGX535
the iPad 2 uses the PowerVR SGX543MP2
The latter is an evolution of the former, with much higher throughput and a newer architecture.
This alone does not explain everything, though. The reason you notice many more calls to gleRunVertexSubmitARM could be explained by the OpenGL driver implementation PowerVR provides for its GPUs; most probably the SGX535 driver performs the operations you require through a hook on that function.
Last but not least, performance-wise, drawing with GL_LINES is most of the time very inefficient, for several reasons:
it does not perform any hidden-geometry removal
it does not perform any face culling
from what I have read (and from my own experience of 2-3 years ago), using glLineWidth (GL_LINE_WIDTH) or GL_LINE_SMOOTH causes the driver to fall back to a "software" render path without HW acceleration; this depends on the GPU and its OpenGL driver implementation
when a filled polygon is rendered, the driver can optimize the operations with a hierarchical depth buffer; with GL_LINES it cannot (again, this depends a lot on the driver, but it is a very common aspect)
some drivers translate your GL_LINES mesh into triangles at render time; this is something I cannot prove, but it was a common topic among game-engine developer friends in the past
I hope this helps you in some way.
Ciao
Maurizio
I am on PC, but I feel this is relevant.
I noticed that the frame rate in wireframe mode, i.e. glPolygonMode(GL_FRONT_AND_BACK, GL_LINE), was 10x slower when GL_LINE_SMOOTH was enabled via glEnable(GL_LINE_SMOOTH).
You can disable it with this command: glDisable(GL_LINE_SMOOTH);

High-performance copying of RGB pixel data to the screen in iOS

Our product contains a kind of software image decoder that essentially produces full-frame pixel data that needs to be rapidly copied to the screen (we're running on iOS).
Currently we're using CGBitmapContextCreate and we access the memory buffer directly, then for each frame we call CGBitmapContextCreateImage and draw that bitmap to the screen. This is WAY too slow for full-screen refreshes on the iPad's Retina display at a decent framerate (but it was okay for non-Retina devices).
We've tried all kinds of OpenGL ES-based approaches, including the use of glTexImage2D and glTexSubImage2D (essentially rendering to a texture), but CPU usage is still high and we can't get more than ~30 FPS for full-screen refreshes on the iPad 3. The problem is that at 30 FPS, CPU usage is nearly at 100% just for copying the pixels to the screen, which means we don't have much headroom for our own rendering on the CPU.
We are open to using OpenGL or any iOS API that would give us maximum performance. The pixel data is formatted as 32-bit-per-pixel RGBA, but we have some flexibility there...
Any suggestions?
So, the bad news is that you have run into a really hard problem. I have been doing quite a lot of research in this specific area, and currently the only way you can actually blit a framebuffer that is the size of the full screen at 2x is to use the h.264 decoder. There are quite a few nice tricks that can be done with OpenGL once you have image data already decoded into actual memory (take a look at GPUImage).
But the big problem is not how to move the pixels from live memory onto the screen. The real issue is how to move the pixels from the encoded form on disk into live memory. One can use file-mapped memory to hold the pixels on disk, but the IO subsystem is not fast enough to swap in enough pages to make it possible to stream 2x full-screen-size images from mapped memory. This used to work great with 1x full-screen sizes, but the 2x screens are actually 4x the amount of memory and the hardware just cannot keep up. You could also try to store frames on disk in a more compressed format, like PNG, but then decoding the compressed format changes the problem from IO bound to CPU bound and you are still stuck.
Please have a look at my blog post opengl_write_texture_cache for the full source code and timing results I found with that approach. If you have a very specific format that you can limit the input image data to (like an 8-bit table), then you could use the GPU to blit 8-bit data as 32BPP pixels via a shader, as shown in the example xcode project opengl_color_cycle.
But my advice would be to look at how you could make use of the h.264 decoder, since it is actually able to decode that much data in hardware, and no other approaches are likely to give you the kind of results you are looking for.
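For the texture-cache route referenced above, a minimal sketch of pushing a BGRA frame to a GL texture through CVOpenGLESTextureCache (rather than glTexSubImage2D) might look like the following. The frame dimensions are placeholders, error checking is omitted, and a current EAGLContext is assumed; this is not the exact code from the linked post.

    #import <CoreVideo/CoreVideo.h>
    #import <OpenGLES/EAGL.h>
    #import <OpenGLES/ES2/gl.h>
    #import <OpenGLES/ES2/glext.h>   // for GL_BGRA (BGRA8888 extension)

    // Sketch: map a CVPixelBuffer into a GL texture without an extra memcpy.
    EAGLContext *eaglContext = [EAGLContext currentContext]; // assumes a live GL context
    size_t width = 2048, height = 1536;                      // placeholder frame size

    CVOpenGLESTextureCacheRef textureCache = NULL;
    CVOpenGLESTextureCacheCreate(kCFAllocatorDefault, NULL, eaglContext, NULL, &textureCache);

    NSDictionary *attrs = @{ (id)kCVPixelBufferIOSurfacePropertiesKey : @{} };
    CVPixelBufferRef pixelBuffer = NULL;
    CVPixelBufferCreate(kCFAllocatorDefault, width, height, kCVPixelFormatType_32BGRA,
                        (__bridge CFDictionaryRef)attrs, &pixelBuffer);

    // The decoder writes its frame directly into the pixel buffer's memory...
    CVPixelBufferLockBaseAddress(pixelBuffer, 0);
    // decode into CVPixelBufferGetBaseAddress(pixelBuffer) here
    CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);

    // ...and the cache hands back a texture backed by that same memory.
    CVOpenGLESTextureRef texture = NULL;
    CVOpenGLESTextureCacheCreateTextureFromImage(kCFAllocatorDefault, textureCache,
        pixelBuffer, NULL, GL_TEXTURE_2D, GL_RGBA, (GLsizei)width, (GLsizei)height,
        GL_BGRA, GL_UNSIGNED_BYTE, 0, &texture);

    glBindTexture(CVOpenGLESTextureGetTarget(texture), CVOpenGLESTextureGetName(texture));
    // Draw the full-screen quad with this texture bound, then release the
    // per-frame objects and periodically call CVOpenGLESTextureCacheFlush.
    CFRelease(texture);

This avoids the CPU-side copy that glTexImage2D/glTexSubImage2D performs, which is where most of the 100% CPU in the question is going, but it does nothing about getting the frames decoded into memory in the first place, which is the harder half of the problem described above.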
After several years, and several different situations where I ran into this need, I've decided to implement a basic "pixel viewer" view for iOS. It supports highly optimized display of a pixel buffer in a wide variety of formats, including 32-bpp RGBA, 24-bpp RGB, and several YpCbCr formats.
It also supports all of the UIViewContentMode* for smart scaling, scale to fit/fill, etc.
The code is highly optimized (using OpenGL) and achieves excellent performance even on older iOS devices such as the iPhone 5 or the original iPad Air. On those devices it achieves 60 FPS for all pixel formats except the 24-bpp formats, where it achieves around 30-50 FPS (I usually benchmark by showing a pixel buffer at the device's native resolution, so obviously an iPad has to push far more pixels than an iPhone 5).
Please check out EEPixelViewer.
CoreVideo is most likely the framework you should be looking at. With the OpenGL and CoreGraphics approaches, you're being hit hard by the cost of moving bitmap data from main memory onto GPU memory. This cost exists on desktops as well, but is especially painful on iPhones.
In this case, OpenGL won't net you much of a speed boost over CoreGraphics because the bottleneck is the texture data copy. OpenGL will get you a more efficient rendering pipeline, but the damage will have already been done by the texture copy.
So CoreVideo is the way to go. As I understand the framework, it exists to solve the very problem you're encountering.
A pbuffer or FBO can then be used as a texture map for further rendering by OpenGL ES. This is called render-to-texture (RTT), and it's much quicker; search for pbuffer or FBO in the EGL documentation. A minimal FBO setup is sketched below.
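For reference, a minimal sketch of creating a texture-backed FBO (render-to-texture) in OpenGL ES 2.0; the dimensions are placeholders and error handling is omitted.

    #import <OpenGLES/ES2/gl.h>

    // Sketch: render-to-texture setup. Everything drawn while this framebuffer
    // is bound lands in colorTexture, which can then be sampled in a later pass.
    GLuint colorTexture = 0, framebuffer = 0;
    GLsizei width = 1024, height = 1024; // placeholder size

    glGenTextures(1, &colorTexture);
    glBindTexture(GL_TEXTURE_2D, colorTexture);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);

    glGenFramebuffers(1, &framebuffer);
    glBindFramebuffer(GL_FRAMEBUFFER, framebuffer);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, colorTexture, 0);

    if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
        // Handle an incomplete framebuffer here.
    }

    // Render the offscreen pass here, then switch back to the on-screen
    // framebuffer and draw a quad textured with colorTexture.
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
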
