I am receiving YUV 420 CMSampleBuffers of the screen in my System Broadcast Extension, however when I attempt to access the underlying bytes, I get inconsistent results: artefacts that are a mixture of (it seems) past and future frames. I am accessing the bytes in order to rotate portrait frames a quarter turn to landscape, but the problem reduces to not being able to correctly copy the texture.
The pattern of artefacts can change quite a lot. They can be all over the place and seem to have a fundamental "brush shape" that is a square tile, sometimes small, sometimes large, which seems to depend on the failing workaround at hand. They can occur in both the luminance and chroma channels, which results in interesting effects. The "grain" of the artefacts sometimes appears to be horizontal, which I guess is vertical in the original frame.
I do have two functioning workarounds:
rotate the buffers using Metal
rotate the buffers using CoreImage (even a "software" CIContext works)
The reason that I can't yet ship these workarounds is that System Broadcast Extensions have a very low memory limit of 50MB and memory usage can spike with these two solutions, and there seem to be interactions with other parts of the system (e.g. the AVAssetWriter or the daemon that dumps frames into my address space). I'm still working to understand memory usage here.
The artefacts seem like a synchronisation problem. However I have a feeling that this is not so much a new frame being written into the buffer that I'm looking at, but rather some sort of stale cache. CPU or GPU? Do GPUs have caches? The tiled nature of the artefacts reminds me of iOS GPUs, but take that with a grain of salt (not a hardware person).
This brings me around to the question title. If this is a caching problem, and Metal / CoreImage has a consistent view of the pixels, maybe I can get Metal to flush the data I want for me, because a BGRA screen capture being converted to a YUV IOSurface has Metal shader written all over it.
So I took the incoming CMSampleBuffer's CVPixelBuffer's IOSurface and created an MTLTexture from it (with all sorts of cacheModes and storageModes, haven't tried hazardTrackingModes yet) and then copied the bytes out with MTLTexture.getBytes(bytesPerRow:from:mipmapLevel:).
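Roughly, that attempt looks like the following (a minimal sketch assuming a biplanar buffer and copying only the luma plane; the helper name is made up):

```swift
import CoreVideo
import IOSurface
import Metal

// Sketch: wrap plane 0 (luma) of the incoming buffer's IOSurface in an MTLTexture
// and copy the bytes out. The storage mode shown is one of several combinations tried.
func copyLumaViaMetal(from pixelBuffer: CVPixelBuffer, device: MTLDevice) -> [UInt8]? {
    guard let surface = CVPixelBufferGetIOSurface(pixelBuffer)?.takeUnretainedValue() else {
        return nil
    }
    let width  = IOSurfaceGetWidthOfPlane(surface, 0)
    let height = IOSurfaceGetHeightOfPlane(surface, 0)

    let desc = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .r8Unorm,
                                                        width: width,
                                                        height: height,
                                                        mipmapped: false)
    desc.usage = .shaderRead
    desc.storageMode = .shared

    guard let texture = device.makeTexture(descriptor: desc, iosurface: surface, plane: 0) else {
        return nil
    }

    var bytes = [UInt8](repeating: 0, count: width * height)
    bytes.withUnsafeMutableBytes { dst in
        texture.getBytes(dst.baseAddress!,
                         bytesPerRow: width,
                         from: MTLRegionMake2D(0, 0, width, height),
                         mipmapLevel: 0)
    }
    return bytes
}
```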
Yet the problem persists. I would really like to make the CPU deep copy approach work, for memory reasons.
To head off some questions:
it's not a bytes-per-row issue, that would slant the images
in the cpu case I do lock the CVPixelBuffer's base address
I even lock the underlying IOSurface (see the sketch after this list)
I have tried discarding IOSurfaces whose lock seed changes under lock
I do discard frames when necessary
I have tried putting random memory fences and mutexes all over the place (not a hardware person)
I have not disassembled CoreImage yet
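For what it's worth, the locked CPU deep copy currently looks roughly like this (a sketch under the same assumptions as above: a biplanar buffer with the luma plane copied; the helper name is made up):

```swift
import Foundation
import CoreVideo
import IOSurface

// Sketch of the locked CPU deep copy of the luma plane (illustrative helper).
func deepCopyLumaPlane(_ pixelBuffer: CVPixelBuffer) -> Data? {
    guard let surface = CVPixelBufferGetIOSurface(pixelBuffer)?.takeUnretainedValue() else { return nil }

    CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
    var seed: UInt32 = 0
    IOSurfaceLock(surface, .readOnly, &seed)
    defer {
        IOSurfaceUnlock(surface, .readOnly, &seed)
        CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly)
    }

    guard let base = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0) else { return nil }
    let bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0)
    let width  = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0)
    let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0)

    var copy = Data(count: width * height)
    copy.withUnsafeMutableBytes { dst in
        // Copy row by row so the stride padding never ends up in the output.
        for row in 0..<height {
            memcpy(dst.baseAddress! + row * width, base + row * bytesPerRow, width)
        }
    }
    return copy
}
```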
This question is the continuation of one I posted on the Apple Developer Forums.
Related
I've written an app with an object detection model and I process images when an object is detected. The problem I'm running into is that an object can be detected with 99% confidence while the frame I'm processing is very blurry.
I've considered analyzing the frame and attempting to detect blurriness or detecting device movement and not analyzing frames when the device is moving a lot.
Do you have any other suggestions to only process un-blurry photos or solutions other than the ones I've proposed? Thanks
You might have issues detecting "movement" when, for instance, driving in a car. In that case looking at something inside your car is not considered movement while looking at something outside is (unless it's very far away). There can be many other cases like this.
I would start by checking whether the camera is in focus. That is not the same as checking whether a frame is blurry, but it might be very close.
The other option I can think of is simply checking 2 or more sequential frames and seeing if they are roughly the same. To do something like that it is best to define a grid, for instance 16x16, on which you evaluate similar values. You would need to mipmap your photos, which done manually means resizing them by half until you get to a 16x16 image (e.g. 2000x1500 would become roughly 1024x1024 -> 512x512 -> 256x256 ...). Then grab those 16x16 pixels and store them. Once you have enough frames (at least 2) you can start comparing these values. The GPU is perfect for resizing, but those 16x16 values are probably best evaluated on the CPU. What you need to do is basically find the average pixel difference between 2 sequential 16x16 buffers, then use that to decide whether detection should be enabled.
This procedure may still not be perfect but it should be relatively feasible from a performance perspective. There may be some shortcuts, as some tools may already do the resizing so that you don't need to "halve" the images manually. From a theoretical perspective you are creating sectors and computing their average color. If all the sectors have almost the same color across 2 or more frames, there is a high chance the camera did not move much in that time and the image should not be blurry from movement. Still, if the camera is not in focus you can have multiple sequential frames that are exactly the same but in fact all blurry. The same happens if you only detect phone movement.
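As a rough illustration, the final comparison step could look something like this (a sketch assuming the frames are already reduced to 16x16 grayscale grids; the function name and threshold are arbitrary):

```swift
// Compare two 16x16 downsampled frames by average per-cell difference.
// A small average difference suggests the camera was steady between the frames.
func framesAreSimilar(_ a: [UInt8], _ b: [UInt8], threshold: Double = 4.0) -> Bool {
    precondition(a.count == 256 && b.count == 256)      // 16 * 16 cells each
    var totalDiff = 0
    for i in 0..<256 {
        totalDiff += abs(Int(a[i]) - Int(b[i]))          // per-cell absolute difference
    }
    let averageDiff = Double(totalDiff) / 256.0
    return averageDiff < threshold                       // below threshold: enable detection
}
```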
I'm currently trying to reduce the memory size of my textures. I already use TexturePacker, as well as .pvr.ccz files with either RGB565 or RGB5551. This, however, often leads to a huge, unacceptable reduction in texture quality.
Specifically, I have a spritesheet for the main character. In size it's roughly 4k*2.5k pixels. This is not really negotiable as we have lots of different animations and we need the character at a size acceptable for retina displays on iPads. So reducing the size of the character sprite would again result in huge reductions in quality when we use him in the scene.
So of course I'm trying to use 16-bit textures as often as possible. Using the above-mentioned spritesheet as a 16-bit texture takes about 17 MB of memory. This is already a lot. As it's a spritesheet for a character, the texture needs transparency and therefore I need to use RGB5551 as the colour depth. With only 1 bit for the alpha channel, the character just looks plain ugly. In fact, everything that needs alpha looks rather ugly with only 1 bit for the alpha channel.
However, if I used RGBA8888 instead, the spritesheet would use double the memory, around 34 MB. Now imagine several characters in a scene and you'll end up with 100 MB of memory for characters alone. Add general overhead, sound, background, foreground, objects and UI to it and you'll end up with far too much memory. In fact, 100 MB is "far too much memory" as far as I'm concerned.
I feel like I'm overlooking something in the whole process, like something obvious I didn't do. RGBA4444 is no solution either; it really looks unacceptably bad.
In short: how do I get acceptable texture quality, including an alpha channel, for less than 100 MB of memory? "Not at all"? Because that's kind of where it leads as far as I can see.
Split your main texture into 'per character / per animation / per resolution' files. Use .pvr.ccz because they load faster (much faster; I've measured 8x faster on some devices). If you are using TexturePacker, you should be able to eliminate most if not all artefacts from the 'pvr' conversion.
When running your scenes, preload only the 'next' posture/stance/combat animation that you know you will need. Experiment with asynchronous loading with a completion block to signal when the texture is available for use. Dump your unused textures as fast as you can. This will tend to keep the memory requirement flattish, at a much lower level than if you load all animations at once.
Finally, do you really need 15 frames for all these animations? I get away with as few as 5 frames for some of them (idle, asleep, others too). TexturePacker takes care of animations that are symmetrical around a certain frame: just point frames midPoint +1 ... midPoint + N back at midPoint -N ... midPoint -1.
In my application I need to play video in an unusual way: something like an interactive player for special purposes.
Main issues here:
video resolution can be anything from 200*200 px up to 1024*1024 px
I need the ability to change the speed from -60 FPS to 60 FPS (the video should play slower or faster depending on the selected speed; negative means the video plays backwards)
I need to draw lines and objects over the video and scale them with the image
I need the ability to zoom the image and pan it if its content is larger than the screen
I need the ability to change brightness and contrast and to invert the colors of the video
Right now I'm doing the following:
I split my video into JPG frames
I created a timer that fires N times per second (play speed control)
on each timer tick I draw a new texture (the next JPG frame) with OpenGL
for zoom and pan I'm using OpenGL ES transformations (translate, scale)
Everything looks fine while I use 320*240 px frames, but if I use 512*512 px the play rate drops. Maybe it's a timer behaviour problem, maybe OpenGL. Sometimes, if I try to open big textures at a high play rate (more than 10-15 FPS), the application just crashes with memory warnings.
What is the best practice to solve this issue? In which direction should I dig? Would cocos2d or another game engine help me? Maybe JPG is not the best format for textures and I should use PNG or PVR or something else?
Keep the video data as a video and use AVAssetReader to get the raw frames. Use kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange as the colorspace, and do YUV->RGB colorspace conversion in GLES. It will mean keeping less data in memory, and make much of your image processing somewhat simpler (since you'll be working with luma and chroma data rather than RGB values).
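A rough sketch of that reader setup might look like this (assuming a local file URL and a single video track; error handling is trimmed and the names are illustrative):

```swift
import AVFoundation

// Pull raw biplanar YUV frames from a video file with AVAssetReader.
func readFrames(from url: URL) throws {
    let asset = AVAsset(url: url)
    guard let track = asset.tracks(withMediaType: .video).first else { return }

    let reader = try AVAssetReader(asset: asset)
    let settings: [String: Any] = [
        kCVPixelBufferPixelFormatTypeKey as String:
            kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange
    ]
    let output = AVAssetReaderTrackOutput(track: track, outputSettings: settings)
    reader.add(output)
    reader.startReading()

    while let sampleBuffer = output.copyNextSampleBuffer(),
          let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) {
        // Plane 0 is luma (Y), plane 1 is interleaved chroma (CbCr);
        // upload each plane to its own GL texture and convert YUV->RGB in a shader.
        _ = pixelBuffer
    }
}
```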
You don't need to bother with Cocos 2d or any game engine for this. I strongly recommend doing a little bit of experimenting with OpenGL ES 2.0 and shaders. Using OpenGL for video is very simple and straightforward, adding a game engine to the mix is unnecessary overhead and abstraction.
When you upload image data to the textures, do not create a new texture every frame. Instead, create two textures: one for luma, and one for chroma data, and simply reuse those textures every frame. I suspect your memory issues are arising from using many images and new textures every frame and probably not deleting old textures.
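For illustration, the texture reuse could be structured roughly like this in OpenGL ES 2.0 (a sketch with illustrative names; the chroma plane would get a second texture handled the same way):

```swift
import OpenGLES

// Allocate the luma texture once, then update its contents every frame.
var lumaTexture: GLuint = 0

func setupLumaTexture(width: GLsizei, height: GLsizei) {
    glGenTextures(1, &lumaTexture)
    glBindTexture(GLenum(GL_TEXTURE_2D), lumaTexture)
    glTexParameteri(GLenum(GL_TEXTURE_2D), GLenum(GL_TEXTURE_MIN_FILTER), GL_LINEAR)
    glTexParameteri(GLenum(GL_TEXTURE_2D), GLenum(GL_TEXTURE_MAG_FILTER), GL_LINEAR)
    // Allocate storage once (no pixel data yet).
    glTexImage2D(GLenum(GL_TEXTURE_2D), 0, GL_LUMINANCE, width, height, 0,
                 GLenum(GL_LUMINANCE), GLenum(GL_UNSIGNED_BYTE), nil)
}

func uploadLumaPlane(_ bytes: UnsafeRawPointer, width: GLsizei, height: GLsizei) {
    glBindTexture(GLenum(GL_TEXTURE_2D), lumaTexture)
    // Update the existing storage instead of creating a new texture each frame.
    glTexSubImage2D(GLenum(GL_TEXTURE_2D), 0, 0, 0, width, height,
                    GLenum(GL_LUMINANCE), GLenum(GL_UNSIGNED_BYTE), bytes)
}
```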
JPEG frames will be incredibly expensive to uncompress. First step: use PNG.
But wait! There's more.
Cocos2D could help you mostly through its great support for sprite sheets.
The biggest help, however, may come from packed textures a la TexturePacker. Using PVR.CCZ compression can speed things up by insane amounts, enough for you to get better frame rates at bigger video sizes.
Vlad, the short answer is that you will likely never be able to get all of the features you have listed working at the same time. Playing 1024 x 1024 video at 60 FPS is really going to be a stretch; I highly doubt that iOS hardware can keep up with that kind of data transfer rate at 60 FPS. Even the h.264 hardware on the device can only do 30 FPS at 1080p. It might be possible, but then layering graphics rendering over the video and also expecting to edit the brightness/contrast at the same time is just too many things at once.
You should focus on what is actually possible instead of attempting to do every feature. If you want to see an example Xcode app that pushes iPad hardware right to the limits, have a look at my Fireworks example project. This code displays multiple already-decoded h.264 videos on screen at the same time. The implementation is built around CoreGraphics APIs, but the key thing is that Apple's implementation of texture uploading to OpenGL is very fast because of a zero-copy optimization. With this approach, a lot of video can be streamed to the device.
Our product contains a kind of software image decoder that essentially produces full-frame pixel data that needs to be rapidly copied to the screen (we're running on iOS).
Currently we're using CGBitmapContextCreate and we access the memory buffer directly, then for each frame we call CGBitmapContextCreateImage and draw that bitmap to the screen. This is WAY too slow for full-screen refreshes on the iPad's retina display at a decent framerate (but it was okay for non-Retina devices).
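For context, the current path is roughly this (a sketch; the buffer size and class name are illustrative, and the decoder is assumed to write into the context's backing memory directly):

```swift
import CoreGraphics
import UIKit

// The slow path described above: a CGBitmapContext whose buffer the decoder fills,
// snapshotted to a CGImage every frame and drawn to the screen.
final class FrameRenderer {
    private let width = 2048, height = 1536        // example full-screen retina size
    private var context: CGContext?

    init() {
        context = CGContext(data: nil,
                            width: width, height: height,
                            bitsPerComponent: 8,
                            bytesPerRow: width * 4,
                            space: CGColorSpaceCreateDeviceRGB(),
                            bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue)
    }

    // The decoder writes RGBA pixels into context?.data each frame, then:
    func snapshot() -> UIImage? {
        guard let cgImage = context?.makeImage() else { return nil }   // CGBitmapContextCreateImage
        return UIImage(cgImage: cgImage)
    }
}
```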
We've tried all kinds of OpenGL ES-based approaches, including the use of glTexImage2D and glTexSubImage2D (essentially rendering to a texture), but CPU usage is still high and we can't get more than ~30 FPS for full-screen refreshes on the iPad 3. The problem is that with 30 FPS, CPU usage is nearly at 100% just for copying the pixels to the screen, which means we don't have much to work with for our own rendering on the CPU.
We are open to using OpenGL or any iOS API that would give us maximum performance. The pixel data is formatted as a 32-bit-per-pixel RGBA data but we have some flexibility there...
Any suggestions?
So, the bad news is that you have run into a really hard problem. I have been doing quite a lot of research in this specific area and currently the only way that you can actually blit a framebuffer that is the size of the full screen at 2x is to use the h.264 decoder. There are quite a few nice tricks that can be done with OpenGL once you have image data already decoded into actual memory (take a look at GPUImage).
But the big problem is not how to move the pixels from live memory onto the screen. The real issue is how to move the pixels from the encoded form on disk into live memory. One can use file mapped memory to hold the pixels on disk, but the IO subsystem is not fast enough to be able to swap out enough pages to make it possible to stream 2x full screen size images from mapped memory. This used to work great with 1x full screen sizes, but now the 2x size screens are actually 4x the amount of memory and the hardware just cannot keep up.
You could also try to store frames on disk in a more compressed format, like PNG. But then decoding the compressed format changes the problem from IO bound to CPU bound and you are still stuck. Please have a look at my blog post opengl_write_texture_cache for the full source code and timing results I found with that approach.
If you have a very specific format that you can limit the input image data to (like an 8 bit table), then you could use the GPU to blit 8 bit data as 32BPP pixels via a shader, as shown in this example xcode project opengl_color_cycle. But my advice would be to look at how you could make use of the h.264 decoder, since it is actually able to decode that much data in hardware and no other approaches are likely to give you the kind of results you are looking for.
After several years, and several different situations where I ran into this need, I've decided to implement a basic "pixel viewer" view for iOS. It supports highly optimized display of a pixel buffer in a wide variety of formats, including 32-bpp RGBA, 24-bpp RGB, and several YpCbCr formats.
It also supports all of the UIViewContentMode* for smart scaling, scale to fit/fill, etc.
The code is highly optimized (using OpenGL), and achieves excellent performance on even older iOS devices such as iPhone 5 or the original iPad Air. On those devices it achieves 60FPS on all pixel formats except for 24bpp formats, where it achieves around 30-50fps (I usually benchmark by showing a pixel buffer at the device's native resolution, so obviously an iPad has to push far more pixels than the iPhone 5).
Please check out EEPixelViewer.
CoreVideo is most likely the framework you should be looking at. With the OpenGL and CoreGraphics approaches, you're being hit hard by the cost of moving bitmap data from main memory onto GPU memory. This cost exists on desktops as well, but is especially painful on iPhones.
In this case, OpenGL won't net you much of a speed boost over CoreGraphics because the bottleneck is the texture data copy. OpenGL will get you a more efficient rendering pipeline, but the damage will have already been done by the texture copy.
So CoreVideo is the way to go. As I understand the framework, it exists to solve the very problem you're encountering.
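For illustration, the CoreVideo texture-cache path looks roughly like this (a sketch assuming an existing OpenGL ES 2.0 EAGLContext and a BGRA CVPixelBuffer; names are illustrative):

```swift
import CoreVideo
import OpenGLES

// A CVOpenGLESTextureCache lets the GPU sample a CVPixelBuffer's backing store
// directly, avoiding the main-memory-to-GPU bitmap copy described above.
var textureCache: CVOpenGLESTextureCache?

func makeCache(context: EAGLContext) {
    CVOpenGLESTextureCacheCreate(kCFAllocatorDefault, nil, context, nil, &textureCache)
}

func bindTexture(for pixelBuffer: CVPixelBuffer) {
    guard let cache = textureCache else { return }
    var cvTexture: CVOpenGLESTexture?
    CVOpenGLESTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault, cache, pixelBuffer, nil,
        GLenum(GL_TEXTURE_2D), GL_RGBA,
        GLsizei(CVPixelBufferGetWidth(pixelBuffer)),
        GLsizei(CVPixelBufferGetHeight(pixelBuffer)),
        GLenum(GL_BGRA_EXT), GLenum(GL_UNSIGNED_BYTE),
        0, &cvTexture)
    if let tex = cvTexture {
        // No explicit CPU copy step: bind the cached texture and render with it.
        glBindTexture(CVOpenGLESTextureGetTarget(tex), CVOpenGLESTextureGetName(tex))
    }
}
```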
The pbuffer or FBO can then be used as a texture map for further rendering by OpenGL ES. This is called render-to-texture, or RTT. It's much quicker; search for pbuffer or FBO in the EGL documentation.
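A minimal render-to-texture setup with an FBO in OpenGL ES 2.0 could look like this (an illustrative sketch; names and sizes are made up):

```swift
import OpenGLES

var fbo: GLuint = 0
var colorTexture: GLuint = 0

func setupRenderToTexture(width: GLsizei, height: GLsizei) {
    // Texture that will receive the rendered output.
    glGenTextures(1, &colorTexture)
    glBindTexture(GLenum(GL_TEXTURE_2D), colorTexture)
    glTexImage2D(GLenum(GL_TEXTURE_2D), 0, GL_RGBA, width, height, 0,
                 GLenum(GL_RGBA), GLenum(GL_UNSIGNED_BYTE), nil)
    glTexParameteri(GLenum(GL_TEXTURE_2D), GLenum(GL_TEXTURE_MIN_FILTER), GL_LINEAR)

    // Framebuffer that renders into that texture.
    glGenFramebuffers(1, &fbo)
    glBindFramebuffer(GLenum(GL_FRAMEBUFFER), fbo)
    glFramebufferTexture2D(GLenum(GL_FRAMEBUFFER), GLenum(GL_COLOR_ATTACHMENT0),
                           GLenum(GL_TEXTURE_2D), colorTexture, 0)
    // After drawing into fbo, rebind the default framebuffer and sample
    // colorTexture like any other texture in the next pass.
}
```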
I'm making a Worms-style bitmap destructible terrain game using OpenGL. I'd like to know where the limitations in terms of video memory are for the size of the worlds.
Currently, I use blocks of 512*512 RGBA textures for the terrain.
How much memory, very roughly, can I expect such a 512*512 RGBA texture to take up?
Is there any internal, automatic compression going on?
How much video memory can I expect most user's computers to have free?
How much memory, very roughly, can I expect such a 512*512 RGBA texture to take up?
Not enough information. You should always use sized OpenGL image formats (GL_RGBA8, GL_RGBA16).
GL_RGBA8 takes up 32-bits per pixel, which is 4 bytes. Therefore, 512*512*4 = 1MB.
Is there any internal, automatic compression going on?
No.
How much video memory can I expect most user's computers to have free?
How much are you using currently?
OpenGL will page image data in and out according to the available space. If you run out of GPU memory, OpenGL will happily allocate system memory and upload the images as needed.
But to be honest, your little Worms game isn't going to actually cost anything in terms of memory size. Maybe 64MB when you're done, tops. It's nothing you need to be concerned about.
I would not worry about that very much. Even with an 8192*2048 world (4 screens wide and 2 screens tall, which is very big for a Worms-style game) you would require only 8192*2048*4 bytes = 64 MB; add mipmaps, other textures and the framebuffer and you should still fit within 128 MB. As far as I know even older GPUs have that kind of memory (we're not talking about GeForce4 cards, right?).
Older GPUs may have limitations on how big each texture can be, but since you already split your world into 512x512 chunks it won't be a problem.
If video memory becomes an issue you could allow users to use half-sized textures (i.e. downsample the world to 4096*1024 and 256x256 chunks) and fetch new / discard unused regions on demand.
With 32-bpp (4 bytes) you get 4*512*512 = 1 MB
See this regarding texture compression: http://www.oldunreal.com/editing/s3tc/ARB_texture_compression.pdf
Again, this depends on your engine, but if I were you I would do this:
Since your terrain texture will probably be reusing some mosaic-like textures, and you need to know whether a pixel is present or destroyed, and given you are using mosaic textures no larger than 256x256, you could definitely get away with a two-component 8-bit internal format such as GL_RG8 (16 bits per pixel). Each component would be a texture coordinate that you map from [0, 255] -> [0.0, 1.0], and you would reserve some special value to indicate that the terrain is destroyed. That makes every 512x512 block take up 0.5 MB.
Although it's tempting to add an extra byte to indicate terrain presence, a 3-byte format wouldn't cache too well.
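As a tiny illustration of that per-texel encoding (the type name and the sentinel value are made up):

```swift
// Each terrain texel stores the mosaic texture coordinate it samples from.
// One value pair is reserved as the "destroyed" sentinel.
struct TerrainTexel {
    var u: UInt8   // the shader maps this to [0.0, 1.0] as u / 255.0
    var v: UInt8

    static let destroyed = TerrainTexel(u: 255, v: 255)

    var isDestroyed: Bool { u == 255 && v == 255 }
}
```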