Converting RGBA to ARGB (glReadPixels -> AVAssetWriter) - iOS

I want to record images, rendered with OpenGL, into a movie file with the help of AVAssetWriter. The problem is that the only way to access the pixels of an OpenGL framebuffer is glReadPixels, which only supports the RGBA pixel format on iOS. But AVAssetWriter doesn't support this format; it accepts either ARGB or BGRA. Since the alpha values can be ignored, I concluded that the fastest way to convert RGBA to ARGB would be to hand glReadPixels the buffer shifted by one byte:
UInt8 *buffer = malloc(width*height*4+1);
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, buffer+1);
The problem is that the glReadPixels call crashes with EXC_BAD_ACCESS. If I don't shift the buffer by one byte, it works perfectly (but obviously with the wrong colors in the video file). What's the problem here?

I came to the conclusion, that the fastest way to convert RGBA to ARGB would be to give glReadPixels the buffer shifted by one byte
This will however shift your alpha values by 1 pixel as well. Here's another suggestion:
Render the picture to a texture (using an FBO with that texture as its color attachment). Then render that texture into another framebuffer with a swizzling fragment shader:
#version ...
precision mediump float;

uniform sampler2D image;
uniform vec2 image_dim;

void main()
{
    // We want to address texel centers by absolute fragment coordinates; this
    // requires a bit of work (OpenGL ES SL doesn't provide a texelFetch function).
    gl_FragColor.rgba =
        texture2D(image, vec2( (2.0*gl_FragCoord.x + 1.0) / (2.0*image_dim.x),
                               (2.0*gl_FragCoord.y + 1.0) / (2.0*image_dim.y) )
        ).argb; // this swizzles RGBA into ARGB order if read into an RGBA buffer
}
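For completeness, the second pass is just a full-screen draw into an RGBA framebuffer followed by a plain glReadPixels. A rough sketch, where swizzleFBO, swizzleProgram, sceneTexture and drawFullScreenQuad() are hypothetical names for your own objects and helper:
glBindFramebuffer(GL_FRAMEBUFFER, swizzleFBO);   // second FBO with an RGBA color attachment
glViewport(0, 0, width, height);
glUseProgram(swizzleProgram);                    // the swizzling shader above
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, sceneTexture);      // color attachment of the first FBO
glUniform1i(glGetUniformLocation(swizzleProgram, "image"), 0);
glUniform2f(glGetUniformLocation(swizzleProgram, "image_dim"), width, height);
drawFullScreenQuad();                            // hypothetical helper: draws a quad covering the viewport
// The readback now delivers bytes already in A,R,G,B order:
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, buffer);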

What happens if you put an extra 128 bytes of slack on the end of your buffer? It might be that OpenGL is trying to fill 4/8/16/etc bytes at a time for performance, and has a bug when the buffer is non-aligned or something. It wouldn't be the first time a performance optimization in OpenGL had issues on an edge case :)

Try calling
glPixelStorei(GL_PACK_ALIGNMENT,1)
before glReadPixels.
From the docs:
GL_PACK_ALIGNMENT
Specifies the alignment requirements for the start of each pixel row in memory.
The allowable values are
1 (byte-alignment),
2 (rows aligned to even-numbered bytes),
4 (word-alignment), and
8 (rows start on double-word boundaries).
The default value is 4 (see glGet). This often gets mentioned as a troublemaker in various "OpenGL pitfalls" type lists, although this is generally more to do with its row padding effects than buffer alignment.
As an alternative approach, what happens if you malloc 4 extra bytes, do the glReadPixels 4-byte aligned starting at buffer+4, and then pass AVAssetWriter buffer+3 (although I've no idea whether AVAssetWriter is more tolerant of alignment issues)?
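A rough sketch of that alternative, assuming the crash really is alignment-related (width and height as in your question; the leading "alpha" byte seen by the writer is garbage, which you said you ignore anyway):
glPixelStorei(GL_PACK_ALIGNMENT, 1);               // no row padding on readback

uint8_t *buffer = (uint8_t *)malloc(width * height * 4 + 4);
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, buffer + 4);  // aligned destination
uint8_t *argb = buffer + 3;                        // same bytes, re-read one byte earlier as A,R,G,B
// ... hand 'argb' to AVAssetWriter, then free(buffer) when the frame is done ...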

You will need to shift bytes by doing a memcpy or other copy operation. Modifying the pointers will leave them unaligned, which may or may not be within the capabilities of any underlying hardware (DMA bus widths, tile granularity, etc.)

Using buffer+1 will mean the data is not written at the start of your malloc'd memory, but rather one byte in, so it will be writing over the end of your malloc'd memory, causing the crash.
If iOS's glReadPixels will only accept GL_RGBA, then you'll have to go through and rearrange the bytes yourself, I think.
UPDATE: sorry, I missed the +1 in your malloc. StilesCrisis is probably right about the cause of the crash.
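If you do end up rearranging the bytes on the CPU, it's just a per-pixel shuffle into a second buffer; a minimal sketch (on iOS the Accelerate framework's vImagePermuteChannels_ARGB8888 can do the same thing faster):
// Copies RGBA input into a separate ARGB output buffer, so both pointers
// stay properly aligned. Buffer names are illustrative.
void rgba_to_argb(const uint8_t *rgba, uint8_t *argb, size_t pixelCount)
{
    for (size_t i = 0; i < pixelCount; ++i) {
        argb[4*i + 0] = rgba[4*i + 3];   // A
        argb[4*i + 1] = rgba[4*i + 0];   // R
        argb[4*i + 2] = rgba[4*i + 1];   // G
        argb[4*i + 3] = rgba[4*i + 2];   // B
    }
}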

Related

How to handle 3d texture on webgl2

I am trying to work with 3D textures in WebGL2, and I came to know about
gl.texImage3D();
I have experience with 2D textures and found that very convenient, but there is another approach that people are using on the internet:
gl.texStorage3D()
and then,
gl.texSubImage3D() // with all offsets of x, y and z as 0.
I just want to know what the difference between the two approaches is. I know an equivalent of the second option is available for 2D textures as well, but I have never used it to provide data to the target. I know that texSubImage is for supplying a sub-image of a texture, but I don't understand what the difference between the two approaches is.
The short answer is that texStorage2D and texStorage3D allocate all of the texture memory up front, whereas texImage2D and texImage3D allocate one mip level at a time.
texSubImage2D and texSubImage3D do not allocate anything. They just copy data into a texture mip level that was previously allocated with one of the functions above.
As for why you'd use one or the other: texStorage2D and texStorage3D can immediately allocate memory on the GPU. texImage2D and texImage3D cannot, since they don't know the complete texture (all the mips) until you actually try to draw something with the texture. To put it another way, texStorage2D/3D may be more efficient, whereas texImage2D/3D is more flexible.
In order for a texture to actually be renderable, all the mip levels you are going to use need to be the same internal format and the correct sizes.
When you call texStorage2D/3D you tell the size of mip level 0 (the largest level) and how many mip levels in total to allocate. So let's say you tell it an internal format of gl.RGBA8, width and height of 8 and 4 mip levels.
gl.texStorage2D(gl.TEXTURE_2D,
4, // 4 levels
gl.RGBA8, // internal format
8, // width
8); // height
It will allocate all 4 mip levels: 8x8, 4x4, 2x2, and 1x1 (at 4 bytes per pixel). It knows they are all RGBA8. It knows they are all the correct size. Textures allocated with texStorage2D can't be changed in size or internal format. If you try to call texImage2D on a texture created with texStorage2D, you'll get an error.
If you instead use texImage2D, you'd probably first specify mip level 0:
gl.texImage2D(gl.TEXTURE_2D,
0, // mip level
gl.RGBA8, // internal format
8, // width
8, // height
0, // border
gl.RGBA, // data format
gl.UNSIGNED_BYTE, // data type
data);
so now you have just 1 mip level, level #0. Will you add the other 3 mips? Will they be the correct size? Will those other 3 mips have the same internal format? Will you change mip level #0 to something else, a different size, or a different internal format? WebGL has no idea what your next command will be, so it has to wait until you actually try to draw with the texture before it can check. With texStorage you decide the sizes and formats of all the mips up front, so it only has to check one time. With texImage you don't tell it everything up front, so it has to check again at draw time if things change.

How to use shared memory between GPU and CPU on iOS with Metal? (ideally with objective c)

I created a MTLTexture like this:
descTex=[MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatRGBA8Unorm width:texCamWidth height:texCamHeight mipmapped:NO];
descTex.usage = MTLTextureUsageShaderWrite | MTLTextureUsageShaderRead ;
[descTex setStorageMode:MTLStorageModeShared];
texOutRGB = [_device newTextureWithDescriptor:descTex];
Used a compute shader to fill the texture and render it to the screen. Results are as expected.
Now I need to add a CPU hook to modify the texture data in a way that cannot be done with a shader. I expected that the MTLTexture.buffer contents would let me loop over the pixels, but it appears it does not work like that. I see people using getBytes and then replaceRegion to write the data back, but that does not look like shared memory, since a copy of the data is made.
How to loop over the RGBA pixel data in the texture with the CPU?
If you created a simple 1D buffer instead, you could just access its contents member as a pointer. If you need an RGBA texture, create a CVPixelBuffer that contains BGRA pixels; you can then access the pixels by locking the buffer and reading/writing through its base pointer (take care to respect the row widths), and finally you can wrap the CVPixelBuffer as a Metal texture to avoid the memcpy(). The 2D processing is not trivial; it is a lot easier to just use a 1D buffer.
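A rough sketch of the CVPixelBuffer route (CoreVideo's C API, callable from Objective-C). texCamWidth/texCamHeight come from your question; the NULL attributes are a simplification, and in practice you would pass a dictionary containing kCVPixelBufferMetalCompatibilityKey so the buffer can later be wrapped as a Metal texture through a CVMetalTextureCache:
#include <CoreVideo/CoreVideo.h>

CVPixelBufferRef pixelBuffer = NULL;
CVPixelBufferCreate(kCFAllocatorDefault, texCamWidth, texCamHeight,
                    kCVPixelFormatType_32BGRA, NULL, &pixelBuffer);

CVPixelBufferLockBaseAddress(pixelBuffer, 0);
uint8_t *base = (uint8_t *)CVPixelBufferGetBaseAddress(pixelBuffer);
size_t bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer);   // may be larger than width * 4
for (size_t y = 0; y < texCamHeight; y++) {
    uint8_t *pixel = base + y * bytesPerRow;
    for (size_t x = 0; x < texCamWidth; x++, pixel += 4) {
        // pixel[0] = B, pixel[1] = G, pixel[2] = R, pixel[3] = A
        // ... read or modify the pixel here ...
    }
}
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);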

How does this code find the memory aligned size of a Struct in swift? Why does it need binary operations?

I am going through the Metal iOS Swift example trying to understand the triple buffering practice they suggest. This is shown inside of the demo for the uniform animations.
As I understand it, aligned memory simply starts at an address that is a multiple of some byte count that the device really likes. My confusion is about this line of code:
// The 256 byte aligned size of our uniform structure
let alignedUniformsSize = (MemoryLayout<Uniforms>.size & ~0xFF) + 0x100
They use it to find the 256-byte-aligned size of the Uniforms struct. I am confused about why there are binary operations here; I am really not sure what they do.
If it helps, this aligned size is used to create a buffer like this. I am fairly sure that buffer allocates byte-aligned memory automatically and is from then on used as the memory storage location for the uniforms.
let buffer = self.device.makeBuffer(length:alignedUniformsSize * 3, options:[MTLResourceOptions.storageModeShared])
So essentially, rather than going through the trouble of allocating byte-aligned memory themselves, they let Metal do it for them.
Is there any reason the strategy they used for let alignedUniformsSize = ... would not work for other types such as Int or Float, etc.?
Let's talk first about why you'd want aligned buffers, then we can talk about the bitwise arithmetic.
Our goal is to allocate a Metal buffer that can store three (triple-buffered) copies of our uniforms (so that we can write to one part of the buffer while the GPU reads from another). In order to read from each of these three copies, we supply an offset when binding the buffer, something like currentBufferIndex * uniformsSize. Certain Metal devices require these offsets to be multiples of 256, so we instead need to use something like currentBufferIndex * alignedUniformsSize as our offset.
How do we "round up" an integer to the next highest multiple of 256? We can do it by dropping the lowest 8 bits of the "unaligned" size, effectively rounding down, then adding 256, which gets us the next highest multiple. The rounding down part is achieved by bitwise ANDing with the 1's complement (~) of 255, which (in 32-bit) is 0xFFFFFF00. The rounding up is done by just adding 0x100, which is 256.
Interestingly, if the base size is already aligned, this technique spuriously rounds up anyway (e.g., from 256 to 512). For the cost of an integer divide, you can avoid this waste:
let alignedUniformsSize = ((MemoryLayout<Uniforms>.size + 255) / 256) * 256
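A tiny standalone comparison of the two formulas (the sizes are made up) shows that both round up to the next multiple of 256, and that only the divide version leaves exact multiples alone:
#include <stdio.h>

int main(void) {
    int sizes[] = { 132, 256, 300 };             // hypothetical unaligned sizes
    for (int i = 0; i < 3; ++i) {
        int s = sizes[i];
        int bitwise = (s & ~0xFF) + 0x100;       // always bumps to the next multiple of 256
        int divide  = ((s + 255) / 256) * 256;   // rounds up, but keeps exact multiples
        printf("%3d -> bitwise: %d, divide: %d\n", s, bitwise, divide);
    }
    return 0;                                    // prints 256/256, 512/256, 512/512
}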

Reading RGB8 buffer from OpenGL ES 3.0 on iOS?

I really need to get an RGB buffer with 8 bits per channel from the GPU.
I need it to pass to a trained convolutional neural network, and it only accepts data in that format.
I can't convert it on the CPU as I'm heavily CPU bound and it's quite slow.
I currently have FBO with a renderbuffer attached, which is defined with:
glRenderbufferStorage(GL_RENDERBUFFER, GL_RGB8_OES, bufferWidth, bufferHeight);
There are no errors when I bind, define and render to the buffer.
But when I use
glReadPixels(0, 0, bufferWidth, bufferHeight, GL_RGB, GL_UNSIGNED_BYTE, rgbBufferRawName);
it gives an invalid enum error (0x0500). It works just fine when I pass GL_RED_EXT or GL_RGBA and produces correct buffers (I've checked it by uploading those buffers to a texture and rendering them, and they looked correct).
I tried setting glPixelStorei(GL_PACK_ALIGNMENT, 1); but that made no difference.
I'm on iOS 10 and an iPhone 6. I was using ES 2.0, but have now tried switching to ES 3.0 in the hope that it would help me solve the problem. It did not.
I would really appreciate help in getting an RGB8 buffer any way possible.
Thanks.
According to the OpenGL ES 3.0 specification, GL_RGB is not a valid value for format.
https://www.khronos.org/opengles/sdk/docs/man3/html/glReadPixels.xhtml
You may want to either convert it to RGB after retrieving the GL_RGBA-formatted buffer, or adjust your algorithm to work with RGBA.
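If you take the first route, the repacking is just dropping every fourth byte; a minimal sketch with illustrative buffer names:
// Copy the RGB bytes out of an RGBA readback buffer, dropping alpha.
for (size_t i = 0; i < (size_t)bufferWidth * bufferHeight; ++i) {
    rgbBuffer[3*i + 0] = rgbaBuffer[4*i + 0];   // R
    rgbBuffer[3*i + 1] = rgbaBuffer[4*i + 1];   // G
    rgbBuffer[3*i + 2] = rgbaBuffer[4*i + 2];   // B
}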

Packing Pixel Data in OpenCV

Whenever I read a color image with 3 channels via cv::imread, its data alignment is a bit awkward (each pixel is neither a single byte nor a whole integer wide), and that slows me down when I read a single pixel's data in GPU memory. It also seems the cv::Mat class's logic behind the alignment is a bit different from what I had initially thought. It does not add an extra byte between two pixels in a single row so that each pixel starts at every 4 bytes; rather, it pads some extra bytes at the END of each row so that every row starts on a 4-byte boundary.
What should I do to pack each pixel's data into a single unsigned integer? Is there a built-in method in OpenCV so that I do not have to pack each pixel one by one with bitwise OR operations?
Kind Regards.
You can convert the pixel format from BGR to BGRA
See this example.
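For instance, with cv::cvtColor (the filename below is just illustrative); the 4-channel result makes every pixel a 4-byte unit, so each one can be read as a single unsigned integer:
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>

cv::Mat bgr = cv::imread("image.png");            // 3 bytes per pixel, rows padded at the end
cv::Mat bgra;
cv::cvtColor(bgr, bgra, cv::COLOR_BGR2BGRA);      // 4 bytes per pixel, no gaps inside a row
// Each pixel can now be read as one 32-bit value:
const uint32_t *row = bgra.ptr<uint32_t>(0);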
