I am trying to use the GPUImage library for image processing (not for image output to screen). By default, the library outputs one BGRA texture. I would like to instead output multiple single-channel/single-byte textures. Up to this point I have been bit-packing multiple computed values for each pixel into the BGRA channels. I have reached the limitations of that method because a) I now have more than 4 return values for each pixel, and b) the overhead of de-interleaving BGRA-BGRA-BGRA-BGRA..... into BBBB..,GGGG..,RRRR..,AAAA.., is starting to really bog down my program.
I know there is some sample code for using multiple input textures with GPUImage, but I have not seen anything for multiple output textures. For single-byte output I believe I could use GL_ALPHA textures(?), so I am guessing it is a matter of binding multiple textures to variables in my filter kernel.
Thanks!
Probably what you need is Multiple Render Targets (MRT for short). You can check how many color attachments your target hardware supports by querying GL_MAX_COLOR_ATTACHMENTS. Unfortunately, not every iOS device supports MRT: it is core in OpenGL ES 3.0 (A7 GPUs and later), while OpenGL ES 2.0 only gets it through the EXT_draw_buffers extension.
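Roughly, the setup on an ES 3.0 context looks like the sketch below (the function name and structure are mine, and the context/framebuffer creation is assumed to happen elsewhere). Note that GL_R8 is color-renderable in ES 3.0 while GL_ALPHA generally is not, so R8 is the better fit for single-byte outputs:

    import OpenGLES

    // Sketch only: assumes an ES 3.0 EAGLContext is current and a framebuffer
    // object is already bound. Names here are illustrative, not part of GPUImage.
    func attachTwoSingleChannelTargets(width: GLsizei, height: GLsizei) {
        var maxAttachments: GLint = 0
        glGetIntegerv(GLenum(GL_MAX_COLOR_ATTACHMENTS), &maxAttachments)
        guard maxAttachments >= 2 else { return }   // MRT with 2 targets not available

        var textures = [GLuint](repeating: 0, count: 2)
        glGenTextures(2, &textures)
        for (i, tex) in textures.enumerated() {
            glBindTexture(GLenum(GL_TEXTURE_2D), tex)
            // GL_R8 is one byte per pixel and color-renderable in ES 3.0, so each
            // output lands in a tightly packed single-channel texture with no
            // de-interleaving afterwards.
            glTexImage2D(GLenum(GL_TEXTURE_2D), 0, GL_R8, width, height, 0,
                         GLenum(GL_RED), GLenum(GL_UNSIGNED_BYTE), nil)
            glFramebufferTexture2D(GLenum(GL_FRAMEBUFFER),
                                   GLenum(GL_COLOR_ATTACHMENT0 + Int32(i)),
                                   GLenum(GL_TEXTURE_2D), tex, 0)
        }
        // The ES 3.0 fragment shader then writes one value per target:
        //   layout(location = 0) out float resultA;
        //   layout(location = 1) out float resultB;
        let drawBuffers: [GLenum] = [GLenum(GL_COLOR_ATTACHMENT0),
                                     GLenum(GL_COLOR_ATTACHMENT1)]
        glDrawBuffers(2, drawBuffers)
    }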
I am building an iOS app that renders frames from the camera to Metal textures in real time. I want to use Core ML to perform style transfer on subregions of the Metal texture (imagine the camera output as a 2x2 grid, where each of the 4 squares is used as input to a style transfer network, and the output pasted back into the displayed texture). I am trying to figure out how best to use Core ML inside of a Metal pipeline to fill non-overlapping subregions of the texture with the output of the mlmodel (hopefully without decomposing the mlmodel into an MPSNNGraph). Is it possible to feed an MTLTexture or MTLBuffer to a Core ML model directly? I'd like to avoid format conversions as much as possible (for speed).
My mlmodel takes CVPixelBuffers at its inputs and outputs. Can it be made to take MTLTextures instead?
The first thing I tried was cutting the given sample buffer into subregions (by copying the pixel data, ugh), inferring on each subregion, and then pasting them together into a new sample buffer, which was then turned into an MTLTexture and displayed. This approach did not take advantage of Metal at all, since the textures were not created until after inference. It also had a lot of circuitous conversion/copy/paste operations that slow everything down.
The second thing I tried was sending the camera data to the MTLTexture directly, inferring on subregions of the sample buffer, and pasting into the currently displayed texture with MTLTexture.replace(region:...withBytes:) for each subregion. However, MTLTexture.replace() uses the CPU and is not fast enough for live video.
The idea I am about to try is converting my mlmodel to an MPSNNGraph, getting frames as textures, using the MPSNNGraph for inference on subregions, and displaying the output. I figured I'd check here before going through all of the effort of converting the mlmodel first, though. Sorry if this is too broad; I mainly work in TensorFlow and am a bit out of my depth here.
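One route that avoids byte copies, sketched below with made-up names, is to give each grid cell an IOSurface-backed CVPixelBuffer and view the same memory as an MTLTexture through CVMetalTextureCache: Metal writes into the texture view, and the untouched CVPixelBuffer is handed to the model, so the "conversion" is free. This is only a sketch of the sharing technique, not the full tiling pipeline:

    import CoreVideo
    import Metal

    // Illustrative sketch: one shared buffer per grid cell. The CVPixelBuffer is
    // what the generated Core ML model class accepts; the MTLTexture views the
    // same IOSurface-backed memory, so nothing is copied between the two APIs.
    final class SharedCell {
        let pixelBuffer: CVPixelBuffer
        let texture: MTLTexture
        private let cvTexture: CVMetalTexture      // keeps the Metal view alive

        init?(cache: CVMetalTextureCache, width: Int, height: Int) {
            let attrs: [CFString: Any] = [
                kCVPixelBufferMetalCompatibilityKey: true,
                kCVPixelBufferIOSurfacePropertiesKey: [:]   // IOSurface backing enables sharing
            ]
            var pb: CVPixelBuffer?
            guard CVPixelBufferCreate(nil, width, height, kCVPixelFormatType_32BGRA,
                                      attrs as CFDictionary, &pb) == kCVReturnSuccess,
                  let pixelBuffer = pb else { return nil }

            var cvTex: CVMetalTexture?
            CVMetalTextureCacheCreateTextureFromImage(nil, cache, pixelBuffer, nil,
                                                      .bgra8Unorm, width, height, 0, &cvTex)
            guard let cvTexture = cvTex,
                  let texture = CVMetalTextureGetTexture(cvTexture) else { return nil }

            self.pixelBuffer = pixelBuffer
            self.texture = texture
            self.cvTexture = cvTexture
        }
    }

    // Usage sketch: blit or render each camera subregion into cell.texture, commit
    // and wait, then call the model with cell.pixelBuffer as input, and composite
    // the model's output pixel buffer back into the display texture the same way.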
Is it possible to process an MTLTexture in-place without osx_ReadWriteTextureTier2?
It seems like I can set two texture arguments to be the same texture. Is this supported behavior?
Specifically, I don't mind not having the texture cache update after a write. I just want to modify a 3D texture in place (and sparsely). It's memory-prohibitive to keep two copies of the texture, and it's computationally expensive to copy the entire texture, especially when I might only be updating a small portion of it.
Per the documentation, regardless of feature availability, it is invalid to declare two separate texture arguments (one read, one write) in a function signature and then set the same texture for both.
Any Mac that supports osx_GPUFamily1_v2 supports function texture read-writes (by declaring the texture with access::read_write).
The distinction between "Tier 1" (which has no explicit constant) and osx_ReadWriteTextureTier2 is that the latter supports additional pixel formats for read-write textures.
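In practice the capability check is a one-liner per tier; here is a sketch using the osx_-prefixed MTLFeatureSet names from the question (newer SDKs spell these with a macOS_ prefix instead):

    import Metal

    // Sketch: decide which read-write path is available on this Mac.
    func readWriteSupport(device: MTLDevice) -> String {
        if device.supportsFeatureSet(.osx_ReadWriteTextureTier2) {
            return "tier 2: extra pixel formats allowed for access::read_write"
        } else if device.supportsFeatureSet(.osx_GPUFamily1_v2) {
            return "tier 1: access::read_write works for a restricted set of formats"
        } else {
            return "no function texture read-writes; use the plane-by-plane fallback"
        }
    }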
If you determine that your target Macs don't support the kind of texture read-writes you need (because you need to deploy to OS X 10.11 or because you're using an incompatible pixel format for the tier of machine you're deploying to), you could operate on your texture one plane at a time, reading from your 3D texture, writing to a 2D texture, and then blitting the result back into the corresponding region in your 3D texture. It's more work, but it'll use much less than double the memory.
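If you do end up on that fallback path, the write-back step is just a blit. A minimal sketch, assuming a 3D volume texture and a same-sized 2D scratch texture already exist (both names are mine):

    import Metal

    // Sketch of the plane-at-a-time fallback: after a pass has read slice z of
    // `volume` and written its result into the 2D `scratch` texture, blit that
    // plane back into the corresponding depth slice of the 3D texture.
    func writePlaneBack(commandBuffer: MTLCommandBuffer,
                        volume: MTLTexture, scratch: MTLTexture, z: Int) {
        guard let blit = commandBuffer.makeBlitCommandEncoder() else { return }
        blit.copy(from: scratch,
                  sourceSlice: 0,
                  sourceLevel: 0,
                  sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0),
                  sourceSize: MTLSize(width: scratch.width, height: scratch.height, depth: 1),
                  to: volume,
                  destinationSlice: 0,
                  destinationLevel: 0,
                  destinationOrigin: MTLOrigin(x: 0, y: 0, z: z))
        blit.endEncoding()
    }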
I am using OpenGL ES on iOS to do some general data processing. Currently I am trying to make a large lookup table (~1M elements) of float values accessed by integer indexes, and I would like it to be 1D (though 2D works). I have learned that using a texture/sampler is probably the way to do that, but my remaining questions are:
Sampler or texture, which is more efficient? What would be the parameter settings to achieve optimal results (like those configured in glTexParameteri())?
I know I can use a 1-pixel-high 2D sampler/texture as 1D, but out of curiosity, I wonder whether the 1D sampler/texture has been removed in iOS ES3? I cannot find glTexImage1D() or the GL_TEXTURE_1D constant with ES3/gl.h imported.
OpenGL ES does not have 1D textures. Never did in any previous version, and still doesn't up to the most recent version (3.2). And I very much doubt it ever will.
At least in my opinion, that's no big loss. You can do anything you could have done with a 1D texture using a 2D texture of height 1. The only minor inconvenience is that you have to pass in some more sampling attributes, and a second texture coordinate when you sample the texture in your GLSL code.
For the sizes you're looking at, you'll have the same problem with a 2D texture of height 1 that you would have faced with 1D textures as well: You're limited by the maximum texture size. This is given by the value you can query with glGetIntegerv(GL_MAX_TEXTURE_SIZE, ...). Typical values for relatively recent mobile platforms are 2K to 8K. Based on the published docs, it looks like the limit is 4096 on recent Apple platforms (A7 to A9).
There is nothing I can think of that would give you a much larger range in a single dimension. There is an EXT_texture_buffer extension that targets your use case, but I don't see it in the list of supported extensions for iOS.
So the best you can probably do is store the data in a 2D texture, and use div/mod arithmetic to split your large 1D index into 2 texture coordinates.
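For reference, the index-to-coordinate split is just the arithmetic below (shown here in host-side code; the same div/mod goes in the shader). For a lookup table you would normally also configure the texture with GL_NEAREST filtering and GL_CLAMP_TO_EDGE wrapping so an index fetches exactly one unfiltered texel. The 1024 x 1024 layout is an assumption for the ~1M-element case:

    // Illustrative only: map a 1D index into a width x height 2D texture and
    // sample at the texel centre so GL_NEAREST returns exactly that element.
    let tableWidth = 1024
    let tableHeight = 1024

    func texCoord(forIndex i: Int) -> (s: Float, t: Float) {
        let x = i % tableWidth                      // "mod" part
        let y = i / tableWidth                      // "div" part
        let s = (Float(x) + 0.5) / Float(tableWidth)
        let t = (Float(y) + 0.5) / Float(tableHeight)
        return (s, t)
    }
    // In the shader the same two lines become mod()/floor() on floats (ES 2.0)
    // or the % and / operators on integers (ES 3.0).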
I'm trying to find the most efficient way of handling multi-texturing in OpenGL ES2 on iOS. By 'efficient' I mean the fastest rendering even on older iOS devices (iPhone 4 and up) - but also balancing convenience.
I've considered (and tried) several different methods, but I have run into a couple of problems and questions.
Method 1 - My base and normal values are rgb with NO ALPHA. For these objects I don't need transparency. My emission and specular information are each only one channel. To reduce texture2D() calls I figured I could store the emission as the alpha channel of the base, and the specular as the alpha of the normal. With each map in its own file, it would look like this:
My problem so far has been finding a file format that will support a full non-premultiplied alpha channel. PNG just hasn't worked for me. Every way that I've tried to save this as a PNG premultiplies the .alpha with the .rgb on file save (via Photoshop), basically destroying the .rgb. Any pixel with a 0.0 alpha has a black rgb when I reload the file. I posted that question here with no activity.
I know this method would yield faster renders if I could work out a way to save and load this independent 4th channel. But so far I haven't been able to and had to move on.
Method 2 - When that didn't work I moved on to a single 4-way texture where each quadrant has a different map. This doesn't reduce texture2D() calls but it reduces the number of textures that are being accessed within the shader.
The 4-way texture does require that I modify the texture coordinates within the shader. For model flexibility I leave the texcoords as is in the model's structure and modify them in the shader like so:
v_fragmentTexCoord0 = a_vertexTexCoord0 * 0.5;
v_fragmentTexCoord1 = v_fragmentTexCoord0 + vec2(0.0, 0.5); // illumination frag is up half
v_fragmentTexCoord2 = v_fragmentTexCoord0 + vec2(0.5, 0.5); // shininess frag is up and over
v_fragmentTexCoord3 = v_fragmentTexCoord0 + vec2(0.5, 0.0); // normal frag is over half
To avoid dynamic texture lookups (thanks Brad Larson) I moved these offsets to the vertex shader and kept them out of the fragment shader.
But my question here is: Does reducing the number of texture samplers used in a shader matter? Or would I be better off using 4 different smaller textures here?
The one problem I did have with this was bleed-over between the different maps. A texcoord of 1.0 was averaging in some of the blue normal pixels due to linear texture filtering, which added a blue edge on the object near the seam. To avoid it I had to change my UV mapping to not get too close to the edge. And that's a pain to do with very many objects.
Method 3 would be to combine methods 1 and 2, and have the base.rgb + emission.a on one side and normal.rgb + specular.a on the other. But again I still have the problem of getting an independent alpha to save in a file.
Maybe I could save them as two files but combine them during loading before sending the result over to OpenGL. I'll have to try that.
Method 4 - Finally, in a 3D world, if I have 20 different panel textures for walls, should these be individual files or all packed into a single texture atlas? I recently noticed that at some point Minecraft moved from an atlas to individual textures - albeit they are 16x16 each.
With a single model and by modifying the texture coordinates (which I'm already doing in method 2 and 3 above), you can easily send an offset to the shader to select a particular map in an atlas:
v_fragmentTexCoord0 = u_texOffset + a_vertexTexCoord0 * u_texScale;
This offers a lot of flexibility and reduces the number of texture bindings. It's basically how I'm doing it in my game now. But IS IT faster to access a small portion of a larger texture and have the above math in the vertex shader? Or is it faster to repeatedly bind smaller textures over and over? Especially if you're not sorting objects by texture.
I know this is a lot, but the main question here is: what's the most efficient method considering speed + convenience? Will method 4 be faster for multiple textures, or would multiple rebinds be faster? Or is there some other way that I'm overlooking? I see all these 3D games with a lot of graphics and area coverage. How do they keep frame rates up, especially on older devices like the iPhone 4?
**** UPDATE ****
Since I've suddenly had 2 answers in the last few days I'll say this. Basically I did find the answer, or AN answer. The question is which method is more efficient, meaning which method will result in the best frame rates. I've tried the various methods above and on the iPhone 5 they're all just about as fast. The iPhone 5/5S has an extremely fast GPU. Where it matters is on older devices like the iPhone 4/4S, or on larger devices like a Retina iPad. My tests were not scientific and I don't have ms timings to report. But 4 texture2D() calls to 4 RGBA textures was actually just as fast or maybe even faster than 4 texture2D() calls to a single texture with offsets. And of course I do those offset calculations in the vertex shader and not the fragment shader (never in the fragment shader).
So maybe someday I'll do the tests and make a grid with some numbers to report. But I don't have time to do that right now and write a proper answer myself. And I can't really checkmark any other answer that isn't answering the question cause that's not how SO works.
But thanks to the people who have answered. And check out this other question of mine that also answered some of this one: Load an RGBA image from two jpegs on iOS - OpenGL ES 2.0
Have a post-process step in your content pipeline where you merge your RGB with the alpha texture and store it in a .ktx file, either when you package the game or as a post-build event when you compile.
It's a fairly trivial format, and it would be simple to write a command-line tool that loads two PNGs and merges them into one KTX, RGB + alpha.
Some benefits of doing that are:
- Less CPU overhead when loading the file at game startup, so the game starts quicker.
- Some GPUs do not natively support a 24-bit RGB format, which would force the driver to internally convert it to 32-bit RGBA. This adds more time to the loading stage and temporary memory usage.
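A sketch of the merge step described above (the KTX header writing is a separate, straightforward step and is omitted here; the function names are mine). CoreGraphics decodes both images into tightly packed RGBA8 buffers, then the mask's red channel becomes the colour image's alpha:

    import CoreGraphics

    // Sketch for the offline tool or merge-at-load step: decode two images into
    // RGBA8 and splice image B's red channel into image A's alpha channel.
    func rgba8Pixels(of image: CGImage) -> [UInt8] {
        let w = image.width, h = image.height
        var pixels = [UInt8](repeating: 0, count: w * h * 4)
        pixels.withUnsafeMutableBytes { buf in
            guard let ctx = CGContext(data: buf.baseAddress, width: w, height: h,
                                      bitsPerComponent: 8, bytesPerRow: w * 4,
                                      space: CGColorSpaceCreateDeviceRGB(),
                                      bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue)
            else { return }
            ctx.draw(image, in: CGRect(x: 0, y: 0, width: w, height: h))
        }
        return pixels
    }

    func mergedRGBA(rgb rgbImage: CGImage, alpha alphaImage: CGImage) -> [UInt8] {
        var rgba = rgba8Pixels(of: rgbImage)       // R G B (its own A is ignored)
        let mask = rgba8Pixels(of: alphaImage)     // greyscale mask; read its red channel
        for px in stride(from: 0, to: rgba.count, by: 4) {
            rgba[px + 3] = mask[px]                // alpha := mask's red channel
        }
        return rgba                                // write as KTX or upload directly
    }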
Now when you have the data in a texture object, you do want to minimize texture sampling, as it means a lot of GPU operations and memory accesses depending on the filtering mode.
I would recommend having 2 textures with 2 layers each. The issue with packing all of them into the same texture is potential artifacts when you sample with bilinear filtering or mipmapping, since the filter may pull in neighbouring pixels near the edge where one layer ends and the next begins (the same applies if you decide to generate mipmaps).
As an extra improvement I would recommend not keeping raw 32-bit RGBA data in the KTX, but actually compressing it into a DXT or PVRTC format. This would use much less memory, which means faster loading times and fewer memory transfers for the GPU, as memory bandwidth is limited.
Of course, adding the compressor to the post-process tool is slightly more complex.
Do note that compressed textures do lose a bit of quality, depending on the algorithm and implementation.
Silly question, but are you sure you are sampler-limited? It just seems to me that, with your "two 2-way textures", you are potentially pulling in a lot of texture data, and you might instead be bandwidth-limited.
What if you were to use 3 textures [BaseRGB, NormalRGB, and combined Emission+Specular] and use PVRTC compression? Depending on the detail, you might even be able to use 2bpp (rather than 4bpp) for the BaseRGB and/or Emission+Specular.
For the Normals I'd probably stick to 4bpp. Further, if you can afford the shader instructions, only store the R&G channels (putting 0 in the blue channel) and re-derive the blue channel with a bit of maths. This should give better quality.
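For reference, the reconstruction is just the maths below (a host-side illustration of the shader arithmetic, assuming the usual [0,1] to [-1,1] normal-map encoding):

    // Illustrative: rebuild a unit normal from a two-channel (R/G) normal map.
    // The same three lines translate directly into the fragment shader.
    func reconstructNormal(r: Float, g: Float) -> SIMD3<Float> {
        let x = r * 2.0 - 1.0                      // decode from [0,1] to [-1,1]
        let y = g * 2.0 - 1.0
        let z = max(0.0, 1.0 - x * x - y * y).squareRoot()
        return SIMD3<Float>(x, y, z)               // z >= 0 for tangent-space maps
    }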
In iOS, I have an input image, which I am rendering to an intermediate texture (using a framebuffer) and then rendering that texture to the iOS-supplied renderbuffer (also using a framebuffer, of course). This is 2D, so I'm just drawing a quad each time.
No matter what I've tried, I can't seem to get the second rendering operation to use GL_LINEAR on the texture (I'm using GL_NEAREST on the first). The only way I've seen something filtered is if both textures use GL_LINEAR. Extremely similar code (at least, the OpenGL bits) works fine on Android. Am I just doing something wrong, or does this not work on iOS?
The answer is "yes, you can use different filtering options with different texture units in the same context". It turned out I had unrelated issues in my program.
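For anyone landing here with the same symptom: filtering state in ES 2.0 lives on the texture object itself, so each texture can be configured independently right after binding it. A minimal sketch (the texture names are placeholders):

    import OpenGLES

    // Minimal sketch: per-texture-object filtering.
    func configureFilters(inputTexture: GLuint, intermediateTexture: GLuint) {
        // First pass samples the input image with point sampling...
        glBindTexture(GLenum(GL_TEXTURE_2D), inputTexture)
        glTexParameteri(GLenum(GL_TEXTURE_2D), GLenum(GL_TEXTURE_MIN_FILTER), GL_NEAREST)
        glTexParameteri(GLenum(GL_TEXTURE_2D), GLenum(GL_TEXTURE_MAG_FILTER), GL_NEAREST)

        // ...second pass samples the intermediate texture with bilinear filtering.
        glBindTexture(GLenum(GL_TEXTURE_2D), intermediateTexture)
        glTexParameteri(GLenum(GL_TEXTURE_2D), GLenum(GL_TEXTURE_MIN_FILTER), GL_LINEAR)
        glTexParameteri(GLenum(GL_TEXTURE_2D), GLenum(GL_TEXTURE_MAG_FILTER), GL_LINEAR)
    }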