My question is specifically in regards to Metal, since I don't know if the answer would change for another API.
What I believe I understand so far is this:
A mipmapped texture has precomputed "levels of detail", where lower levels of detail are created by downsampling the original texture in some meaningful way.
Mipmap levels are referred to in descending order of detail, where level 0 is the original texture and higher levels are successive power-of-two reductions of it.
Most GPUs implement trilinear filtering, which picks two neighboring mipmap levels for each sample, samples from each level using bilinear filtering, and then linearly blends those samples.
What I don't quite understand is how these mipmap levels are selected. In the documentation for the Metal standard library, I see that samples can be taken, with or without specifying an instance of a lod_options type. I would assume that this argument changes how the mipmap levels are selected, and there are apparently three kinds of lod_options for 2D textures:
bias(float value)
level(float lod)
gradient2d(float2 dPdx, float2 dPdy)
Unfortunately, the documentation doesn't bother explaining what any of these options do. I can guess that bias() biases some automatically chosen level of detail, but then what does the bias value mean? What scale does it operate on? Similarly, how is the lod of level() translated into discrete mipmap levels? And, operating under the assumption that gradient2d() uses the gradient of the texture coordinate, how does it use that gradient to select the mipmap level?
More importantly, if I omit the lod_options, how are the mipmap levels selected then? Does this differ depending on the type of function being executed?
And, if the default no-lod-options-specified operation of the sample() function is to do something like gradient2d() (at least in a fragment shader), does it use simple screen-space derivatives, or does it work directly with the rasterizer and interpolated texture coordinates to calculate a precise gradient?
And finally, how consistent is any of this behavior from device to device? An old article (old as in DirectX 9) I read referred to complex device-specific mipmap selection, but I don't know if mipmap selection is better-defined on newer architectures.
This is a relatively big subject that you might be better off asking on https://computergraphics.stackexchange.com/, but very briefly: Lance Williams' paper "Pyramidal Parametrics", which introduced trilinear filtering and the term "MIP mapping", contains a suggestion that came from Paul Heckbert (see page three, first column) and that I think may still be used, to an extent, in some systems.
In effect the approaches to computing the MIP map levels are usually based on the assumption that your screen pixel is a small circle and this circle can be mapped back onto the texture to get, approximately, an ellipse. You estimate the length of the longer axis expressed in texels of the highest resolution map. This then tells you which MIP maps you should sample. For example, if the length was 6, i.e. between 2^2 and 2^3, you'd want to blend between MIP map levels 2 and 3.
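To make that rule concrete, here is a rough sketch of the arithmetic, purely for illustration (the names are mine, not from any API):

import Foundation

// Length of the longer ellipse axis, measured in texels of the level-0 map.
let axisInTexels = 6.0
let lod = log2(axisInTexels)                 // ≈ 2.58
let lowerLevel = Int(lod)                    // sample MIP level 2...
let blend = lod - Double(lowerLevel)         // ...and blend ≈ 0.58 of the way toward level 3

This is the same example as above: an axis length of 6 texels falls between 2^2 and 2^3, so levels 2 and 3 are blended.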
I would like to use MPSImageGaussianPyramid but am very new to Metal's usage and with mipmaps. I would like to use the filter to produce an image pyramid for image processing techniques.
From what I'm able to gather, MPSImageGaussianPyramid creates a mipmapped image; however, in my code I'm having a hard time even making sure that I'm seeing the output correctly. Are there any examples where this filter has been used correctly? My questions are:
How does one access the mipmapped images after the filter has been applied?
Is it possible to copy the mipmapped images to another image for processing?
Is this mipmapped image going to be faster than manually creating a pyramid through custom filters?
Thanks, and I will provide some sample code later that I've not been able to get working.
A few pieces of advice for working with MPS kernels in general, and the image pyramid filters in particular:
If you're going to be using a kernel more than once, cache it and reuse it rather than creating a kernel every time you need to encode.
Consider setting the edgeMode property of the kernel to .clamp when downsampling, since sampling out-of-bounds (as the Gaussian pyramid does on the first step) will return black by default and introduce artificially dark pixels.
When encoding a Gaussian pyramid kernel, always use the "in-place" method, without providing a fallback allocator:
kernel.encode(commandBuffer: commandBuffer, inPlaceTexture: &myTexture)
As you note, running an image pyramid kernel puts the result in the available mip levels of the texture being downsampled. This means that the texture you provide should already have as many mip levels allocated as you want filled. Thus, you should ensure that the descriptor you use to create your texture has an appropriate mipmapLevelCount (this is ensured by the texture2DDescriptorWithPixelFormat convenience method, and can be controlled indirectly by using the .allocateMipmaps option with MTKTextureLoader).
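Putting those points together, a minimal encoding sketch might look like this (hedged: device, commandBuffer, the dimensions, and the pixel format are placeholders you would supply yourself):

import Metal
import MetalPerformanceShaders

// Create a texture with a full mip chain and the usage flags the pyramid kernel needs.
let descriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba8Unorm,
                                                          width: 1024,
                                                          height: 1024,
                                                          mipmapped: true)
descriptor.usage = [.shaderRead, .shaderWrite, .pixelFormatView]
var myTexture = device.makeTexture(descriptor: descriptor)!

// ...fill mip level 0 with your source image here...

let pyramid = MPSImageGaussianPyramid(device: device)   // cache and reuse this kernel
pyramid.edgeMode = .clamp                                // avoid artificially dark borders
pyramid.encode(commandBuffer: commandBuffer, inPlaceTexture: &myTexture)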
Assuming that you now know how to encode a kernel and get the desired results into your texture, here are some answers to your questions:
1. How does one access the mipmapped images after the filter has been applied?
You can implicitly use the mipmaps in a shader when rendering by using a sampler that has a mip filter, or you can explicitly sample from a particular mip level by passing an lod_options argument of type level to the sample function:
constexpr sampler mySampler(coord::normalized, filter::linear, mip_filter::linear);
float4 color = myTexture.sample(mySampler, texCoords, level(selectedLod));
This works in compute kernels as well as rendering functions. Use a mip filter of nearest or round the selected LOD if you want to sample from a single mip level rather than using trilinear mip filtering.
2. Is it possible to copy the mipmapped images to another image for processing?
Since a texture that is downsampled by an image pyramid kernel must already have the .pixelFormatView usage flag, you can create a texture view on a mipped texture that selects one or more mip levels. For example, if you want to select the first and higher mip levels (dropping the base level), you could do that like this:
let textureView = myTexture.makeTextureView(pixelFormat: myTexture.pixelFormat,
                                            textureType: myTexture.textureType,
                                            levels: 1..<myTexture.mipmapLevelCount,
                                            slices: 0..<1)
You can also use a blit command encoder to copy from one texture to another, specifying which mip levels to include. This allows you to free the original texture if you want to reclaim the memory used by the lower mip levels.
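For example, a blit that copies every level except the base into a smaller texture might look roughly like this (a hedged sketch with hypothetical names; smallTexture is assumed to have been created with the dimensions and mip count of level 1 and up, and error handling is omitted):

let blit = commandBuffer.makeBlitCommandEncoder()!
for level in 1..<myTexture.mipmapLevelCount {
    let width = max(1, myTexture.width >> level)
    let height = max(1, myTexture.height >> level)
    blit.copy(from: myTexture,
              sourceSlice: 0,
              sourceLevel: level,
              sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0),
              sourceSize: MTLSize(width: width, height: height, depth: 1),
              to: smallTexture,
              destinationSlice: 0,
              destinationLevel: level - 1,
              destinationOrigin: MTLOrigin(x: 0, y: 0, z: 0))
}
blit.endEncoding()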
You can wrap a MTLTexture with an MPSImage if you want to use APIs that work with images rather than textures:
let image = MPSImage(texture: myTexture, featureChannels: 4)
3. Is this mipmapped image going to be faster than manually creating a pyramid through custom filters?
Almost certainly. Metal Performance Shaders are tuned for each generation of devices and have numerous heuristics that optimize execution speed and energy use.
I'm trying to implement fluid dynamics using compute shaders. In the article, a series of passes is done on a texture, since it was written before compute shaders existed.
Would it be faster to do each pass on a texture or buffer? The final pass would have to be applied to a texture anyways.
I would recommend using whichever dimensionality of resource fits the simulation. If it's a 1D simulation, use a RWBuffer, if it's a 2D simulation use a RWTexture2D and if it's a 3D simulation use a RWTexture3D.
There appear to be stages in the algorithm that you linked that make use of bilinear filtering. If you restrict yourself to using a Buffer you'll have to issue 4 or 8 memory fetches (depending on 2D or 3D) and then more instructions to calculate the weighted average. Take advantage of the hardware's ability to do this for you where possible.
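To make that cost concrete, this is roughly the work you would be doing per sample with a flat buffer in 2D, versus a single hardware-filtered texture read. It is a CPU-side Swift sketch just to show the arithmetic; field is a hypothetical row-major [Float] of size width * height:

func bilinearSample(field: [Float], width: Int, height: Int, x: Float, y: Float) -> Float {
    let x0 = max(0, min(Int(x), width - 1))
    let y0 = max(0, min(Int(y), height - 1))
    let x1 = min(x0 + 1, width - 1)
    let y1 = min(y0 + 1, height - 1)
    let tx = x - Float(x0)
    let ty = y - Float(y0)

    // Four separate fetches...
    let c00 = field[y0 * width + x0]
    let c10 = field[y0 * width + x1]
    let c01 = field[y1 * width + x0]
    let c11 = field[y1 * width + x1]

    // ...plus the weighted average that the texture unit would give you for free.
    let top    = c00 + tx * (c10 - c00)
    let bottom = c01 + tx * (c11 - c01)
    return top + ty * (bottom - top)
}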
Another thing to be aware of is that data in textures is not laid out row by row (linearly) as you might expect; instead it's laid out in such a way that neighbouring texels are as close to one another in memory as possible. This can be called tiling or swizzling, depending on whose documentation you read. For that reason, unless your simulation is one-dimensional, you may well get far better cache coherency on reads/writes from a resource whose layout most closely matches the dimensions of the simulation.
I am making use of the Tamura texture feature extraction from a library (JFeatureLib).
Tamura features are an approach that explores texture representation from a different angle, since it is motivated by psychological studies on human visual perception of textures. One of its most important features is coarseness (the others being brightness and contrast).
I am having difficulties with understanding the real meaning of the coarseness feature. From the literature I found that a coarse texture consists of a small number of large primitives, while a fine texture contains a large amount of small primitives. A more precise definition might be:
The higher the coarseness value is, the rougher is the texture. If there are two different textures, one macro texture of high coarseness and one micro texture of low coarseness, the macro texture is considered.
However, I cannot see any relation between the coarseness value and the roughness of the image.
Example: in my opinion the coarseness value of the images below should increase from left to right. However, I am getting the following coarseness values:
1.93155, 3.23740, 2.40476, 3.11979 (left to right).
I am finding it quite strange that coarseness_2 is higher than coarseness_3 and coarseness_4. Even worse, for the following images I am getting these coarseness values (almost the complete opposite):
7.631, 8.821, 9.0664, 10.564 (left to right)
I tested with several other images; these are just two of the sets.
I know that my interpretation of coarseness may be too literal, but then again, Tamura features are said (unlike many other features) to be extracted in a way that corresponds to the human visual system. Am I misunderstanding the real meaning of coarseness, or is it a problem with the accuracy of the Tamura implementation that I am using?
Coarseness has a direct relationship to scale and repetition rates and was seen by Tamura et al. as the most fundamental texture feature. An image will contain textures at several scales; coarseness aims to identify the largest size at which a texture exists, even where a smaller micro texture exists.
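For what it's worth, the measure itself is roughly the following procedure. This is a slow, simplified Swift sketch of the Tamura coarseness idea, not the exact JFeatureLib code, and all names are mine:

func tamuraCoarseness(image: [Double], width: Int, height: Int, maxK: Int = 5) -> Double {
    // Clamp reads at the image border.
    func pixel(_ x: Int, _ y: Int) -> Double {
        let cx = min(max(x, 0), width - 1)
        let cy = min(max(y, 0), height - 1)
        return image[cy * width + cx]
    }
    // Average of a (2*half x 2*half) neighbourhood centred at (x, y).
    func neighbourhoodAverage(_ x: Int, _ y: Int, _ half: Int) -> Double {
        var sum = 0.0
        for dy in -half..<half {
            for dx in -half..<half {
                sum += pixel(x + dx, y + dy)
            }
        }
        return sum / Double(4 * half * half)
    }
    var totalBest = 0.0
    for y in 0..<height {
        for x in 0..<width {
            var bestSize = 1.0
            var bestDiff = -1.0
            for k in 1...maxK {
                let half = 1 << (k - 1)    // 2^k neighbourhood => half-width 2^(k-1)
                // Differences between non-overlapping neighbourhoods on opposite
                // sides of the pixel, horizontally and vertically.
                let eh = abs(neighbourhoodAverage(x + half, y, half) -
                             neighbourhoodAverage(x - half, y, half))
                let ev = abs(neighbourhoodAverage(x, y + half, half) -
                             neighbourhoodAverage(x, y - half, half))
                let e = max(eh, ev)
                if e > bestDiff {          // keep the scale with the largest contrast
                    bestDiff = e
                    bestSize = Double(1 << k)
                }
            }
            totalBest += bestSize          // S_best for this pixel
        }
    }
    return totalBest / Double(width * height)   // coarseness = mean of S_best
}

So a texture made of big blobs should pick large neighbourhood sizes at most pixels and score higher than one made of tiny blobs; if the library's output contradicts that consistently, it is worth checking how it preprocesses the images (scaling, quantisation) rather than your interpretation.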
I'm developing an image warping iOS app with OpenGL ES 2.0.
I have a good grasp on the setup, the pipeline, etc., and am now moving along to the math.
Since my experience with image warping is nil, I'm reaching out for some algorithm suggestions.
Currently, I'm setting the initial vertices at points in a grid type fashion, which equally divide the image into squares. Then, I place an additional vertex in the middle of each of those squares. When I draw the indices, each square contains four triangles in the shape of an X. See the image below:
After playing with Photoshop a little, I noticed Adobe uses a slightly more complicated algorithm for their puppet warp, but a much simpler algorithm for their standard warp. Which do you think is best to apply here, or is it just personal preference?
Secondly, when I move a vertex, I'd like to apply a weighted transformation to all the other vertices to smooth out the edges (instead of what I have below, where only the selected vertex is transformed). What sort of algorithm should I apply here?
As each vertex is processed independently by the vertex shader, it is not easy to have vertexes influence each other's positions. However, because there are not that many vertexes it should be fine to do the work on the CPU and dynamically update your vertex attributes per frame.
Since what you are looking for is for your surface to act like a rubber sheet as parts of it are pulled, how about going ahead and implementing a dynamic simulation of a rubber sheet? There are plenty of good articles on cloth simulation in full 3D such as Jeff Lander's. Your application could be a simplification of these techniques. I have previously implemented a simulation like this in 3D. I required a force attracting my generated vertexes to their original grid locations. You could have a similar force attracting vertexes to the pixels at which they are generated before the simulation is begun. This would make them spring back to their default state when left alone and would progressively reduce the influence of your dragging at more distant vertexes.
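As a starting point, a much-simplified version of that idea could be as small as this (a hedged Swift sketch, not tied to any particular engine; the spring constants and Gaussian radius are arbitrary placeholders):

import simd
import Foundation

struct WarpVertex {
    var position: SIMD2<Float>              // what you upload to the vertex buffer each frame
    var velocity = SIMD2<Float>(repeating: 0)
    let rest: SIMD2<Float>                  // the original grid position
}

// One integration step: every vertex is pulled toward its rest position,
// offset by a Gaussian-weighted share of the user's drag, so nearby vertexes
// follow strongly and distant ones barely move.
func step(vertices: inout [WarpVertex],
          grabPoint: SIMD2<Float>?,         // where the drag started, if any
          fingerPoint: SIMD2<Float>?,       // where the finger is now
          dt: Float = 1.0 / 60.0) {
    let stiffness: Float = 30.0
    let damping: Float = 4.0
    let dragRadius: Float = 0.3             // how far the drag's influence reaches

    for i in vertices.indices {
        var desired = vertices[i].rest
        if let grab = grabPoint, let finger = fingerPoint {
            let d = length(vertices[i].rest - grab)
            let w = expf(-(d * d) / (dragRadius * dragRadius))
            desired += w * (finger - grab)
        }
        let force = stiffness * (desired - vertices[i].position) - damping * vertices[i].velocity
        vertices[i].velocity += dt * force
        vertices[i].position += dt * vertices[i].velocity
    }
}

When the drag ends (both optionals nil), every vertex springs back to its rest position, which gives you the "left alone" behaviour described above.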
What are the ways in which to quantify the texture of a portion of an image? I'm trying to detect areas that are similar in texture in an image, sort of a measure of "how closely similar are they?"
So the question is what information about the image (edge, pixel value, gradient etc.) can be taken as containing its texture information.
Please note that this is not based on template matching.
Wikipedia didn't give much detail on actually implementing any of the texture analyses.
Do you want to find two distinct areas in the image that look the same (the same texture), or match a texture in one image to a texture in another?
The second is harder due to different radiometry.
Here is a basic scheme of how to measure similarity of areas.
1. You write a function which takes an area of the image as input and calculates a scalar value, like the average brightness. This scalar is called a feature.
2. You write more such functions to obtain about 8-30 features, which together form a vector that encodes information about the area of the image.
3. Calculate this vector for both areas that you want to compare.
4. Define a similarity function which takes two vectors and outputs how much they are alike.
You need to focus on steps 2 and 4.
Step 2: Use features such as the standard deviation of brightness, some kind of corner detector, an entropy filter, a histogram of edge orientations, and a histogram of FFT frequencies (in the x and y directions). Use color information if available.
Step 4: You can use cosine similarity, min-max, or weighted cosine similarity.
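As a concrete, toy-sized illustration of steps 2 and 4 (a hedged Swift sketch; the feature choices and the edge threshold are arbitrary, and patch is assumed to be a row-major [Double] of grayscale values):

import Foundation

// Step 2: a tiny feature vector (mean brightness, standard deviation, crude edge density).
func featureVector(patch: [Double], width: Int, height: Int) -> [Double] {
    let mean = patch.reduce(0, +) / Double(patch.count)
    let variance = patch.reduce(0) { $0 + ($1 - mean) * ($1 - mean) } / Double(patch.count)
    var edges = 0
    for y in 0..<height {
        for x in 1..<width {
            if abs(patch[y * width + x] - patch[y * width + x - 1]) > 0.1 { edges += 1 }
        }
    }
    return [mean, sqrt(variance), Double(edges) / Double(patch.count)]
}

// Step 4: cosine similarity between two feature vectors.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    let dot = zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
    let normA = sqrt(a.reduce(0) { $0 + $1 * $1 })
    let normB = sqrt(b.reduce(0) { $0 + $1 * $1 })
    return dot / (normA * normB)
}

A similarity near 1 means the two areas have very similar statistics under these features.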
After you implement about 4-6 such features and a similarity function, start to run tests. Look at the results and try to understand why or where it doesn't work, then add a specific feature to cover that case.
For example, if you see that a texture with big blobs is regarded as similar to a texture with tiny blobs, then add a morphological filter that calculates the density of objects larger than about 20 square pixels.
Iterate this process of identifying a problem and designing a specific feature about 5 times and you will start to get very good results.
I'd suggest using wavelet analysis. Wavelets are localized in both time and frequency, and multiresolution analysis gives a better signal representation than the Fourier transform does.
There is a paper explaining a wavelet approach to texture description. There is also a comparison method.
You might need to slightly modify an algorithm to process images of arbitrary shape.
An interesting approach for this is to use Local Binary Patterns.
Here is a basic example with some explanation: http://hanzratech.in/2015/05/30/local-binary-patterns.html
See that method as one of the many different ways to get features from your pictures. It corresponds to step 2 of DanielHsH's method.
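For reference, the basic 3x3 operator is tiny. This is a hedged Swift sketch with names of my own; in practice you compute the code for every interior pixel, build a histogram of the codes per region, and compare the histograms:

// x and y must be at least 1 pixel away from the border.
func lbpCode(image: [Double], width: Int, x: Int, y: Int) -> UInt8 {
    let center = image[y * width + x]
    // Neighbours visited clockwise starting at the top-left.
    let offsets = [(-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]
    var code: UInt8 = 0
    for (bit, (dx, dy)) in offsets.enumerated() {
        // Each neighbour brighter than (or equal to) the centre sets one bit.
        if image[(y + dy) * width + (x + dx)] >= center {
            code |= 1 << bit
        }
    }
    return code
}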