How do you do hardware-accelerated texture projection in Metal? I cannot find any reference or resource that describes how to do it.
You just do the divide yourself.
OpenGL:
a = tex2Dproj( texture, texcoord.xyzw )
b = tex2Dproj( texture, texcoord.xyz )
Metal equivalent:
a = texture.sample( sampler, texcoord.xy/texcoord.w )
b = texture.sample( sampler, texcoord.xy/texcoord.z )
(Choose 'a' or 'b' depending on the type of projection you are doing; 'a' is the more common one.)
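To make the divide concrete, here is a minimal CPU-side sketch in C++ of the coordinate that ends up being sampled in each case. The Vec2/Vec4 structs and the function names projectByW and projectByZ are made up for illustration; they are not part of Metal or OpenGL.

```cpp
#include <cassert>

struct Vec2 { float x, y; };
struct Vec4 { float x, y, z, w; };

// Case 'a': divide xy by the fourth component before sampling.
Vec2 projectByW(const Vec4& tc) {
    assert(tc.w != 0.0f);  // the divide is undefined for w == 0
    return Vec2{ tc.x / tc.w, tc.y / tc.w };
}

// Case 'b': divide xy by the third component before sampling.
Vec2 projectByZ(const Vec4& tc) {
    assert(tc.z != 0.0f);  // the divide is undefined for z == 0
    return Vec2{ tc.x / tc.z, tc.y / tc.z };
}
```

For example, a coordinate of (2, 4, 0, 4) divided by w gives (0.5, 1.0), which is the 2D coordinate you would pass to texture.sample.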
I'm using Lua for the first time, and of course need to check around to learn how to implement certain code.
To create a vertex in Gideros, there's this code:
mesh:setVertex(index, x, y)
However, I would also like to use the z coordinate.
I've been checking around, but haven't found any help. Does anyone know if Gideros has a method for this, or are there any tips and tricks on setting the z coordinates?
First of all, these functions are not provided by Lua itself, but by the Gideros Lua API.
There are no meshes or things like that in native Lua.
The Gideros Lua API reference manual gives you some valuable hints:
http://docs.giderosmobile.com/reference/gideros/Mesh#Mesh
Mesh can be 2D or 3D, the latter expects an additional Z coordinate in its vertices.
http://docs.giderosmobile.com/reference/gideros/Mesh/new
Mesh.new([is3d])
Parameters:
is3d: (boolean) Specifies that this mesh expects a Z coordinate in its vertex array and is thus a 3D mesh
So in order to create a 3d mesh you have to do something like:
local myMesh = Mesh.new(true)
The manual page for setVertex does not say that you can pass a z coordinate:
http://docs.giderosmobile.com/reference/gideros/Mesh/setVertex
It is nevertheless very likely that you can.
So let's have a look at Gideros source code:
https://github.com/gideros/gideros/blob/1d4894fb5d39ef6c2375e7e3819cfc836da7672b/luabinding/meshbinder.cpp#L96-L109
int MeshBinder::setVertex(lua_State *L)
{
Binder binder(L);
GMesh *mesh = static_cast<GMesh*>(binder.getInstance("Mesh", 1));
int i = luaL_checkinteger(L, 2) - 1;
float x = luaL_checknumber(L, 3);
float y = luaL_checknumber(L, 4);
float z = luaL_optnumber(L, 5, 0.0);
mesh->setVertex(i, x, y, z);
return 0;
}
Here you can see that you can indeed provide a z coordinate and that it will be used.
So
local myMesh = Mesh.new(true)
myMesh:setVertex(1, 100, 20, 40)
should work just fine.
You could also have simply tried it, btw. It's free, it doesn't hurt, and it's the best way to learn!
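The two details that matter in the binding above are the 1-based index being converted to 0-based and z defaulting to 0.0 when it is omitted. A small standalone C++ model of just that behavior, assuming a hypothetical ToyMesh class (this is not the Gideros GMesh class, and the automatic resizing is my own simplification):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex { float x, y, z; };

// Hypothetical stand-in for a mesh; not the real Gideros GMesh.
class ToyMesh {
public:
    // index is 1-based, as in Lua; z falls back to 0 when it is not given,
    // mirroring luaL_optnumber(L, 5, 0.0) in the binding.
    void setVertex(int index, float x, float y, float z = 0.0f) {
        assert(index >= 1);
        std::size_t i = static_cast<std::size_t>(index - 1);
        if (i >= vertices_.size()) {
            vertices_.resize(i + 1, Vertex{0.0f, 0.0f, 0.0f});
        }
        vertices_[i] = Vertex{x, y, z};
    }

    // Read back a vertex using the same 1-based index.
    const Vertex& vertex(int index) const {
        return vertices_.at(static_cast<std::size_t>(index - 1));
    }

private:
    std::vector<Vertex> vertices_;
};
```

Calling setVertex(1, 100, 20, 40) stores z = 40, while setVertex(2, 5, 6) stores z = 0.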
I am working on a water simulation and need to sample _CameraDepthTexture to get the opaque depth. It works well on Windows, but the shader gets a different depth on iOS.
vert:
o.pos = mul (UNITY_MATRIX_MVP, v.vertex);
o.ref = ComputeScreenPos(o.pos);
COMPUTE_EYEDEPTH(o.ref.z);
frag:
uniform sampler2D_float _CameraDepthTexture;
float raw_depth = UNITY_SAMPLE_DEPTH(tex2Dproj(_CameraDepthTexture, UNITY_PROJ_COORD(uv2)));
On Windows, raw_depth is around 0.98, but on iOS it is around 0.51.
I guess this difference is related to how the MVP matrix differs between platforms.
I am doing a 6-dof transformation with the RANSAC given in OpenCV and I now want to convert two matrices of cv::Mat to an Isometry3d of Eigen but I didn't find good examples about this problem.
e.g.
cv::Mat rot;
cv::Mat trsl;
// the rot is 3-by-3 and trsl is 3-by-1 vector.
Eigen::Isometry3d trsf;
trsf.rotation = rot;
trsf.translation = trsl; // I know trsf has two members but it seems not the correct way to do a concatenation.
Can anyone give me a hand? Thanks.
Essentially, you need an Eigen::Map to read the opencv data and store it to parts of your trsf:
typedef Eigen::Matrix<double, 3, 3, Eigen::RowMajor> RMatrix3d;
Eigen::Isometry3d trsf;
trsf.linear() = RMatrix3d::Map(reinterpret_cast<const double*>(rot.data));
trsf.translation() = Eigen::Vector3d::Map(reinterpret_cast<const double*>(trsl.data));
You need to be sure that rot and trsl indeed hold double data (perhaps consider using cv::Mat_<double> instead).
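The reason for the RowMajor typedef is that cv::Mat stores its elements row by row, while Eigen matrices (including the 4x4 matrix inside Isometry3d) are column-major by default. The following standard-library-only C++ sketch shows the copy the two assignments amount to; makeRigidTransform is a hypothetical helper, not an Eigen or OpenCV function.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// rot:  9 doubles, row-major 3x3 rotation (the layout of a continuous cv::Mat).
// trsl: 3 doubles, translation vector.
// Returns the 4x4 homogeneous matrix [R t; 0 0 0 1] in column-major order.
std::array<double, 16> makeRigidTransform(const double* rot, const double* trsl) {
    assert(rot != nullptr && trsl != nullptr);
    std::array<double, 16> m{};  // value-initialised, so all zeros
    for (std::size_t r = 0; r < 3; ++r) {
        for (std::size_t c = 0; c < 3; ++c) {
            m[c * 4 + r] = rot[r * 3 + c];  // row-major source, column-major target
        }
        m[3 * 4 + r] = trsl[r];  // fourth column holds the translation
    }
    m[15] = 1.0;  // bottom-right element of a homogeneous transform
    return m;
}
```

Reading the rotation without the row-major reinterpretation would silently give you the transposed (that is, inverse) rotation.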
My textures consist of 4 different colors. I want to change each color to a different color. I tried it the following way:
precision mediump float;
varying lowp vec4 vColor;
varying highp vec2 vUv;
uniform sampler2D texture;
bool inRange( float c1, float c2 ) {
return abs( c1 - c2 ) < 0.01;
}
void main() {
vec4 c = texture2D(texture, vUv);
if ( inRange( c.r, 238.0/255.0 ) && inRange( c.g, 255.0/255.0 ) && inRange( c.b, 84.0/255.0 ) )
c = vec4( 254.0/255.0, 254.0/255.0, 247.0/255.0, 1.0 );
else if ( inRange( c.r, 15.0/255.0 ) && inRange( c.g, 59.0/255.0 ) && inRange( c.b, 5.0/255.0 ) )
c = vec4( 65.0/255.0, 65.0/255.0, 65.0/255.0, 1.0 );
else if ( inRange( c.r, 157.0/255.0 ) && inRange( c.g, 184.0/255.0 ) && inRange( c.b, 55.0/255.0 ) )
c = vec4( 254.0/255.0, 247.0/255.0, 192.0/255.0, 1.0 );
else if ( inRange( c.r, 107.0/255.0 ) && inRange( c.g, 140.0/255.0 ) && inRange( c.b, 38.0/255.0 ) )
c = vec4( 226.0/255.0, 148.0/255.0, 148.0/255.0, 1.0 );
gl_FragColor = c;
}
This works. But it's terribly slow. I'm running this on an iPhone, but the calculations aren't that hard or am I missing something?
Is there a faster way to do this?
Branches are bad for shader performance. Normally, the GPU executes multiple fragment shaders (each for their own fragment) at once. They all run in lockstep -- SIMD processing means that in effect all parallel fragment processors are running the same code but operating on different data. When you have conditionals, it's possible for different fragments to be on different code paths, so you lose SIMD parallelism.
One of the best performance tricks for this sort of application is using a Color Lookup Table. You provide a 3D texture (the lookup table) and use the GLSL texture3D function to look up into it -- the input coordinates are the R, G, and B values of your original color, and the output is the replacement color.
This is very fast, even on mobile hardware -- the fragment shader doesn't have to do any computation, and the texture lookup is usually cached before the fragment shader even runs.
Constructing a lookup table texture is easy. Conceptually, it's a cube that encodes every possible RGB value (x axis is R from 0.0 to 1.0, y axis is G, z axis is B). If you organize it as a 2D image, you can then open it in your favorite image editor and apply any color transformation filters you like to it. The filtered image is your conversion lookup table. There's a decent writeup on the technique here and another in GPU Gems 2. A more general discussion of the technique, applied using Core Image filters, is in Apple's documentation library.
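If you would rather build the table in code than in an image editor, the idea can be sketched on the CPU like this. ColorLut, Rgb, and the invert example transform are hypothetical names for illustration, and the lookup uses the nearest grid cell rather than the filtered lookup a GPU texture sampler would give you.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

// Hypothetical example transform: invert every channel.
Rgb invert(Rgb c) {
    return Rgb{ static_cast<std::uint8_t>(255 - c.r),
                static_cast<std::uint8_t>(255 - c.g),
                static_cast<std::uint8_t>(255 - c.b) };
}

class ColorLut {
public:
    // size: number of grid cells per axis; transform is applied once per cell.
    ColorLut(int size, Rgb (*transform)(Rgb))
        : size_(size),
          table_(static_cast<std::size_t>(size) * size * size) {
        assert(size >= 2 && size <= 256);
        for (int b = 0; b < size; ++b)
            for (int g = 0; g < size; ++g)
                for (int r = 0; r < size; ++r)
                    table_[index(r, g, b)] =
                        transform(Rgb{ toByte(r), toByte(g), toByte(b) });
    }

    // Nearest-cell lookup: x axis is R, y axis is G, z axis is B.
    Rgb lookup(Rgb in) const {
        return table_[index(toCell(in.r), toCell(in.g), toCell(in.b))];
    }

private:
    std::uint8_t toByte(int cell) const {
        return static_cast<std::uint8_t>(cell * 255 / (size_ - 1));
    }
    int toCell(std::uint8_t v) const {
        return (v * (size_ - 1) + 127) / 255;  // rounds to the nearest cell
    }
    std::size_t index(int r, int g, int b) const {
        return (static_cast<std::size_t>(b) * size_ + g) * size_ + r;
    }

    int size_;
    std::vector<Rgb> table_;
};
```

The resulting table is what you would upload as the lookup texture; the per-pixel work then reduces to a single fetch.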
EDIT: It was confirmed by the asker that it is the presence of any branches that causes the incredible slowdown. I will provide an attempt at a branchless solution.
Well, if branches (including using the ternary "?" operator) are unusable, you can only use arithmetic.
A possible solution (which is hideous from a maintenance perspective, but might fit your need) is to map your input color to output color using polynomials that give desired output for the 4 colors you care about. I treated the 3 RGB color channels separately and plugged in the input/output points into wolfram alpha with a cubic fit (example for the red channel here: http://www.wolframalpha.com/input/?i=cubic+fit+%7B238.0%2C+254.0%7D%2C%7B15.0%2C+65.0%7D%2C+%7B157.0%2C+254.0%7D%2C+%7B107.0%2C+226.0%7D). You could use any polynomial fit program for this purpose.
The code for the red channel is then:
float redResult = 20.6606 + 3.15457 * c.r - 0.0135167 * c.r*c.r + 0.0000184102 * c.r*c.r*c.r;
Rinse and repeat the process with the green and blue color channels and you have your shader. Note that the fit above used values on the 0-255 scale, so scale the normalized channel by 255.0 before evaluating the polynomial and divide the result by 255.0 afterwards. You may also want to specify the very small coefficients in scientific notation to retain accuracy... I don't know how your particular driver handles floating-point literals.
Even then you will probably have precision issues, but it's worth a shot.
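As a quick sanity check of the red-channel coefficients, here is a C++ sketch that evaluates the cubic with Horner's rule. The function names remapRed255 and remapRed are made up; the coefficients are the ones quoted above, and the inputs are on the 0-255 scale used for the fit.

```cpp
#include <cassert>

// Cubic fit for the red channel; input and output are on the 0-255 scale.
double remapRed255(double r255) {
    assert(r255 >= 0.0 && r255 <= 255.0);
    // Horner form of 20.6606 + 3.15457 x - 0.0135167 x^2 + 0.0000184102 x^3
    return 20.6606 + r255 * (3.15457 + r255 * (-0.0135167 + r255 * 0.0000184102));
}

// Convenience wrapper for a normalized [0, 1] channel value, as in a shader.
double remapRed(double r) {
    return remapRed255(r * 255.0) / 255.0;
}
```

With these coefficients, the four source reds 238, 15, 157, and 107 come out within rounding error of 254, 65, 254, and 226. Any other input produces whatever the cubic happens to give, which is the generalization mentioned in the answer.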
Another possibility is using an approximate Bump Function (I say approximate, since you don't actually care about the smoothness constraints). You just want a value that's 1 at the color you care about and 0 everywhere far enough away. Say you have a three-component bump function, bump3, that takes a vec3 for the location of the bump and a vec3 for the location at which to evaluate the function. Then you can rewrite your first conditional from:
if ( inRange( c.r, 238.0/255.0 ) && inRange( c.g, 255.0/255.0 ) && inRange( c.b, 84.0/255.0 ) )
c = vec4( 254.0/255.0, 254.0/255.0, 247.0/255.0, 1.0 );
to:
vec3 colorIn0 = vec3(238.0/255.0, 255.0/255.0, 84.0/255.0);
vec3 colorOut0 = vec3(254.0/255.0, 254.0/255.0, 247.0/255.0);
result.rgb = c.rgb + bump3(colorIn0, c.rgb) * (colorOut0 - colorIn0);
If max/min are fast on your hardware (they might be full branches under the hood :( ), a possible quick and dirty bump3() implementation might be:
float bump3(vec3 b, vec3 p) {
vec3 diff = abs(b-p);
return max(0.0, 1.0 - 255.0*(diff.x + diff.y + diff.z));
}
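To check that this behaves like the original conditional, here is a direct C++ port of bump3 together with the blend. Vec3 and remap are hypothetical names introduced only for this sketch.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

// 1 at the bump location b, falling linearly to 0 once the summed
// per-channel difference reaches 1/255.
float bump3(Vec3 b, Vec3 p) {
    float d = std::fabs(b.x - p.x) + std::fabs(b.y - p.y) + std::fabs(b.z - p.z);
    return std::max(0.0f, 1.0f - 255.0f * d);
}

// Branch-free replacement: c moves to colorOut when it equals colorIn
// and is left untouched when it is far from colorIn.
Vec3 remap(Vec3 c, Vec3 colorIn, Vec3 colorOut) {
    float w = bump3(colorIn, c);
    return Vec3{ c.x + w * (colorOut.x - colorIn.x),
                 c.y + w * (colorOut.y - colorIn.y),
                 c.z + w * (colorOut.z - colorIn.z) };
}
```

Note that this bump is narrower than the 0.01 tolerance of inRange, so inputs that are slightly off the exact color are only partially remapped; lowering the 255.0 factor widens it.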
Other possibilities for bump3 might be abusing smoothstep (again, if it is fast on your hardware) or using the exponential.
The polynomial approach has the added (incidental) benefit of generalizing your map to more than just the four colors, but requires many arithmetic operations, is a maintenance nightmare, and likely suffers from precision issues. The bump function approach, on the other hand, should produce the same results as your current shader, even on input that is not one of those four colors, and is much more readable and maintainable (adding another color pair is trivial, compared to the polynomial approach). However, in the implementation I gave, it uses a max, which might be a branch under the hood (I hope not, geez).
Original answer below
It would be good to know how you are getting timing information so we can be sure it's this shader that's slow (you could test this by just making this a pass-through shader as a quick hack... I recommend getting used to using a profiler though). It seems exceedingly odd that such a straightforward shader is slow.
Otherwise, if your texture truly only has those 4 colors (and it is guaranteed), then you can trivially take the number of inRange calls down from 12 to 3 by removing the if from the last branch (just make it an else), and then only testing the r value of c. I don't know how the iPhone's GLSL optimizer works, but you could then further try to replace the if statements with ternary operators and see if that makes a difference. Those are the only changes I can think of, and unfortunately you can't do the definite optimization if your textures aren't guaranteed to only have those 4 colors.
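The reduced logic can be sketched in C++ as follows. The red values 238, 15, 157, and 107 are far enough apart that red alone identifies the color. Rgb8, closeTo, and remapByRed are hypothetical names, and the tolerance of 2 is my approximation of the 0.01 threshold on the 0-255 scale.

```cpp
#include <cstdlib>

struct Rgb8 { int r, g, b; };

// Roughly the same tolerance as inRange's 0.01 on normalized values.
inline bool closeTo(int a, int b) { return std::abs(a - b) <= 2; }

// Three comparisons on red only; the final colour needs no test at all.
Rgb8 remapByRed(Rgb8 c) {
    if (closeTo(c.r, 238)) return Rgb8{254, 254, 247};
    if (closeTo(c.r, 15))  return Rgb8{65, 65, 65};
    if (closeTo(c.r, 157)) return Rgb8{254, 247, 192};
    return Rgb8{226, 148, 148};  // assumed to be the fourth colour (red 107)
}
```

The catch is the last line: any pixel that is not one of the first three colors is treated as the fourth, which is only safe under the guarantee described above.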
I would again like to point out that you should make sure this shader is causing the slowdown before trying to optimize.
Hi I am trying to do nearest neighbor queries on integer data.
It seems that cv::flann does not support this. Is this true?
Yes, it is possible to use FLANN nearest neighbor searches on integer data. You need to use a distance measure for integers. Some distance measures are templates, parameterized on data type (as in the example below), others have hard coded types (e.g. HammingLUT has unsigned char element type and int result (distance) type). You can also implement your own distance measure, see <opencv2/flann/dist.h> for details.
Example - a quote from the code that uses unsigned char data:
// we use euclidean distances on unsigned chars:
typedef cv::flann::L2<unsigned char> Distance_U8;
cv::flann::GenericIndex< Distance_U8 > * m_flann;
// ...
// we have 3d features
cv::Mat features( features_count, 3, CV_8UC1 );
// ... fill the features matrix ...
// ... build the index ...
m_flann = new cv::flann::GenericIndex< Distance_U8 > (features, params);
// ...
// how many neighbours per query?
int knn = 5;
// search params - see documentation
cvflann::SearchParams params;
// prepare the matrices
// query data - unsigned chars, 3d (like features)
cv::Mat input_1( n_pixels, 3, CV_8UC1 ),
// indices into features array - integers
indices_1( n_pixels, knn, CV_32S ),
// distances - floats (even with integer data distances are floats)
dists_1( n_pixels, knn, CV_32F );
m_flann->knnSearch( input_1, indices_1, dists_1, knn, params);
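If you want something to compare the index results against on a small data set, a plain brute-force search over unsigned char features is easy to write. The sketch below is not FLANN and does not use OpenCV; Feature and bruteForceKnn are hypothetical names, and the distances it returns are squared Euclidean distances stored as floats.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::array<unsigned char, 3> Feature;  // 3d features, as in the quote above

// Returns up to k (squared distance, feature index) pairs, nearest first.
std::vector<std::pair<float, std::size_t> > bruteForceKnn(
        const std::vector<Feature>& features, const Feature& query, std::size_t k) {
    std::vector<std::pair<float, std::size_t> > all;
    all.reserve(features.size());
    for (std::size_t i = 0; i < features.size(); ++i) {
        float d = 0.0f;
        for (std::size_t j = 0; j < 3; ++j) {
            // Convert to float before subtracting so the difference can be negative.
            float diff = static_cast<float>(features[i][j]) - static_cast<float>(query[j]);
            d += diff * diff;
        }
        all.push_back(std::make_pair(d, i));
    }
    k = std::min(k, all.size());
    // Only the first k entries need to be in sorted order.
    std::partial_sort(all.begin(), all.begin() + static_cast<std::ptrdiff_t>(k), all.end());
    all.resize(k);
    return all;
}
```

This is linear in the number of features per query, so it is only meant for checking results, not as a replacement for the index.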
No, FLANN is for float descriptors only. Although this is poorly documented, the OpenCV set of matchers and descriptors must be used carefully.
There is a bug report on the ros trac explaining in more detail, but basically descriptors and matchers only handle certain types of data, and this must be respected. I've included an extract from the previously mentioned page here for reference:
Descriptors:
float descriptors: SIFT, SURF
uchar descriptors: ORB, BRIEF
Matchers:
for float descriptor: FlannBased, BruteForce, BruteForce-L1
for uchar descriptor: BruteForce-Hamming, BruteForce-HammingLUT