Faster convolution on ios

I'm trying to perform a convolution on an image with a generated 16x16 kernel. I used OpenCV's FilterEngine class, but it only runs on the CPU and I'm trying to accelerate the app.
I know OpenCV also has FilterEngine_GPU, but to my understanding it isn't supported on iOS.
GPUImage lets you perform a convolution with a generated 3x3 filter. Is there any other way to accelerate the convolution? A different library that operates on the GPU?

You can use Apple's Accelerate framework for this. It's available on both iOS and macOS, by the way, so you may be able to reuse your code later.
In order to achieve best performance, you may need to consider the following options:
If your convolution kernel is separable, use a separable implementation. A kernel is separable when it can be written as the product of a column vector and a row vector, which is the case for a Gaussian kernel, for example. For a 16x16 kernel this cuts the work per pixel from 256 multiply-adds to 32, a saving of roughly an order of magnitude in computation time.
If your images have power-of-two sizes, consider using the FFT trick. Convolution in the spatial domain (on the order of K^2 operations per pixel for a KxK kernel) is equivalent to a term-by-term multiplication in the Fourier domain (one operation per pixel). Thus, you can 1) FFT your image and kernel, 2) multiply the results term by term, and 3) apply the inverse FFT to the product. Since FFT algorithms are fast (e.g., Apple's FFT in the Accelerate framework), this series of operations can result in a performance boost.
You can find more insight on iOS image processing optimization in this book, which I also reviewed here.
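As a rough illustration of the separable option above, here is a minimal plain-C sketch that runs a horizontal pass followed by a vertical pass over a single-channel float image. It is not the Accelerate API; the function names, the scratch buffer, and the clamp-to-edge border handling are assumptions made for the example, and the indexing is correlation-style (no kernel flip), as is common for image filters.

```c
/* Clamp an index to [0, n-1] so samples outside the image reuse the edge pixel. */
static int clamp_index(int i, int n)
{
    return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

/* One 1-D pass over a single-channel float image.
   dx = 1, dy = 0 filters along rows; dx = 0, dy = 1 filters along columns.
   Tap t is applied at offset (t - count / 2), e.g. -8..7 for 16 taps. */
static void convolve_1d(const float *src, float *dst, int width, int height,
                        const float *taps, int count, int dx, int dy)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            float sum = 0.0f;
            for (int t = 0; t < count; t++) {
                int off = t - count / 2;
                int sx = clamp_index(x + off * dx, width);
                int sy = clamp_index(y + off * dy, height);
                sum += taps[t] * src[sy * width + sx];
            }
            dst[y * width + x] = sum;
        }
    }
}

/* Separable filtering: horizontal pass into `scratch`, then vertical pass
   into `dst`. Equivalent to one pass with the 2-D kernel
   k[r][c] = col_taps[r] * row_taps[c], but with 2 * count multiply-adds
   per pixel instead of count * count. */
void convolve_separable(const float *src, float *scratch, float *dst,
                        int width, int height,
                        const float *row_taps, const float *col_taps, int count)
{
    convolve_1d(src, scratch, width, height, row_taps, count, 1, 0);
    convolve_1d(scratch, dst, width, height, col_taps, count, 0, 1);
}
```

All three buffers hold width * height floats. For a 16-tap kernel, each output pixel costs 32 multiply-adds this way rather than 256.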

You can do a 16x16 convolution using GPUImage, but you'll need to write your own filter to do so. The 3x3 convolution in the framework samples from pixels in a 3x3 area around each pixel in the input image and applies the matrix of weights you feed in. The GPUImage3x3ConvolutionFilter.m source file within the framework should be reasonably easy to read, but I can provide a little context if you wish to step beyond what I have there.
The first thing I do is use the following vertex shader:
attribute vec4 position;
attribute vec4 inputTextureCoordinate;
uniform float texelWidth;
uniform float texelHeight;
varying vec2 textureCoordinate;
varying vec2 leftTextureCoordinate;
varying vec2 rightTextureCoordinate;
varying vec2 topTextureCoordinate;
varying vec2 topLeftTextureCoordinate;
varying vec2 topRightTextureCoordinate;
varying vec2 bottomTextureCoordinate;
varying vec2 bottomLeftTextureCoordinate;
varying vec2 bottomRightTextureCoordinate;
void main()
{
gl_Position = position;
vec2 widthStep = vec2(texelWidth, 0.0);
vec2 heightStep = vec2(0.0, texelHeight);
vec2 widthHeightStep = vec2(texelWidth, texelHeight);
vec2 widthNegativeHeightStep = vec2(texelWidth, -texelHeight);
textureCoordinate = inputTextureCoordinate.xy;
leftTextureCoordinate = inputTextureCoordinate.xy - widthStep;
rightTextureCoordinate = inputTextureCoordinate.xy + widthStep;
topTextureCoordinate = inputTextureCoordinate.xy - heightStep;
topLeftTextureCoordinate = inputTextureCoordinate.xy - widthHeightStep;
topRightTextureCoordinate = inputTextureCoordinate.xy + widthNegativeHeightStep;
bottomTextureCoordinate = inputTextureCoordinate.xy + heightStep;
bottomLeftTextureCoordinate = inputTextureCoordinate.xy - widthNegativeHeightStep;
bottomRightTextureCoordinate = inputTextureCoordinate.xy + widthHeightStep;
}
to calculate the positions from which to sample the pixel colors used in the convolution. Because normalized coordinates are used, the X and Y spacings between pixels are 1.0/[image width] and 1.0/[image height], respectively.
The texture coordinates for the pixels to be sampled are calculated in the vertex shader for two reasons: it's more efficient to do this calculation once per vertex (of which there are six in the two triangles that make up the rectangle of the image) than per each fragment (pixel), and to avoid dependent texture reads where possible. Dependent texture reads are where the texture coordinate to be read from is calculated in the fragment shader, not simply passed in from the vertex shader, and they are much slower on the iOS GPUs.
Once I have the texture locations calculated in the vertex shader, I pass them into the fragment shader as varyings and use the following code there:
uniform sampler2D inputImageTexture;
uniform mat3 convolutionMatrix;
varying vec2 textureCoordinate;
varying vec2 leftTextureCoordinate;
varying vec2 rightTextureCoordinate;
varying vec2 topTextureCoordinate;
varying vec2 topLeftTextureCoordinate;
varying vec2 topRightTextureCoordinate;
varying vec2 bottomTextureCoordinate;
varying vec2 bottomLeftTextureCoordinate;
varying vec2 bottomRightTextureCoordinate;
void main()
{
vec3 bottomColor = texture2D(inputImageTexture, bottomTextureCoordinate).rgb;
vec3 bottomLeftColor = texture2D(inputImageTexture, bottomLeftTextureCoordinate).rgb;
vec3 bottomRightColor = texture2D(inputImageTexture, bottomRightTextureCoordinate).rgb;
vec4 centerColor = texture2D(inputImageTexture, textureCoordinate);
vec3 leftColor = texture2D(inputImageTexture, leftTextureCoordinate).rgb;
vec3 rightColor = texture2D(inputImageTexture, rightTextureCoordinate).rgb;
vec3 topColor = texture2D(inputImageTexture, topTextureCoordinate).rgb;
vec3 topRightColor = texture2D(inputImageTexture, topRightTextureCoordinate).rgb;
vec3 topLeftColor = texture2D(inputImageTexture, topLeftTextureCoordinate).rgb;
vec3 resultColor = topLeftColor * convolutionMatrix[0][0] + topColor * convolutionMatrix[0][1] + topRightColor * convolutionMatrix[0][2];
resultColor += leftColor * convolutionMatrix[1][0] + centerColor.rgb * convolutionMatrix[1][1] + rightColor * convolutionMatrix[1][2];
resultColor += bottomLeftColor * convolutionMatrix[2][0] + bottomColor * convolutionMatrix[2][1] + bottomRightColor * convolutionMatrix[2][2];
gl_FragColor = vec4(resultColor, centerColor.a);
}
This reads each of the 9 colors and applies the weights from the 3x3 matrix that was supplied for convolution.
That said, a 16x16 convolution is a fairly expensive operation. You're looking at 256 texture reads per pixel. On older devices (iPhone 4 or so), you got around 8 texture reads per pixel for free if they were non-dependent reads. Once you went over that, performance started to drop dramatically. Later GPUs sped this up significantly, though. The iPhone 5S, for example, does well over 40 dependent texture reads per pixel pretty much for free. Even the heaviest shaders on 1080p video barely slow it down.
As sansuiso suggests, if you have a way of separating your kernel into horizontal and vertical passes (like can be done for a Gaussian blur kernel), you can get much better performance due to a dramatic reduction in texture reads. For your 16x16 kernel, you could drop from 256 reads to 32, and even those 32 would be much faster because they would be from passes that only sample 16 texels at a time.
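Whether a generated kernel can be split this way can be checked on the CPU before choosing a shader. The sketch below is one possible test, not something from GPUImage: it tries to factor a row-major n x n kernel into a column vector times a row vector and reports failure if any entry disagrees by more than a tolerance. The function names and the tolerance parameter are hypothetical.

```c
#include <math.h>

/* Try to factor an n x n kernel (row-major) into col[r] * row[c].
   Returns 1 and fills `col` and `row` when every entry matches within `tol`;
   returns 0 when the kernel is not separable or is all zeros. */
int separate_kernel(const float *kernel, int n, float *col, float *row, float tol)
{
    /* Use the largest-magnitude entry as the pivot for numerical stability. */
    int pr = 0, pc = 0;
    for (int r = 0; r < n; r++) {
        for (int c = 0; c < n; c++) {
            if (fabsf(kernel[r * n + c]) > fabsf(kernel[pr * n + pc])) {
                pr = r;
                pc = c;
            }
        }
    }
    float pivot = kernel[pr * n + pc];
    if (pivot == 0.0f)
        return 0;

    /* Candidate factors: the pivot's column, and the pivot's row scaled by 1/pivot. */
    for (int r = 0; r < n; r++)
        col[r] = kernel[r * n + pc];
    for (int c = 0; c < n; c++)
        row[c] = kernel[pr * n + c] / pivot;

    /* The kernel is separable only if the outer product reproduces every entry. */
    for (int r = 0; r < n; r++)
        for (int c = 0; c < n; c++)
            if (fabsf(kernel[r * n + c] - col[r] * row[c]) > tol)
                return 0;
    return 1;
}

/* Texture reads per output pixel: n * n for one 2-D pass, 2 * n for two 1-D passes. */
int reads_full(int n)      { return n * n; }
int reads_separable(int n) { return 2 * n; }
```

A 3x3 kernel such as {1,2,1; 2,4,2; 1,2,1} factors cleanly, while a Laplacian such as {0,1,0; 1,-4,1; 0,1,0} does not and would need the full 2-D pass.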
The crossover point for which doing an operation like this is faster in Accelerate on the CPU than in OpenGL ES will vary with the device you're running on. In general, GPUs on the iOS devices have outpaced CPUs in performance growth on each recent generation, so that bar has shifted farther to the GPU side over the last several iOS models.

Related

GPUImage glsl sine wave photoshop effect

I have a requirement to implement an iOS UIImage filter / effect which is a copy of Photoshop's Distort Wave effect. The wave has to have multiple generators and repeat in a tight pattern within a CGRect.
Photos of steps are attached.
I'm having problems creating the glsl code to reproduce the sine wave pattern. I'm also trying to smooth the edge of the effect so that the transition to the area outside the rect is not so abrupt.
I found some WebGL code that produces a water ripple. The waves produced before the center point look close to what I need, but I can't seem to get the math right to remove the water ripple (at center point) and just keep the repeating sine pattern before it:
varying highp vec2 textureCoordinate;
uniform sampler2D inputImageTexture;
uniform highp float time;
uniform highp vec2 center;
uniform highp float angle;
void main() {
highp vec2 cPos = -1.0 + 2.0 * gl_FragCoord.xy / center.xy;
highp float cLength = length(cPos);
highp vec2 uv = gl_FragCoord.xy/center.xy+(cPos/cLength)*cos(cLength*12.0-time*4.0)*0.03;
highp vec3 col = texture2D(inputImageTexture,uv).xyz;
gl_FragColor = vec4(col,1.0);
}
I have to process two Rect areas, one at top and one at the bottom. So being able to process two Rect areas in one pass would be ideal. Plus the edge smoothing.
Thanks in advance for any help.

I've handled this in the past by generating an offset table on the CPU and uploading it as an input texture. So on the CPU, I'd do something like:
for (int i = 0; i < tableSize; i++)
{
table [ i ].x = amplitude * sin (i * frequency * 2.0 * M_PI / tableSize + phase);
table [ i ].y = 0.0;
}
You may need to add in more sine waves if you have multiple "generators". Also, note that the above code offsets the x coordinate of each pixel. You could do Y instead, or both, depending on what you need.
Then in the glsl, I'd use that table as an offset for sampling. So it would be something like this:
// Note: sampler2DRect and gl_TexCoord are desktop GLSL; on OpenGL ES use sampler2D with normalized coordinates
uniform sampler2DRect table;
uniform sampler2DRect inputImage;
//... rest of your code ...
// Get the offset from the table
vec2 coord = gl_TexCoord [ 0 ].xy;
vec2 newCoord = coord + texture2DRect (table, coord).xy;
// Sample the input image at the offset coordinate
gl_FragColor = texture2DRect (inputImage, newCoord);

How can I take advantage of lookup tables in my Blinn-Phong lighting shader?

I'm experimenting with some lighting techniques on iOS and I've been able to produce some effects that I'm pleased with by taking advantage of iOS' OpenGL ES extensions for depth lookup textures and a relatively simple Blinn-Phong shader:
The above shows 20 Suzanne monkeys being rendered at full-screen retina with multi-sampling and the following shader. I'm doing multi-sampling because it is only adding 1ms per frame. My current average render time is 30ms total (iPad 3), which is far too slow for 60fps.
Vertex shader:
//Position
uniform mat4 mvpMatrix;
attribute vec4 position;
uniform mat4 depthMVPMatrix;
uniform mat4 vpMatrix;
//Shadow out
varying vec3 ShadowCoord;
//Lighting
attribute vec3 normal;
varying vec3 normalOut;
uniform mat3 normalMatrix;
varying vec3 vertPos;
uniform vec4 lightColor;
uniform vec3 lightPosition;
void main() {
gl_Position = mvpMatrix * position;
//Used for handling shadows
ShadowCoord = (depthMVPMatrix * position).xyz;
ShadowCoord.z -= 0.01;
//Lighting calculations
normalOut = normalize(normalMatrix * normal);
vec4 vertPos4 = vpMatrix * position;
vertPos = vertPos4.xyz / vertPos4.w;
}
Fragment shader:
#extension GL_EXT_shadow_samplers : enable
precision lowp float;
uniform sampler2DShadow shadowTexture;
varying vec3 normalOut;
uniform vec3 lightPosition;
varying vec3 vertPos;
varying vec3 ShadowCoord;
uniform vec4 fillColor;
uniform vec3 specColor;
void main() {
vec3 normal = normalize(normalOut);
vec3 lightDir = normalize(lightPosition - vertPos);
float lambertian = max(dot(lightDir,normal), 0.0);
vec3 reflectDir = reflect(-lightDir, normal);
vec3 viewDir = normalize(-vertPos);
float specAngle = max(dot(reflectDir, viewDir), 0.0);
float specular = pow(specAngle, 16.0);
gl_FragColor = vec4((lambertian * fillColor.xyz + specular * specColor) * shadow2DEXT(shadowTexture, ShadowCoord), fillColor.w);
}
I've read that it is possible to use textures as lookup tables to reduce computation in the fragment shader, however the linked example seems to be doing full Phong lighting, rather than Blinn-Phong (I'm not doing anything with surface tangents). Furthermore, when running the sample the lighting seemed fairly banded (the background on mine, which is a solid color + Phong shading, looks slightly banded as a result of compression - it looks far smoother on the device). Is it possible to use a lookup texture in my case, or am I going to have to move down to 30fps (which I can just about achieve), turn off multi-sampling and limit Phong shading to the monkeys, rather than the full screen? In a real world (i.e. game) scenario, am I going to need to be doing Phong shading across the entire screen anyway?

what is the best approach to a convolution shader in OpenGL ES 2 that is fast enough for realtime?

NOTE: Right now I'm testing this in the simulator. But the idea is that I get acceptable performance in say, an iPhone 4s. (I know, I should be testing on the device, but I won't have a device for a few days).
I was playing around with making a convolution shader that would allow convolving an image with a filter of support 3x3, 5x5 or 7x7 and the option of multiple passes. The shader itself works I guess. But I notice the following:
A simple box filter 3x3, single-pass, barely blurs the image. So to get a more noticeable blur, I have to do either 3x3 2-pass or 5x5.
The simplest case (the 3x3, 1-pass) is already slow enough that it couldn't be used at say, 30 fps.
I tried two approaches so far (this is for some OGLES2-based plugins I'm doing for iPhone, that's why the methods):
- (NSString *)vertexShader
{
return SHADER_STRING
(
attribute vec4 aPosition;
attribute vec2 aTextureCoordinates0;
varying vec2 vTextureCoordinates0;
void main(void)
{
vTextureCoordinates0 = aTextureCoordinates0;
gl_Position = aPosition;
}
);
}
- (NSString *)fragmentShader
{
return SHADER_STRING
(
precision highp float;
uniform sampler2D uTextureUnit0;
uniform float uKernel[49];
uniform int uKernelSize;
uniform vec2 uTextureUnit0Offset[49];
uniform vec2 uTextureUnit0Step;
varying vec2 vTextureCoordinates0;
void main(void)
{
vec4 outputFragment = vec4(0.0); // start from zero; the loop below already adds tap 0, so seeding with it would count it twice
for (int i = 0; i < uKernelSize; i++) {
outputFragment += texture2D(uTextureUnit0, vTextureCoordinates0 + uTextureUnit0Offset[i] * uTextureUnit0Step) * uKernel[i];
}
gl_FragColor = outputFragment;
}
);
}
The idea in this approach is that both the filter values and the offsetCoordinates to fetch texels are precomputed once in Client / App land, and then get set in uniforms. Then, the shader program will always have them available any time it is used. Mind you, the big size of the uniform arrays (49) is because potentially I could do up to a 7x7 kernel.
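For illustration, the client-side precomputation described above might look like the sketch below for a box filter. The function name is hypothetical; the 49-entry limit mirrors the uniform arrays in the shader, and the offsets are in texel units because the shader multiplies them by uTextureUnit0Step.

```c
/* Fill `offsets` (x, y pairs in texel units) and `weights` for a
   (2 * radius + 1) x (2 * radius + 1) box filter.
   Returns the number of taps written, or 0 if the kernel does not fit
   into the 49-entry arrays used by the shader. */
int build_box_kernel(int radius, float offsets[49][2], float weights[49])
{
    if (radius < 0)
        return 0;
    int support = 2 * radius + 1;
    int count = support * support;
    if (count > 49)
        return 0;

    int i = 0;
    for (int t = -radius; t <= radius; t++) {        /* rows    */
        for (int s = -radius; s <= radius; s++) {    /* columns */
            offsets[i][0] = (float)s;
            offsets[i][1] = (float)t;
            weights[i] = 1.0f / (float)count;        /* weights sum to 1 */
            i++;
        }
    }
    return count;
}
```

The returned count would go into uKernelSize, and the two arrays could be uploaded with glUniform2fv and glUniform1fv respectively.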
This approach takes .46s per pass.
Then I tried the following approach:
- (NSString *)vertexShader
{
return SHADER_STRING
(
// Default pass-thru vertex shader:
attribute vec4 aPosition;
attribute vec2 aTextureCoordinates0;
varying highp vec2 vTextureCoordinates0;
void main(void)
{
vTextureCoordinates0 = aTextureCoordinates0;
gl_Position = aPosition;
}
);
}
- (NSString *)fragmentShader
{
return SHADER_STRING
(
precision highp float;
uniform sampler2D uTextureUnit0;
uniform vec2 uTextureUnit0Step;
uniform float uKernel[49];
uniform float uKernelRadius;
varying vec2 vTextureCoordinates0;
void main(void)
{
vec4 outputFragment = vec4(0., 0., 0., 0.);
int kRadius = int(uKernelRadius);
int kSupport = 2 * kRadius + 1;
for (int t = -kRadius; t <= kRadius; t++) {
for (int s = -kRadius; s <= kRadius; s++) {
int kernelIndex = (s + kRadius) + ((t + kRadius) * kSupport);
outputFragment += texture2D(uTextureUnit0, vTextureCoordinates0 + (vec2(s,t) * uTextureUnit0Step)) * uKernel[kernelIndex];
}
}
gl_FragColor = outputFragment;
}
);
}
Here, I still pass the precomputed kernel into the fragment shader via a uniform. But I now compute the texel offsets and even the kernel indices in the shader. I'd expect this approach to be slower since I not only have 2 for loops but I'm also doing a bunch of extra computations for every single fragment.
Interestingly enough, this approach takes .42 secs. Actually faster...
At this point, the only other thing I can think of doing is breaking the convolution into 2 passes by thinking of the 2D kernel as two separable 1D kernels. Haven't tried it out yet.
Just for comparison, and aware that the following example is a specific implementation of box filtering that is A - pretty much hardcoded and B - doesn't really adhere to theoretical definition of a classic nxn linear filter (it is not a matrix and doesn't add up to 1), I tried this approach from the OpenGL ES 2.0 Programming guide:
- (NSString *)fragmentShader
{
return SHADER_STRING
(
// Default pass-thru fragment shader:
precision mediump float;
// Input texture:
uniform sampler2D uTextureUnit0;
// Texel step:
uniform vec2 uTextureUnit0Step;
varying vec2 vTextureCoordinates0;
void main() {
vec4 sample0;
vec4 sample1;
vec4 sample2;
vec4 sample3;
float step = uTextureUnit0Step.x;
sample0 = texture2D(uTextureUnit0, vec2(vTextureCoordinates0.x - step, vTextureCoordinates0.y - step));
sample1 = texture2D(uTextureUnit0, vec2(vTextureCoordinates0.x + step, vTextureCoordinates0.y + step));
sample2 = texture2D(uTextureUnit0, vec2(vTextureCoordinates0.x + step, vTextureCoordinates0.y - step));
sample3 = texture2D(uTextureUnit0, vec2(vTextureCoordinates0.x - step, vTextureCoordinates0.y + step));
gl_FragColor = (sample0 + sample1 + sample2 + sample3) / 4.0;
}
);
}
This approach takes 0.06s per pass.
Mind you, the above is my adaptation where I made the step pretty much the same texel offset I was using in my implementation. With this step, the result is very similar to my implementation, but the original shader in the OpenGL guide uses a larger step which blurs more.
So with all the above being said, my question is really two-fold:
I'm computing the step / texel offset as vec2(1 / image width, 1 / image height). With this offset, like I said, a 3x3 box filter is barely noticeable. Is this correct? Or am I misunderstanding the computation of the step or something else?
Is there anything else I could do to try and get the "convolution in the general case" approach to run fast enough for real-time? Or do I necessarily need to go for a simplification like the OpenGL example?

If you run those through the OpenGL ES Analysis tool in Instruments or the Frame Debugger in Xcode, you'll probably see a note about dependent texture reads -- you're calculating texcoords in the fragment shader, which means the hardware can't fetch texel data until it gets to that point in evaluating the shader. If texel coordinates are known going into the fragment shader, the hardware can prefetch your texel data in parallel with other tasks, so it's ready to go by the time the fragment shader needs it.
You can speed things up greatly by precomputing texel coordinates in the vertex shader. Brad Larson has a good example of doing such in this answer to a similar question.

I don't have answers to your precise questions, but you should take a look at the GPUImage framework, which implements several box blur filters (see this SO question), among them a 2-pass 9x9 filter. You can also see this article for the real-time FPS of different approaches: vImage vs GPUImage vs CoreImage

Opengles fragment shader achieve the effect

I want to achieve a smooth merge effect on an image with a centre cut. I achieved the centre cut with the code below.
varying highp vec2 textureCoordinate;
uniform sampler2D videoFrame;
void main(){
vec4 CurrentColor = vec4(0.0);
if(textureCoordinate.y < 0.5){
CurrentColor = texture2D(videoFrame,vec2(textureCoordinate.x,(textureCoordinate.y-0.125)));
} else{
CurrentColor = texture2D(videoFrame,vec2(textureCoordinate.x,(textureCoordinate.y+0.125)));
}
gl_FragColor = CurrentColor;
}
The above code produces the effect shown in the image below.
Actual:
Centre cut:
Desired Output:
What I want is for the sharp cut to go away; there should be a smooth gradient merge of both halves.

Do you want an actual blur there, or just linear blend? Because blurring involves a blurring kernel, whereas a blend would be simple interpolation between those two, depending on the y-coordinate.
This is the code for a linear blend.
varying highp vec2 textureCoordinate;
uniform sampler2D videoFrame;
void main(){
highp float steepness = 20.0; /* controls the width of the blending zone, larger values => sharper gradient */
highp vec4 a = texture2D(videoFrame,vec2(textureCoordinate.x,(textureCoordinate.y-0.125)));
highp vec4 b = texture2D(videoFrame,vec2(textureCoordinate.x,(textureCoordinate.y+0.125)));
/* Blend factor: 0 above the cut, 1 below it, ramping over a zone 1/steepness wide centred on y = 0.5 */
/* The clamp should not be necessary, since smoothstep clamps its input */
highp float t = clamp((textureCoordinate.y-0.5)*steepness + 0.5, 0., 1.);
highp vec4 final = mix(a, b, smoothstep(0., 1., t)); /* use mix(a, b, t) instead for a purely linear ramp, try both */
gl_FragColor = final;
}
Doing an actual blur is a bit more complicated, as you have to apply a blurring kernel. Basically it involves two nested loops that iterate over the neighbouring texels and sum them up according to some distribution (the most flexible way is to supply that distribution through an additional texture, which also allows you to add some bokeh).
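To see what the blend does numerically, here is a small CPU-side sketch of the same arithmetic. It assumes the blend zone is centred on the cut at y = 0.5 and is 1/steepness wide; the function names are hypothetical stand-ins for GLSL's smoothstep and mix.

```c
/* Same formula as GLSL smoothstep(0.0, 1.0, t), including the clamp. */
static float smooth01(float t)
{
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    return t * t * (3.0f - 2.0f * t);
}

/* Weight of the second sample ("b") at normalized height y.
   The first sample ("a") gets 1 - weight. */
float blend_weight(float y, float steepness)
{
    return smooth01((y - 0.5f) * steepness + 0.5f);
}

/* Linear interpolation, same as GLSL mix(a, b, w). */
float mix_f(float a, float b, float w)
{
    return a * (1.0f - w) + b * w;
}
```

With a steepness of 20, the weight is 0 for y up to 0.475, exactly 0.5 at the cut, and 1 from y = 0.525 onward, so only a band 5% of the image height is blended.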

iPad GLSL. From within a fragment shader how do I get the surface - not vertex - normal

Is it possible to access the surface normal - the normal associated with the plane of a fragment - from within a fragment shader? Or perhaps this can be done in the vertex shader?
Is all knowledge of the associated geometry lost when we go down the shader pipeline or is there some clever way of recovering that information in either the vertex or fragment shader?
Thanks in advance.
Cheers,
Doug
twitter: #dugla

The surface normal can be approximated in the fragment shader from the partial derivatives of the view-space position. The partial derivatives are obtained with the functions dFdx and dFdy. This requires OpenGL ES 3.0 or the OES_standard_derivatives extension:
in vec3 view_position;
void main()
{
vec3 normalvector = cross(dFdx(view_position), dFdy(view_position));
vec3 nv = normalize(normalvector * sign(normalvector.z));
.....
}
In general it is possible to calculate the normal vector of a surface in a geometry shader (since OpenGL ES 3.2).
For example if you draw triangles you get all three points in the geometry shader.
Three points define a plane from which the normal vector can be calculated.
You just have to be careful if the points are arranged clockwise or counterclockwise.
The normal vector of a triangle is the normalized cross product of 2 vectors defined
by the corner points of the triangle.
See the following example, which is for counterclockwise triangles:
Vertex shader
#version 400
layout (location = 0) in vec3 inPos;
out vec3 vertPos;
uniform mat4 u_projectionMat44;
uniform mat4 u_modelViewMat44;
void main()
{
vec4 viewPos = u_modelViewMat44 * vec4( inPos, 1.0 );
vertPos = viewPos.xyz;
gl_Position = u_projectionMat44 * viewPos;
}
Geometry shader
#version 400
layout( triangles ) in;
layout( triangle_strip, max_vertices = 3 ) out;
in vec3 vertPos[];
out vec3 geoPos;
out vec3 geoNV;
void main()
{
vec3 leg1 = vertPos[1] - vertPos[0];
vec3 leg2 = vertPos[2] - vertPos[0];
vec3 faceNV = normalize( cross( leg1, leg2 ) );
// Outputs are undefined after EmitVertex(), so set all of them for every vertex
for ( int i = 0; i < 3; ++i )
{
geoNV = faceNV;
geoPos = vertPos[i];
gl_Position = gl_in[i].gl_Position;
EmitVertex();
}
EndPrimitive();
}
Fragment shader
#version 400
in vec3 geoPos;
in vec3 geoNV;
void main()
{
// ...
}
Of course, you can also calculate the normal vector in the tessellation shaders (since OpenGL ES 3.2).
But this makes sense only if you already need tessellation shaders for other reasons and want to calculate the normal vector of the face in addition:
Vertex shader
The vertex shader is the same as above.
Tessellation control shader
#version 400
layout( vertices=3 ) out;
in vec3 vertPos[];
out vec3 tctrlPos[];
void main()
{
tctrlPos[gl_InvocationID] = vertPos[gl_InvocationID];
if ( gl_InvocationID == 0 )
{
gl_TessLevelOuter[0] = ;
gl_TessLevelOuter[1] = ;
gl_TessLevelOuter[2] = ;
gl_TessLevelInner[0] = ;
}
}
Tessellation evaluation shader
#version 400
layout(triangles, ccw) in;
in vec3 tctrlPos[];
out vec3 tevalPos;
out vec3 tevalNV;
void main()
{
vec3 leg1 = tctrlPos[1] - tctrlPos[0];
vec3 leg2 = tctrlPos[2] - tctrlPos[0];
tevalNV = normalize( cross( leg1, leg2 ) );
tevalPos = tctrlPos[0] * gl_TessCoord.x + tctrlPos[1] * gl_TessCoord.y + tctrlPos[2] * gl_TessCoord.z;
}
Fragment shader
#version 400
in vec3 tevalPos;
in vec3 tevalNV;
void main()
{
// ...
}

You can get per-pixel normals interpolated from the vertex normals by just using a "varying" variable (in newer OpenGL versions these are simply in/out variables). But do not forget to normalize this normal! Interpolated normals do not necessarily have a length of 1 any longer. These normals also give bad results on sharp edges.
If you want to use custom normals with a higher resolution, a commonly used technique is normal mapping. You create a texture with baked normals for your object. Then you can access the normal in the fragment shader using a texture look-up.

If you pass the vertex normal through to the fragment shader in a "varying" then you will get an interpolated fragment normal.
EDIT: You will have to calculate the normals in your application, and pass them into your shader as an attribute for each vertex of your triangle.
The usual way to calculate the normal for a triangle is with a cross product.
Call the three points making up the triangle P1, P2, and P3.
Calculate V1, the vector from P1 to P2.
Calculate V2, the vector from P1 to P3.
Calculate the cross product of V1 and V2.
This will give you the normal to the plane of the triangle. V2 should be "to the left of" V1, or your normal will point "in" instead of "out". See the Wikipedia article on cross products for details.
FURTHER EDIT: Right, I understand your problem now. Yes, it's true that with shared vertices you can't really have more than one normal per vertex.
The only other thing that I can think of is that maybe a geometry shader could help, because it gets passed all three vertices for a triangle. I don't have any experience with them though.
