Stucky dithering as custom CIKernel does not work - ios

I'm trying to implement Stucky dithering (error diffusion) as a CIKernel but I am a bit lost. I don't find a way to debug the filter (I m new to CIKernel).
Here is what I have came up so far, but the kernel does not compile, plus, I wonder how I could store the spreaded error to the destination pixels.
Help welcome. Is there a tool/way to trace or debug the code?
float mDiffCoef = [
0, 0 0, 8/42, 4/42,
2/42, 4/42, 8/42, 4/42, 2/42,
1/42, 2/42, 4/42, 2/42, 1/42
];
kernel vec4 dither(__sample image) {
vec2 coord = samplerCoord(image);
vec4 pixel = sample(image, coord);
// decompress pixel into rgb
float red = pixel.r;
float color = 0;
if (red >= 127) color = 255;
float err = red - color;
// Spread the error according to the matrix
for (int x = -2; x <= 2; x++) {
for (int y = 0; y <= 3; y++) {
vec2 workingSpaceCoordinate = destCoord() + vec2(x,y);
vec2 imageSpaceCoordinate = samplerTransform(image, workingSpaceCoordinate);
vec3 color = sample(image, imageSpaceCoordinate).rgb;
color += err * mDiffCoef[(2+x)+5*y];
}
}
return vec4(color, 1.0);
}
[Updated code to remove syntax errors]:
kernel vec4 dither(__sample image) {
float mDiffCoef[3 * 5];
int i = 0;
mDiffCoef[i++] = 0.;
mDiffCoef[i++] = 0.;
mDiffCoef[i++] = 0.;
mDiffCoef[i++] = 8./42.;
mDiffCoef[i++] = 4./42.;
mDiffCoef[i++] = 2./42.;
mDiffCoef[i++] = 4./42.;
mDiffCoef[i++] = 8./42.;
mDiffCoef[i++] = 4./42.;
mDiffCoef[i++] = 2./42.;
mDiffCoef[i++] = 1./42.;
mDiffCoef[i++] = 2./42.;
mDiffCoef[i++] = 4./42.;
mDiffCoef[i++] = 2./42.;
mDiffCoef[i++] = 1./42.;
vec2 coord = samplerCoord(image);
vec4 pixel = sample(image, coord);
// decompress pixel into rgb
float red = pixel.r;
float c = 0.;
if (red >= 127.) c = 255.;
float err = red - c;
float color = 0.;
// Spread the error according to the matrix
for (int x = -2; x <= 2; x++) {
for (int y = 0; y < 3; y++) {
vec2 workingSpaceCoordinate = destCoord() + vec2(x,y);
vec2 imageSpaceCoordinate = samplerTransform(image, workingSpaceCoordinate);
float c = sample(image, imageSpaceCoordinate).r; //gb;
color = c + err * mDiffCoef[(2+x)+5*y];
}
}
color = clamp(color, 0., 255.);
return vec4(255., color, color, 1.);
My main issue is how can I write the outpu pixel buffers by cumulating portion of value from the error diffusion?

Related

Convert Shader to CIKernel

I'm trying to convert this particular shader to CIKernel Code.
https://www.shadertoy.com/view/4scBRH
I've got this soo far,
kernel vec4 thresholdFilter(__sample image, float time)
{
vec2 uv = destCoord();
float amount = sin(time) * 0.1;
amount *= 0.3;
float split = 1. - fract(time / 2.0);
float scanOffset = 0.01;
vec2 uv1 = vec2(uv.x + amount, uv.y);
vec2 uv2 = vec2(uv.x, uv.y + amount);
if (uv.y > split) {
uv.x += scanOffset;
uv1.x += scanOffset;
uv2.x += scanOffset;
}
float r = sample(image, uv1).r;
float g = sample(image, uv).g;
float b = sample(image, uv2).b;
float a = 1.0;
vec3 outPutPixel = sample(image, samplerTransform(image, uv)).rgb;
return vec4(outPutPixel, 1.0);
}
The output of this code is not even close to the shaderToy output.

HLSL alphablending in geometry shader

I am rather new to HLSL and I am struggling with implementing a grass shader.
In the geometry shader I create quads which will display the grass blades. However when I try blending in the pixelshader things get weird. Sometimes it ignores everything which is behind the quad. I'm assuming it's a problem with the depth stencil.
this is the result:
Here is my shader:
//************
// VARIABLES *
//************
cbuffer cbPerObject
{
float4x4 m_MatrixWorldViewProj : WORLDVIEWPROJECTION;
float4x4 m_MatrixWorld : WORLD;
float4x4 gMatrixViewInverse : VIEWINVERSE;
float3 m_LightDir = { 2.0f,-5.0f,0.0f };
}
RasterizerState FrontCulling
{
CullMode = NONE;
};
SamplerState samLinear
{
Filter = MIN_MAG_MIP_LINEAR;
AddressU = Wrap;// of Mirror of Clamp of Border
AddressV = Wrap;// of Mirror of Clamp of Border
};
BlendState EnableBlending
{
BlendEnable[0] = TRUE;
SrcBlend = SRC_ALPHA;
DestBlend = INV_SRC_ALPHA;
BlendOp = ADD;
SrcBlendAlpha = ZERO;
DestBlendAlpha = ZERO;
BlendOpAlpha = ADD;
RenderTargetWriteMask[0] = 0x0F;
};
DepthStencilState EnableDepth
{
// Depth test parameters
DepthEnable = true;
DepthWriteMask = all;
DepthFunc = less;
StencilEnable = false;
};
Texture2D m_TextureDiffuse<
string UIName = "Diffuse Texture";
string UIWidget = "Texture";
string ResourceName = "Grass.dds";
>;
Texture2D m_TextureDiffuseBlade<
string UIName = "Diffuse Texture Blade";
string UIWidget = "Texture";
string ResourceName = "GrassBladeDiffuse.dds";
>;
Texture2D m_PerlinNoise<
string UIName = "Perlin Texture";
string UIWidget = "Texture";
string ResourceName = "Perlin.dds";
>;
float gGrassHeight
<
string UIName = "Grass Height";
string UIWidget = "slider";
float UIMin = 0;
float UIMax = 10.0f;
float UIStep = 0.01;
> = 0.6f;
float gGrassHeightRandom
<
string UIName = "Grass Height Random";
string UIWidget = "slider";
float UIMin = 0;
float UIMax = 1.0f;
float UIStep = 0.01;
> = 1.0f;
float gGrassBend
<
string UIName = "Grass Bend";
string UIWidget = "slider";
float UIMin = 0;
float UIMax = 1.0f;
float UIStep = 0.01;
> = 1.0f;
int gGrassBlades
<
string UIName = "Grass Blades";
string UIWidget = "slider";
int UIMin = 1;
int UIMax = 5.0f;
int UIStep = 1;
> = 5;
float gGrassBladesSize
<
string UIName = "Grass Blades Size";
string UIWidget = "slider";
float UIMin = 0;
float UIMax = 1.0f;
float UIStep = 0.01;
> = 0.2f;
float gGrassSpread<
string UIName = "Grass Spread";
> = 5.0f;
float gTime;
//**********
// STRUCTS *
//**********
struct VS_DATA
{
float3 Position : POSITION;
float3 Normal : NORMAL;
float2 TexCoord : TEXCOORD;
};
struct GS_DATA
{
float4 Position : SV_POSITION;
float3 Normal : NORMAL;
float2 TexCoord : TEXCOORD0;
bool Blade : FALSE;
};
//****************
// VERTEX SHADER *
//****************
VS_DATA MainVS(VS_DATA vsData)
{
return vsData;
}
//******************
// GEOMETRY SHADER *
//******************
void CreateVertex(inout TriangleStream<GS_DATA> triStream, float3 pos, float3 normal, float2 texCoord, bool blade = true)
{
//Step 1. Create a GS_DATA object
GS_DATA temp = (GS_DATA)0;
//Step 2. Transform the position using the WVP Matrix and assign it to (GS_DATA object).Position (Keep in mind: float3 -> float4)
temp.Position = mul(float4(pos, 1), m_MatrixWorldViewProj);
//Step 3. Transform the normal using the World Matrix and assign it to (GS_DATA object).Normal (Only Rotation, No translation!)
temp.Normal = mul(normal, (float3x3)m_MatrixWorld);
//Step 4. Assign texCoord to (GS_DATA object).TexCoord
temp.TexCoord = texCoord;
//set if blade or not
temp.Blade = blade;
//Step 5. Append (GS_DATA object) to the TriangleStream parameter (TriangleStream::Append(...))
triStream.Append(temp);
}
float3x3 AngleAxis3x3(float angle, float3 axis)
{
float c, s;
sincos(angle, s, c);
float t = 1 - c;
float x = axis.x;
float y = axis.y;
float z = axis.z;
return float3x3(
t * x * x + c, t * x * y - s * z, t * x * z + s * y,
t * x * y + s * z, t * y * y + c, t * y * z - s * x,
t * x * z - s * y, t * y * z + s * x, t * z * z + c
);
}
[maxvertexcount(5*6*3 +3)]
//[instance(16)]
void GrassGenerator(triangle VS_DATA vertices[3], inout TriangleStream<GS_DATA> triStream)//, uint InstanceID : SV_GSInstanceID)
{
float3 basePoint, top;
//Step 1. Calculate The basePoint
basePoint = (vertices[0].Position + vertices[1].Position + vertices[2].Position) / 3;
//Step 2. Calculate The normal of the basePoint
float3 normal = normalize((vertices[0].Normal + vertices[1].Normal + vertices[2].Normal) / 3);
//orignal vertex
CreateVertex(triStream, vertices[0].Position, vertices[0].Normal, vertices[0].TexCoord, false);
CreateVertex(triStream, vertices[1].Position, vertices[1].Normal, vertices[1].TexCoord, false);
CreateVertex(triStream, vertices[2].Position, vertices[2].Normal, vertices[2].TexCoord, false);
triStream.RestartStrip();
float3 left, right, grassnormal;
for (int j = 0; j < gGrassBlades; j++)
{
float3 position = basePoint + float3(m_PerlinNoise.SampleLevel(samLinear, vertices[j].TexCoord, 0).y - 0.5f, m_PerlinNoise.SampleLevel(samLinear, vertices[j].TexCoord, 0).z - 0.5f, 0)*gGrassSpread;
top = position + (gGrassHeight * normal);
float3 grassDirection = float3(1, 0, 0) * gGrassBladesSize;
float xAngle = 0.0f;
for (int i = 0; i < 3; i++)
{
float3x3 rotation = AngleAxis3x3(xAngle, normal);
grassDirection = mul(grassDirection, rotation);
//Step 5. Calculate The Normal of the grass
float3 leftEdge, rightEdge;
leftEdge = (position - grassDirection) - top;
rightEdge = (position + grassDirection) - top;
grassnormal = normalize(cross(leftEdge, rightEdge));
//Create Spike Geometry
CreateVertex(triStream, top - grassDirection, grassnormal, float2(0, 0));
CreateVertex(triStream, position - grassDirection, grassnormal, float2(0, 1));
CreateVertex(triStream, position + grassDirection, grassnormal, float2(1, 1));
triStream.RestartStrip();
CreateVertex(triStream, top + grassDirection, grassnormal, float2(1, 0));
CreateVertex(triStream, position + grassDirection, grassnormal, float2(1, 1));
CreateVertex(triStream, top - grassDirection, grassnormal, float2(0, 0));
triStream.RestartStrip();
static const float PI = 3.14159265f;
xAngle = 2 * PI / 3;
}
}
}
//***************
// PIXEL SHADER *
//***************
float4 MainPS(GS_DATA input) : SV_TARGET
{
input.Normal = -normalize(input.Normal);
float alpha;
float3 color;
if (input.Blade) {
alpha = m_TextureDiffuseBlade.Sample(samLinear,input.TexCoord).a;
color = m_TextureDiffuseBlade.Sample(samLinear,input.TexCoord).rgb;
}
else {
alpha = m_TextureDiffuse.Sample(samLinear,input.TexCoord).a;
color = m_TextureDiffuse.Sample(samLinear,input.TexCoord).rgb;
}
float s = max(dot(m_LightDir, input.Normal), 0.4f);
return float4(color*s,alpha);
}
//*************
// TECHNIQUES *
//*************
technique10 DefaultTechnique
{
pass p0 {
SetDepthStencilState(EnableDepth, 0);
SetBlendState(EnableBlending, float4(0.0f, 0.0f, 0.0f, 0.0f), 0xFFFFFFFF);
SetRasterizerState(FrontCulling);
SetVertexShader(CompileShader(vs_4_0, MainVS()));
SetGeometryShader(CompileShader(gs_5_0, GrassGenerator()));
SetPixelShader(CompileShader(ps_4_0, MainPS()));
}
}

how do I port this Shadertoy shader with an fwidth() call to a Metal shader?

I've been porting Shadertoy shaders to Metal in order to learn how to write Metal shaders. I don't think I'm doing it correctly as I have been writing every one of my shaders as a compute shader, rather than vertex/fragment shaders. This has worked for quite a few shaders I've ported, almost 20. However some ports are extremely slow, and others include functions that aren't available.
Here is one of the shaders that is tripping me up:
https://www.shadertoy.com/view/4t2SRh
The fwidth() call in render() and mainImage() is not allowed within a metal compute shader. Metal Shader Language does however have fwidth(), but it can only be called within a fragment shader.
Here is my attempt at porting to a compute shader:
#include <metal_stdlib>
using namespace metal;
float float_mod(float f1, float f2) {
return f1-f2 * floor(f1/f2);
}
float sdfCircle(float2 center, float radius, float2 coord )
{
float2 offset = coord - center;
return sqrt((offset.x * offset.x) + (offset.y * offset.y)) - radius;
}
float sdfEllipse(float2 center, float a, float b, float2 coord)
{
float a2 = a * a;
float b2 = b * b;
return (b2 * (coord.x - center.x) * (coord.x - center.x) +
a2 * (coord.y - center.y) * (coord.y - center.y) - a2 * b2)/(a2 * b2);
}
float sdfLine(float2 p0, float2 p1, float width, float2 coord)
{
float2 dir0 = p1 - p0;
float2 dir1 = coord - p0;
float h = clamp(dot(dir0, dir1)/dot(dir0, dir0), 0.0, 1.0);
return (length(dir1 - dir0 * h) - width * 0.5);
}
float sdfUnion( const float a, const float b )
{
return min(a, b);
}
float sdfDifference( const float a, const float b)
{
return max(a, -b);
}
float sdfIntersection( const float a, const float b )
{
return max(a, b);
}
float anti(float d) {
return fwidth(d) * 1.0;
}
float4 render(float d, float3 color, float stroke)
{
//stroke = fwidth(d) * 2.0;
float anti = fwidth(d) * 1.0;
float4 strokeLayer = float4(float3(0.05), 1.0-smoothstep(-anti, anti, d - stroke));
float4 colorLayer = float4(color, 1.0-smoothstep(-anti, anti, d));
if (stroke < 0.000001) {
return colorLayer;
}
return float4(mix(strokeLayer.rgb, colorLayer.rgb, colorLayer.a), strokeLayer.a);
}
kernel void compute(texture2d<float, access::write> output [[texture(0)]],
texture2d<float, access::sample> input [[texture(1)]],
constant float &timer [[buffer(0)]],
uint2 gid [[thread_position_in_grid]])
{
float4 fragColor;
int width = output.get_width();
int height = output.get_height();
float2 resolution = float2(width,height);
float2 uv = float2(gid) / resolution;
float size = min(resolution.x, resolution.y);
float pixSize = 1.0 / size;
float stroke = pixSize * 1.5;
float2 center = float2(0.5, 0.5 * resolution.y/resolution.x);
float a = sdfEllipse(float2(0.5, center.y*2.0-0.34), 0.25, 0.25, uv);
float b = sdfEllipse(float2(0.5, center.y*2.0+0.03), 0.8, 0.35, uv);
b = sdfIntersection(a, b);
float4 layer1 = render(b, float3(0.32, 0.56, 0.53), fwidth(b) * 2.0);
// Draw strips
float4 layer2 = layer1;
float t, r0, r1, r2, e, f;
float2 sinuv = float2(uv.x, (sin(uv.x*40.0)*0.02 + 1.0)*uv.y);
for (float i = 0.0; i < 10.0; i++) {
t = float_mod(timer + 0.3 * i, 3.0) * 0.2;
r0 = (t - 0.15) / 0.2 * 0.9 + 0.1;
r1 = (t - 0.15) / 0.2 * 0.1 + 0.9;
r2 = (t - 0.15) / 0.2 * 0.15 + 0.85;
e = sdfEllipse(float2(0.5, center.y*2.0+0.37-t*r2), 0.7*r0, 0.35*r1, sinuv);
f = sdfEllipse(float2(0.5, center.y*2.0+0.41-t), 0.7*r0, 0.35*r1, sinuv);
f = sdfDifference(e, f);
f = sdfIntersection(f, b);
float4 layer = render(f, float3(1.0, 0.81, 0.27), 0.0);
layer2 = mix(layer2, layer, layer.a);
}
// Draw the handle
float bottom = 0.08;
float handleWidth = 0.01;
float handleRadius = 0.04;
float d = sdfCircle(float2(0.5-handleRadius+0.5*handleWidth, bottom), handleRadius, uv);
float c = sdfCircle(float2(0.5-handleRadius+0.5*handleWidth, bottom), handleRadius-handleWidth, uv);
d = sdfDifference(d, c);
c = uv.y - bottom;
d = sdfIntersection(d, c);
c = sdfLine(float2(0.5, center.y*2.0-0.05), float2(0.5, bottom), handleWidth, uv);
d = sdfUnion(d, c);
c = sdfCircle(float2(0.5, center.y*2.0-0.05), 0.01, uv);
d = sdfUnion(c, d);
c = sdfCircle(float2(0.5-handleRadius*2.0+handleWidth, bottom), handleWidth*0.5, uv);
d = sdfUnion(c, d);
float4 layer0 = render(d, float3(0.404, 0.298, 0.278), stroke);
float2 p = (2.0*float2(gid).xy-resolution.xy)/min(resolution.y,resolution.x);
float3 bcol = float3(1.0,0.8,0.7-0.07*p.y)*(1.0-0.25*length(p));
fragColor = float4(bcol, 1.0);
fragColor.rgb = mix(fragColor.rgb, layer0.rgb, layer0.a);
fragColor.rgb = mix(fragColor.rgb, layer1.rgb, layer1.a);
fragColor.rgb = mix(fragColor.rgb, layer2.rgb, layer2.a);
fragColor.rgb = pow(fragColor.rgb, float3(1.0/2.2));
output.write(fragColor,gid);
}
This doesn't compile, as fwidth() is not available. However, if I do get rid of fwidth(), it will compile... but of course not draw the right thing.
I was wondering if there is a better way to port this to a fragment/vertex shader, so that I can use MSL's fwidth() ? Or is writing it as a compute shader fine, and I should find a different way around using fwidth() ?

DX 11 Compute Shader\SharpDX Deferrerd Tiled lighting, Point light problems

I have just finished porting my engine from XNA to SharpDX(DX11).
Everything is going really well and I have conquered most of my issues without having to ask for help until now and I'm really stuck, maybe I just need another set of eye to look over my code idk but here it is.
I'm implementing tile based lighting (point lights only for now), I'm basing my code off the Intel sample because it's not as messy as the ATI one.
So my problem is that the lights move with the camera, I have looked all over the place to find a fix and I have tried everything (am I crazy?).
I just made sure all my normal and light vectors are in view space and normalized (still the same).
I have tried with the inverse View, inverse Projection, a mix of the both and a few other bits from over the net but I can't fix it.
So here is my CPU code:
Dim viewSpaceLPos As Vector3 = Vector3.Transform(New Vector3(pointlight.PosRad.X, pointlight.PosRad.Y, pointlight.PosRad.Z), Engine.Camera.EyeTransform)
Dim lightMatrix As Matrix = Matrix.Scaling(pointlight.PosRad.W) * Matrix.Translation(New Vector3(pointlight.PosRad.X, pointlight.PosRad.Y, pointlight.PosRad.Z))
Here is my CS shader code:
[numthreads(GROUP_WIDTH, GROUP_HEIGHT, GROUP_DEPTH)]
void TileLightingCS(uint3 dispatchThreadID : SV_DispatchThreadID, uint3 GroupID : SV_GroupID, uint3 GroupThreadID : SV_GroupThreadID)
{
int2 globalCoords = dispatchThreadID.xy;
uint groupIndex = GroupThreadID.y * GROUP_WIDTH + GroupThreadID.x;
float minZSample = FrameBufferCamNearFar.x;
float maxZSample = FrameBufferCamNearFar.y;
float2 gbufferDim;
DepthBuffer.GetDimensions(gbufferDim.x, gbufferDim.y);
float2 screenPixelOffset = float2(2.0f, -2.0f) / gbufferDim;
float2 positionScreen = (float2(globalCoords)+0.5f) * screenPixelOffset.xy + float2(-1.0f, 1.0f);
float depthValue = DepthBuffer[globalCoords].r;
float3 positionView = ComputePositionViewFromZ(positionScreen, Projection._43 / (depthValue - Projection._33));
// Avoid shading skybox/background or otherwise invalid pixels
float viewSpaceZ = positionView.z;
bool validPixel = viewSpaceZ >= FrameBufferCamNearFar.x && viewSpaceZ < FrameBufferCamNearFar.y;
[flatten] if (validPixel)
{
minZSample = min(minZSample, viewSpaceZ);
maxZSample = max(maxZSample, viewSpaceZ);
}
// How many total lights?
uint totalLights, dummy;
InputBuffer.GetDimensions(totalLights, dummy);
// Initialize shared memory light list and Z bounds
if (groupIndex == 0)
{
sTileNumLights = 0;
sMinZ = 0x7F7FFFFF; // Max float
sMaxZ = 0;
}
GroupMemoryBarrierWithGroupSync();
if (maxZSample >= minZSample) {
InterlockedMin(sMinZ, asuint(minZSample));
InterlockedMax(sMaxZ, asuint(maxZSample));
}
GroupMemoryBarrierWithGroupSync();
float minTileZ = asfloat(sMinZ);
float maxTileZ = asfloat(sMaxZ);
// Work out scale/bias from [0, 1]
float2 tileScale = float2(FrameBufferCamNearFar.zw) * rcp(float(2 * GROUP_WIDTH));
float2 tileBias = tileScale - float2(GroupID.xy);
// Now work out composite projection matrix
// Relevant matrix columns for this tile frusta
float4 c1 = float4(Projection._11 * tileScale.x, 0.0f, tileBias.x, 0.0f);
float4 c2 = float4(0.0f, -Projection._22 * tileScale.y, tileBias.y, 0.0f);
float4 c4 = float4(0.0f, 0.0f, 1.0f, 0.0f);
// Derive frustum planes
float4 frustumPlanes[6];
// Sides
frustumPlanes[0] = c4 - c1;
frustumPlanes[1] = c4 + c1;
frustumPlanes[2] = c4 - c2;
frustumPlanes[3] = c4 + c2;
// Near/far
frustumPlanes[4] = float4(0.0f, 0.0f, 1.0f, -minTileZ);
frustumPlanes[5] = float4(0.0f, 0.0f, -1.0f, maxTileZ);
// Normalize frustum planes (near/far already normalized)
[unroll] for (uint i = 0; i < 4; ++i)
{
frustumPlanes[i] *= rcp(length(frustumPlanes[i].xyz));
}
// Cull lights for this tile
for (uint lightIndex = groupIndex; lightIndex < totalLights; lightIndex += (GROUP_WIDTH * GROUP_HEIGHT))
{
PointLight light = InputBuffer[lightIndex];
float3 lightVS = light.PosRad.xyz;// mul(float4(light.Pos.xyz, 1), View);
// Cull: point light sphere vs tile frustum
bool inFrustum = true;
[unroll]
for (uint i = 0; i < 6; ++i)
{
float d = dot(frustumPlanes[i], float4(lightVS, 1.0f));
inFrustum = inFrustum && (d >= -light.PosRad.w);
}
[branch]
if (inFrustum)
{
uint listIndex;
InterlockedAdd(sTileNumLights, 1, listIndex);
sTileLightIndices[listIndex] = lightIndex;
}
}
GroupMemoryBarrierWithGroupSync();
uint numLights = sTileNumLights;
if (all(globalCoords < FrameBufferCamNearFar.zw))
{
float4 NormalMap = NormalBuffer[globalCoords];
float3 normal = DecodeNormal(NormalMap);
if (numLights > 0)
{
float3 lit = float3(0.0f, 0.0f, 0.0f);
for (uint tileLightIndex = 0; tileLightIndex < numLights; ++tileLightIndex)
{
PointLight light = InputBuffer[sTileLightIndices[tileLightIndex]];
float3 lDir = light.PosRad.xyz - positionView;
lDir = normalize(lDir);
float3 nl = saturate(dot(lDir, normal));
lit += ((light.Color.xyz * light.Color.a) * nl) * 0.1f;
}
PointLightColor[globalCoords] = float4(lit, 1);
}
else
{
PointLightColor[globalCoords] = 0;
}
}
GroupMemoryBarrierWithGroupSync();
}
So I know the culling works because there are lights drawn, they just move with the camera.
Could it be a handedness issue?
Am I setting my CPU light code up right?
Have I messed my spaces up?
What am I missing?
Am I reconstructing my position from depth wrong? (don't think it's this because the culling works)
ps. I write depth out like this:
VS shader
float4 viewSpacePos = mul(float4(input.Position,1), WV);
output.Depth=viewSpacePos.z ;
PS Shader
-input.Depth.x / FarClip

First two fragment shader outputs are different

I'm currently trying to get this bokeh shader to work with GPUImage: http://blenderartists.org/forum/showthread.php?237488-GLSL-depth-of-field-with-bokeh-v2-4-(update)
This is what I've got at the moment:
precision mediump float;
varying highp vec2 textureCoordinate;
varying highp vec2 textureCoordinate2;
uniform sampler2D inputImageTexture;
uniform sampler2D inputImageTexture2;
uniform float inputImageTextureWidth;
uniform float inputImageTextureHeight;
#define PI 3.14159265
float width = inputImageTextureWidth; //texture width
float height = inputImageTextureHeight; //texture height
vec2 texel = vec2(1.0/width,1.0/height);
//uniform variables from external script
uniform float focalDepth; //focal distance value in meters, but you may use autofocus option below
uniform float focalLength; //focal length in mm
uniform float fstop; //f-stop value
bool showFocus = false; //show debug focus point and focal range (red = focal point, green = focal range)
float znear = 0.1; //camera clipping start
float zfar = 5.0; //camera clipping end
//------------------------------------------
//user variables
int samples = 3; //samples on the first ring
int rings = 3; //ring count
bool manualdof = false; //manual dof calculation
float ndofstart = 1.0; //near dof blur start
float ndofdist = 2.0; //near dof blur falloff distance
float fdofstart = 1.0; //far dof blur start
float fdofdist = 3.0; //far dof blur falloff distance
float CoC = 0.03;//circle of confusion size in mm (35mm film = 0.03mm)
bool vignetting = false; //use optical lens vignetting?
float vignout = 1.3; //vignetting outer border
float vignin = 0.0; //vignetting inner border
float vignfade = 22.0; //f-stops till vignete fades
bool autofocus = false; //use autofocus in shader? disable if you use external focalDepth value
vec2 focus = vec2(0.5, 0.5); // autofocus point on screen (0.0,0.0 - left lower corner, 1.0,1.0 - upper right)
float maxblur = 1.0; //clamp value of max blur (0.0 = no blur,1.0 default)
float threshold = 0.5; //highlight threshold;
float gain = 2.0; //highlight gain;
float bias = 0.5; //bokeh edge bias
float fringe = 0.7; //bokeh chromatic aberration/fringing
bool noise = false; //use noise instead of pattern for sample dithering
float namount = 0.0001; //dither amount
bool depthblur = false; //blur the depth buffer?
float dbsize = 1.25; //depthblursize
/*
next part is experimental
not looking good with small sample and ring count
looks okay starting from samples = 4, rings = 4
*/
bool pentagon = false; //use pentagon as bokeh shape?
float feather = 0.4; //pentagon shape feather
//------------------------------------------
float penta(vec2 coords) //pentagonal shape
{
float scale = float(rings) - 1.3;
vec4 HS0 = vec4( 1.0, 0.0, 0.0, 1.0);
vec4 HS1 = vec4( 0.309016994, 0.951056516, 0.0, 1.0);
vec4 HS2 = vec4(-0.809016994, 0.587785252, 0.0, 1.0);
vec4 HS3 = vec4(-0.809016994,-0.587785252, 0.0, 1.0);
vec4 HS4 = vec4( 0.309016994,-0.951056516, 0.0, 1.0);
vec4 HS5 = vec4( 0.0 ,0.0 , 1.0, 1.0);
vec4 one = vec4( 1.0 );
vec4 P = vec4((coords),vec2(scale, scale));
vec4 dist = vec4(0.0);
float inorout = -4.0;
dist.x = dot( P, HS0 );
dist.y = dot( P, HS1 );
dist.z = dot( P, HS2 );
dist.w = dot( P, HS3 );
dist = smoothstep( -feather, feather, dist );
inorout += dot( dist, one );
dist.x = dot( P, HS4 );
dist.y = HS5.w - abs( P.z );
dist = smoothstep( -feather, feather, dist );
inorout += dist.x;
return clamp( inorout, 0.0, 1.0 );
}
float bdepth(vec2 coords) //blurring depth
{
float d = 0.0;
float kernel[9];
vec2 offset[9];
vec2 wh = vec2(texel.x, texel.y) * dbsize;
offset[0] = vec2(-wh.x,-wh.y);
offset[1] = vec2( 0.0, -wh.y);
offset[2] = vec2( wh.x -wh.y);
offset[3] = vec2(-wh.x, 0.0);
offset[4] = vec2( 0.0, 0.0);
offset[5] = vec2( wh.x, 0.0);
offset[6] = vec2(-wh.x, wh.y);
offset[7] = vec2( 0.0, wh.y);
offset[8] = vec2( wh.x, wh.y);
kernel[0] = 1.0/16.0; kernel[1] = 2.0/16.0; kernel[2] = 1.0/16.0;
kernel[3] = 2.0/16.0; kernel[4] = 4.0/16.0; kernel[5] = 2.0/16.0;
kernel[6] = 1.0/16.0; kernel[7] = 2.0/16.0; kernel[8] = 1.0/16.0;
for( int i=0; i<9; i++ )
{
float tmp = texture2D(inputImageTexture2, coords + offset[i]).r;
d += tmp * kernel[i];
}
return d;
}
vec3 color(vec2 coords,float blur) //processing the sample
{
vec3 col = vec3(0.0);
col.r = texture2D(inputImageTexture, coords + vec2(0.0,1.0)*texel*fringe*blur).r;
col.g = texture2D(inputImageTexture, coords + vec2(-0.866,-0.5)*texel*fringe*blur).g;
col.b = texture2D(inputImageTexture, coords + vec2(0.866,-0.5)*texel*fringe*blur).b;
vec3 lumcoeff = vec3(0.299,0.587,0.114);
float lum = dot(col.rgb, lumcoeff);
float thresh = max((lum-threshold)*gain, 0.0);
return col+mix(vec3(0.0),col,thresh*blur);
}
vec2 rand(vec2 coord) //generating noise/pattern texture for dithering
{
float noiseX = ((fract(1.0-coord.s*(width/2.0))*0.25)+(fract(coord.t*(height/2.0))*0.75))*2.0-1.0;
float noiseY = ((fract(1.0-coord.s*(width/2.0))*0.75)+(fract(coord.t*(height/2.0))*0.25))*2.0-1.0;
if (noise)
{
noiseX = clamp(fract(sin(dot(coord ,vec2(12.9898,78.233))) * 43758.5453),0.0,1.0)*2.0-1.0;
noiseY = clamp(fract(sin(dot(coord ,vec2(12.9898,78.233)*2.0)) * 43758.5453),0.0,1.0)*2.0-1.0;
}
return vec2(noiseX,noiseY);
}
vec3 debugFocus(vec3 col, float blur, float depth)
{
float edge = 0.002*depth; //distance based edge smoothing
float m = clamp(smoothstep(0.0,edge,blur),0.0,1.0);
float e = clamp(smoothstep(1.0-edge,1.0,blur),0.0,1.0);
col = mix(col,vec3(1.0,1.0,0.0),(1.0-m)*0.6);
col = mix(col,vec3(0.0,1.0,1.0),((1.0-e)-(1.0-m))*0.2);
return col;
}
float linearize(float depth)
{
return -zfar * znear / (depth * (zfar - znear) - zfar);
}
float vignette()
{
float dist = distance(textureCoordinate.xy, vec2(0.5,0.5));
dist = smoothstep(vignout+(fstop/vignfade), vignin+(fstop/vignfade), dist);
return clamp(dist,0.0,1.0);
}
void main()
{
//scene depth calculation
float depth = linearize(texture2D(inputImageTexture2, textureCoordinate2.xy).x);
if (depthblur)
{
depth = linearize(bdepth(textureCoordinate2.xy));
}
//focal plane calculation
float fDepth = focalDepth;
if (autofocus)
{
fDepth = linearize(texture2D(inputImageTexture2, focus).x);
}
//dof blur factor calculation
float blur = 0.0;
if (manualdof)
{
float a = depth-fDepth; //focal plane
float b = (a-fdofstart)/fdofdist; //far DoF
float c = (-a-ndofstart)/ndofdist; //near Dof
blur = (a>0.0)?b:c;
}
else
{
float f = focalLength; //focal length in mm
float d = fDepth*1000.0; //focal plane in mm
float o = depth*1000.0; //depth in mm
float a = (o*f)/(o-f);
float b = (d*f)/(d-f);
float c = (d-f)/(d*fstop*CoC);
blur = abs(a-b)*c;
}
blur = clamp(blur,0.0,1.0);
// calculation of pattern for ditering
vec2 noise = rand(textureCoordinate.xy)*namount*blur;
// getting blur x and y step factor
float w = (1.0/width)*blur*maxblur+noise.x;
float h = (1.0/height)*blur*maxblur+noise.y;
// calculation of final color
vec3 col = vec3(0.0);
if(blur < 0.05) //some optimization thingy
{
col = texture2D(inputImageTexture, textureCoordinate.xy).rgb;
}
else
{
col = texture2D(inputImageTexture, textureCoordinate.xy).rgb;
float s = 1.0;
int ringsamples;
for (int i = 1; i <= rings; i += 1)
{
ringsamples = i * samples;
for (int j = 0 ; j < ringsamples ; j += 1)
{
float step = PI*2.0 / float(ringsamples);
float pw = (cos(float(j)*step)*float(i));
float ph = (sin(float(j)*step)*float(i));
float p = 1.0;
if (pentagon)
{
p = penta(vec2(pw,ph));
}
col += color(textureCoordinate.xy + vec2(pw*w,ph*h),blur)*mix(1.0,(float(i))/(float(rings)),bias)*p;
s += 1.0*mix(1.0,(float(i))/(float(rings)),bias)*p;
}
}
col /= s; //divide by sample count
}
if (showFocus)
{
col = debugFocus(col, blur, depth);
}
if (vignetting)
{
col *= vignette();
}
gl_FragColor.rgb = col;
gl_FragColor.a = 1.0;
}
This is my bokeh filter, a subclass of GPUImageTwoInputFilter:
#implementation GPUImageBokehFilter
- (id)init;
{
NSString *fragmentShaderPathname = [[NSBundle mainBundle] pathForResource:#"BokehShader" ofType:#"fsh"];
NSString *fragmentShaderString = [NSString stringWithContentsOfFile:fragmentShaderPathname encoding:NSUTF8StringEncoding error:nil];
if (!(self = [super initWithFragmentShaderFromString:fragmentShaderString]))
{
return nil;
}
focalDepthUniform = [filterProgram uniformIndex:#"focalDepth"];
focalLengthUniform = [filterProgram uniformIndex:#"focalLength"];
fStopUniform = [filterProgram uniformIndex:#"fstop"];
[self setFocalDepth:1.0];
[self setFocalLength:35.0];
[self setFStop:2.2];
return self;
}
#pragma mark -
#pragma mark Accessors
- (void)setFocalDepth:(float)focalDepth {
_focalDepth = focalDepth;
[self setFloat:_focalDepth forUniform:focalDepthUniform program:filterProgram];
}
- (void)setFocalLength:(float)focalLength {
_focalLength = focalLength;
[self setFloat:_focalLength forUniform:focalLengthUniform program:filterProgram];
}
- (void)setFStop:(CGFloat)fStop {
_fStop = fStop;
[self setFloat:_fStop forUniform:fStopUniform program:filterProgram];
}
#end
And finally, this is how I use said filter:
#implementation ViewController {
GPUImageBokehFilter *bokehFilter;
GPUImagePicture *bokehMap;
UIImage *inputImage;
}
- (void)viewDidLoad
{
[super viewDidLoad];
inputImage = [UIImage imageNamed:#"stones"];
bokehMap = [[GPUImagePicture alloc] initWithImage:[UIImage imageNamed:#"bokehmask"]];
_backgroundImage.image = inputImage;
bokehFilter = [[GPUImageBokehFilter alloc] init];
[self processImage];
}
- (IBAction)dataInputUpdated:(id)sender {
[self processImage];
}
- (void *)processImage {
dispatch_async(dispatch_get_global_queue( DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
GPUImagePicture *gpuPicture = [[GPUImagePicture alloc] initWithImage:inputImage];
[gpuPicture addTarget:bokehFilter];
[gpuPicture processImage];
[bokehMap addTarget:bokehFilter];
[bokehMap processImage];
[bokehFilter useNextFrameForImageCapture];
[bokehFilter setFloat:inputImage.size.width forUniformName:#"inputImageTextureWidth"];
[bokehFilter setFloat:inputImage.size.height forUniformName:#"inputImageTextureHeight"];
UIImage *blurredImage = [bokehFilter imageFromCurrentFramebuffer];
dispatch_async(dispatch_get_main_queue(), ^{
[self displayNewImage:blurredImage];
});
});
}
- (void)displayNewImage:(UIImage*)newImage {
[UIView transitionWithView:_backgroundImage
duration:.6f
options:UIViewAnimationOptionTransitionCrossDissolve
animations:^{
_backgroundImage.image = newImage;
} completion:nil];
}
...
The first image is the one I'm trying to blur, the second one is a random gradient to test the shader's depth map thingy:
When I start the app on my iPhone, I get this:
After moving the slider (which triggers the dataInputChanged method), I get this:
While that admittedly looks much better than the first image, I still have some problems with this:
There's a diagonal noisy line (inside the red lines I put on the picture) that appears to be unblurred.
The top left of the image is blurry, even though it shouldn't be.
Why do I get this weird behavior? Shouldn't the shader output be the same every time?
Also, how do I get it to respect the depth map? My GLSL shader knowledge is very limited, so please be patient.
The diagonal artifact appears to be caused by your test gradient. You can see that it occurs at about the same place as where your gradient goes to completely white. Try spreading out the gradient so it only reaches 1.0 or 0.0 at the very corners of the image.
It's a pretty big question, and I can't make a full answer because I would really need to test the thing out.
But a few points: The final image that you put up is hard to work with. Because the image has been upscaled so much, I can't tell if it's actually blurred or if it just appears blurry because of the resolution. Regardless, the amount of blur that you're getting (when compared to the original link that you provided) suggests that something isn't working with the shader.
Another thing that concerns me is the //some optimization thingy comment that you've got in there. This is the sort of thing that's going to be responsible for an ugly line in your final output. Saying that you wont have any blur under blur < 0.05 isn't necessarily something that you can do! I would be expecting a nasty artifact as the shader transitions from the blur shader and into the 'optimized' part.
Hope that sheds some light, and good luck!
Have you tried enabling showFocus? This should show the focal point in red and the focal range in green which should help with debugging. You could also try enabling autofocus to ensure that the centre of the image is in focus, because at the moment it's not obvious which distance should be in focus, due to the linearize function changing coordinate systems. After that try tweaking fstop to get the desired amount of blur. You will probably also find that you will need greater than samples = 3 and rings = 3 to produce a smooth bokeh effect.
Your answers helped me get on the right track, and after a few hours of fiddling around with my code and the shader, I managed to get all bugs fixed. Here's what caused them and how I fixed them:
The ugly diagonal line was caused by the linearize() method, so I removed it and made the shader use the RGB values (or, to be more precise: only the R value) from the depth map without processing them first.
The blue-ish image I got from the shader was caused by my own incompetence. These two lines had to be put before the calls to processImage:
[bokehFilter setFloat:inputImage.size.width forUniformName:#"inputImageTextureWidth"];
[bokehFilter setFloat:inputImage.size.height forUniformName:#"inputImageTextureHeight"];
In hindsight, it's obvious why I only got results the second time I used the shader. After fixing those bugs, I went on to optimize it a bit to keep the execution time as low as possible, and now I can tell it to render 8 samples/4 rings and it does so in less than a second. Here's what that looks like:
Thanks for the answers, everyone, I probably wouldn't have gotten those bugs fixed without you.

Resources