Floyd-Warshall using Alea GPU - F#
I've been trying to use Alea GPU to write the parallel Floyd-Warshall algorithm in F#, basing myself on the CUDA code another user presented in
The Floyd-Warshall algorithm in CUDA.
I wrote the following simple implementation:
type FWModule<'T>(target:GPUModuleTarget, tileDim:int) =
    inherit GPUModule(target)

    [<Kernel;ReflectedDefinition>]
    member this.FloydWKernel (width:int) (k:int) (data:deviceptr<float>) =
        let col = blockIdx.x * blockDim.x + threadIdx.x
        let row = blockIdx.y
        if col >= width then () // out of bounds
        let index = width * row + col
        let best = __shared__.Variable<float>()
        if threadIdx.x = 0 then best := data.[width*row+k]
        __syncthreads()
        let tmp = data.[k*width+col]
        let candidate = !best + tmp
        data.[index] <- min data.[index] candidate

    member this.LaunchParams width =
        let blockdim = dim3(tileDim)
        let griddim = dim3(divup width tileDim, width)
        LaunchParam(griddim, blockdim)

    member this.FloydW (width:int) (k:int) (data:deviceptr<float>) =
        let lp = this.LaunchParams width
        this.GPULaunch <@ this.FloydWKernel @> lp width k data

    member this.FW(size:int, A:float[]) =
        use deviceArr = this.GPUWorker.Malloc(A)
        for k in 0 .. size-1 do
            this.FloydW size k deviceArr.Ptr
        deviceArr.Gather()

let tileDim = 256
let apsp = new FWModule<float>(GPUModuleTarget.DefaultWorker, tileDim)
However, when the following lines are run in fsi:
let m = [|0.0 ; 5.0 ; 9.0 ; infinity;
infinity; 0.0 ; 1.0 ; infinity;
infinity; infinity; 0.0 ; 2.0;
infinity; 3.0 ; infinity; 0.0|];;
apsp.FW (4,m);;
The output is
[|0.0; 5.0; 6.0; 8.0;
4.0; 0.0; 1.0; 3.0;
3.0; 3.0; 0.0; 1.0;
1.0; 1.0; 1.0; 0.0|]
which it should not be, given that the usual iterative, sequential Floyd-Warshall
let floydwarshall (l:int, mat:float[]) =
    let a = Array.copy mat
    for k in 0 .. (l-1) do
        for i in 0 .. (l-1) do
            for j in 0 .. (l-1) do
                a.[i*l+j] <- min a.[i*l+j] (a.[i*l+k] + a.[k*l+j])
    a
gives me
floydwarshall (4,m);;
[|0.0 ; 5.0; 6.0; 8.0;
infinity; 0.0; 1.0; 3.0;
infinity; 5.0; 0.0; 2.0;
infinity; 3.0; 4.0; 0.0|]
My question is, what's happening?
Here is a source code snippet from the Alea GPU sample gallery, which you can find at http://www.quantalea.com/gallery.
Here is the single-stage algorithm. It is not the fastest, but it is reasonably simple to understand.
public static class FloydWarshallSingleStage
{
    const int BlockWidth = 16;

    /// <summary>
    /// Kernel for the parallel Floyd-Warshall algorithm on the GPU.
    /// </summary>
    /// <param name="u">Vertex through which path relaxation [v1, v2] is performed</param>
    /// <param name="d">Matrix of shortest paths d(G)</param>
    /// <param name="p">Matrix of predecessors p(G)</param>
    public static void KernelSingleStage(int u, int[,] d, int[,] p)
    {
        var n = d.GetLength(0);
        var v1 = blockDim.y * blockIdx.y + threadIdx.y;
        var v2 = blockDim.x * blockIdx.x + threadIdx.x;
        if (v1 < n && v2 < n)
        {
            var newPath = d[v1, u] + d[u, v2];
            var oldPath = d[v1, v2];
            if (oldPath > newPath)
            {
                d[v1, v2] = newPath;
                p[v1, v2] = p[u, v2];
            }
        }
    }

    [GpuManaged]
    public static void Run(Gpu gpu, int[,] d, int[,] p)
    {
        var n = d.GetLength(0);
        var gridDim = new dim3((n - 1) / BlockWidth + 1, (n - 1) / BlockWidth + 1, 1);
        var blockDim = new dim3(BlockWidth, BlockWidth, 1);
        var lp = new LaunchParam(gridDim, blockDim);
        for (var u = 0; u < n; u++)
        {
            gpu.Launch(KernelSingleStage, lp, u, d, p);
        }
    }
}
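To make the semantics of the per-`u` kernel launch concrete, here is a minimal CPU sketch in Python of the same sweep, including the predecessor update. This is my own illustrative code, not the Alea GPU sample; the function name and the predecessor-matrix initialization are assumptions for the sketch (the sample initializes `p` on the caller's side).

```python
from math import inf

def floyd_warshall_single_stage(d):
    """CPU reference of the relaxation the single-stage kernel performs:
    one full sweep over all (v1, v2) pairs per intermediate vertex u."""
    n = len(d)
    d = [row[:] for row in d]
    # p[i][j]: predecessor of j on the current best i -> j path (-1 if none)
    p = [[i if i != j and d[i][j] != inf else -1 for j in range(n)]
         for i in range(n)]
    for u in range(n):                 # one "kernel launch" per u
        for v1 in range(n):            # on the GPU these two loops are
            for v2 in range(n):        # the (blockIdx, threadIdx) grid
                new_path = d[v1][u] + d[u][v2]
                if new_path < d[v1][v2]:
                    d[v1][v2] = new_path
                    p[v1][v2] = p[u][v2]
    return d, p

# the 4x4 matrix from the question
m = [[0, 5, 9, inf],
     [inf, 0, 1, inf],
     [inf, inf, 0, 2],
     [inf, 3, inf, 0]]
dist, pred = floyd_warshall_single_stage(m)
# dist agrees with the sequential F# floydwarshall result above
```

Note that relaxing in place is safe here: at step `u`, row `u` and column `u` of `d` cannot improve, so concurrent GPU threads reading them see stable values.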
The multi-stage version is more complicated and the listing is longer. Here I paste a version that uses automatic memory management, which simplifies the code quite a bit but also has some performance implications. The multi-stage version uses three kernels to complete the job and uses tiling to improve memory access.
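Before the full listing, the blocked (tiled) scheme behind those three kernels can be sketched on the CPU. This is an illustrative Python rendering of the three phases, diagonal tile, tiles sharing a row or column with it, and all remaining tiles; it is not the Alea GPU code, and the names are my own:

```python
from math import inf

def blocked_floyd_warshall(d, block=2):
    """Blocked Floyd-Warshall: same result as the naive triple loop,
    but organized in per-tile phases as in the multi-stage GPU version."""
    n = len(d)
    d = [row[:] for row in d]
    nb = (n + block - 1) // block      # number of tile rows/columns

    def relax(ib, jb, kb):
        # relax tile (ib, jb) through the k-range of tile kb
        for k in range(kb * block, min((kb + 1) * block, n)):
            for i in range(ib * block, min((ib + 1) * block, n)):
                for j in range(jb * block, min((jb + 1) * block, n)):
                    if d[i][k] + d[k][j] < d[i][j]:
                        d[i][j] = d[i][k] + d[k][j]

    for kb in range(nb):
        relax(kb, kb, kb)              # phase 1: independent diagonal tile
        for b in range(nb):            # phase 2: singly dependent tiles
            if b != kb:
                relax(kb, b, kb)       # i-aligned (same tile row)
                relax(b, kb, kb)       # j-aligned (same tile column)
        for ib in range(nb):           # phase 3: doubly dependent tiles
            for jb in range(nb):
                if ib != kb and jb != kb:
                    relax(ib, jb, kb)
    return d

m = [[0, 5, 9, inf],
     [inf, 0, 1, inf],
     [inf, inf, 0, 2],
     [inf, 3, inf, 0]]
result = blocked_floyd_warshall(m)
# result matches the sequential Floyd-Warshall output above
```

On the GPU, each `relax` of a tile becomes one thread block, and phases 1-3 map onto the three kernel launches per `block` iteration shown below in `Run`.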
public class FloydWarshallMultiStage
{
    private const int None = -1;
    private const int Inf = 1061109567;

    //[GpuParam]
    //private readonly Constant<int> BlockSize;
    //[GpuParam]
    //private readonly Constant<int> ThreadSize;
    //[GpuParam]
    //private readonly Constant<int> VirtualBlockSize;

    private const int BlockSize = 16;
    private const int ThreadSize = 2;
    private const int VirtualBlockSize = BlockSize*ThreadSize;

    public FloydWarshallMultiStage(int blockSize, int threadSize)
    {
        //BlockSize = new Constant<int>(blockSize);
        //ThreadSize = new Constant<int>(threadSize);
        //VirtualBlockSize = new Constant<int>(blockSize * threadSize);
    }

    /// <summary>
    /// Kernel for the parallel Floyd-Warshall algorithm on the GPU, computing the independent block.
    /// </summary>
    /// <param name="block">Block for which path relaxation [v1, v2] is performed</param>
    /// <param name="n">Number of vertices in the graph G:=(V,E), n := |V(G)|</param>
    /// <param name="pitch">Width to get to the next row, in number of ints</param>
    /// <param name="d">Matrix of shortest paths d(G)</param>
    /// <param name="p">Matrix of predecessors p(G)</param>
    public void KernelPhaseOne(int block, int n, int pitch, int[,] d, int[,] p)
    {
        var newPred = 0;
        var tx = threadIdx.x;
        var ty = threadIdx.y;
        var v1 = VirtualBlockSize*block + ty;
        var v2 = VirtualBlockSize*block + tx;
        var primaryD = __shared__.Array2D<int>(VirtualBlockSize, VirtualBlockSize);
        var primaryP = __shared__.Array2D<int>(VirtualBlockSize, VirtualBlockSize);
        if (v1 < n && v2 < n)
        {
            primaryD[ty, tx] = d[v1, v2];
            primaryP[ty, tx] = p[v1, v2];
            newPred = primaryP[ty, tx];
        }
        else
        {
            primaryD[ty, tx] = Inf;
            primaryP[ty, tx] = None;
        }
        DeviceFunction.SyncThreads();
        for (var i = 0; i < VirtualBlockSize; i++)
        {
            var newPath = primaryD[ty, i] + primaryD[i, tx];
            DeviceFunction.SyncThreads();
            if (newPath < primaryD[ty, tx])
            {
                primaryD[ty, tx] = newPath;
                newPred = primaryP[i, tx];
            }
            DeviceFunction.SyncThreads();
            primaryP[ty, tx] = newPred;
        }
        if (v1 < n && v2 < n)
        {
            d[v1, v2] = primaryD[ty, tx];
            p[v1, v2] = primaryP[ty, tx];
        }
    }

    /// <summary>
    /// Kernel for the parallel Floyd-Warshall algorithm on the GPU, computing the blocks that depend on a single independent block.
    /// </summary>
    /// <param name="block">Block for which path relaxation [v1, v2] is performed</param>
    /// <param name="n">Number of vertices in the graph G:=(V,E), n := |V(G)|</param>
    /// <param name="pitch">Width to get to the next row, in number of ints</param>
    /// <param name="d">Matrix of shortest paths d(G)</param>
    /// <param name="p">Matrix of predecessors p(G)</param>
    public void KernelPhaseTwo(int block, int n, int pitch, int[,] d, int[,] p)
    {
        if (blockIdx.x == block) return;
        var newPath = 0;
        var newPred = 0;
        var tx = threadIdx.x;
        var ty = threadIdx.y;
        var v1 = VirtualBlockSize*block + ty;
        var v2 = VirtualBlockSize*block + tx;
        var primaryD = __shared__.Array2D<int>(VirtualBlockSize, VirtualBlockSize);
        var currentD = __shared__.Array2D<int>(VirtualBlockSize, VirtualBlockSize);
        var primaryP = __shared__.Array2D<int>(VirtualBlockSize, VirtualBlockSize);
        var currentP = __shared__.Array2D<int>(VirtualBlockSize, VirtualBlockSize);
        if (v1 < n && v2 < n)
        {
            primaryD[ty, tx] = d[v1, v2];
            primaryP[ty, tx] = p[v1, v2];
        }
        else
        {
            primaryD[ty, tx] = Inf;
            primaryP[ty, tx] = None;
        }
        // load i-aligned singly dependent blocks
        if (blockIdx.y == 0)
        {
            v1 = VirtualBlockSize*block + ty;
            v2 = VirtualBlockSize*blockIdx.x + tx;
        }
        // load j-aligned singly dependent blocks
        else
        {
            v1 = VirtualBlockSize*blockIdx.x + ty;
            v2 = VirtualBlockSize*block + tx;
        }
        if (v1 < n && v2 < n)
        {
            currentD[ty, tx] = d[v1, v2];
            currentP[ty, tx] = p[v1, v2];
            newPred = currentP[ty, tx];
        }
        else
        {
            currentD[ty, tx] = Inf;
            currentP[ty, tx] = None;
        }
        DeviceFunction.SyncThreads();
        // compute i-aligned singly dependent blocks
        if (blockIdx.y == 0)
        {
            for (var i = 0; i < VirtualBlockSize; i++)
            {
                newPath = primaryD[ty, i] + currentD[i, tx];
                DeviceFunction.SyncThreads();
                if (newPath < currentD[ty, tx])
                {
                    currentD[ty, tx] = newPath;
                    newPred = currentP[i, tx];
                }
                DeviceFunction.SyncThreads();
                currentP[ty, tx] = newPred;
            }
        }
        // compute j-aligned singly dependent blocks
        else
        {
            for (var i = 0; i < VirtualBlockSize; i++)
            {
                newPath = currentD[ty, i] + primaryD[i, tx];
                DeviceFunction.SyncThreads();
                if (newPath < currentD[ty, tx])
                {
                    currentD[ty, tx] = newPath;
                    currentP[ty, tx] = primaryP[i, tx];
                }
                DeviceFunction.SyncThreads();
            }
        }
        if (v1 < n && v2 < n)
        {
            d[v1, v2] = currentD[ty, tx];
            p[v1, v2] = currentP[ty, tx];
        }
    }

    /// <summary>
    /// Kernel for the parallel Floyd-Warshall algorithm on the GPU, computing the blocks that depend on the singly dependent blocks.
    /// </summary>
    /// <param name="block">Block for which path relaxation [v1, v2] is performed</param>
    /// <param name="n">Number of vertices in the graph G:=(V,E), n := |V(G)|</param>
    /// <param name="pitch">Width to get to the next row, in number of ints</param>
    /// <param name="d">Matrix of shortest paths d(G)</param>
    /// <param name="p">Matrix of predecessors p(G)</param>
    public void KernelPhaseThree(int block, int n, int pitch, int[,] d, int[,] p)
    {
        if (blockIdx.x == block || blockIdx.y == block) return;
        var tx = threadIdx.x*ThreadSize;
        var ty = threadIdx.y*ThreadSize;
        var v1 = blockDim.y*blockIdx.y*ThreadSize + ty;
        var v2 = blockDim.x*blockIdx.x*ThreadSize + tx;
        var primaryRowD = __shared__.Array2D<int>(BlockSize*ThreadSize, BlockSize*ThreadSize);
        var primaryColD = __shared__.Array2D<int>(BlockSize*ThreadSize, BlockSize*ThreadSize);
        var primaryRowP = __shared__.Array2D<int>(BlockSize*ThreadSize, BlockSize*ThreadSize);
        var v1Row = BlockSize*block*ThreadSize + ty;
        var v2Col = BlockSize*block*ThreadSize + tx;
        // load data for virtual block
        for (var i = 0; i < ThreadSize; i++)
        {
            for (var j = 0; j < ThreadSize; j++)
            {
                var idx = tx + j;
                var idy = ty + i;
                if (v1Row + i < n && v2 + j < n)
                {
                    primaryRowD[idy, idx] = d[v1Row + i, v2 + j];
                    primaryRowP[idy, idx] = p[v1Row + i, v2 + j];
                }
                else
                {
                    primaryRowD[idy, idx] = Inf;
                    primaryRowP[idy, idx] = None;
                }
                if (v1 + i < n && v2Col + j < n)
                {
                    primaryColD[idy, idx] = d[v1 + i, v2Col + j];
                }
                else
                {
                    primaryColD[idy, idx] = Inf;
                }
            }
        }
        DeviceFunction.SyncThreads();
        // compute data for virtual block
        for (var i = 0; i < ThreadSize; i++)
        {
            for (var j = 0; j < ThreadSize; j++)
            {
                if (v1 + i < n && v2 + j < n)
                {
                    var path = d[v1 + i, v2 + j];
                    var predecessor = p[v1 + i, v2 + j];
                    var idy = ty + i;
                    var idx = tx + j;
                    for (var k = 0; k < BlockSize*ThreadSize; k++)
                    {
                        var newPath = primaryColD[idy, k] + primaryRowD[k, idx];
                        if (path > newPath)
                        {
                            path = newPath;
                            predecessor = primaryRowP[k, idx];
                        }
                    }
                    d[v1 + i, v2 + j] = path;
                    p[v1 + i, v2 + j] = predecessor;
                }
            }
        }
    }

    /// <summary>
    /// Parallel multi-stage Floyd-Warshall algorithm on the GPU.
    /// </summary>
    /// <param name="gpu">The GPU on which the kernels should run</param>
    /// <param name="d">Matrix of shortest paths d(G)</param>
    /// <param name="p">Matrix of predecessors p(G)</param>
    /// <param name="verbose">Print launch configuration if true</param>
    public void Run(Gpu gpu, int[,] d, int[,] p, bool verbose = false)
    {
        var n = d.GetLength(0);
        var gridDim1 = new dim3(1, 1, 1);
        var gridDim2 = new dim3((n - 1)/VirtualBlockSize + 1, 2, 1);
        var gridDim3 = new dim3((n - 1)/VirtualBlockSize + 1, (n - 1)/VirtualBlockSize + 1, 1);
        var blockDim1 = new dim3(VirtualBlockSize, VirtualBlockSize, 1);
        var blockDim2 = new dim3(VirtualBlockSize, VirtualBlockSize, 1);
        var blockDim3 = new dim3(BlockSize, BlockSize, 1);
        var numOfBlock = (n - 1)/VirtualBlockSize + 1;
        var pitchInt = n;
        if (verbose)
        {
            Console.WriteLine($"|V| {n}");
            Console.WriteLine($"Phase 1: grid dim {gridDim1} block dim {blockDim1}");
            Console.WriteLine($"Phase 2: grid dim {gridDim2} block dim {blockDim2}");
            Console.WriteLine($"Phase 3: grid dim {gridDim3} block dim {blockDim3}");
        }
        for (var block = 0; block < numOfBlock; block++)
        {
            gpu.Launch(KernelPhaseOne, new LaunchParam(gridDim1, blockDim1), block, n, pitchInt, d, p);
            gpu.Launch(KernelPhaseTwo, new LaunchParam(gridDim2, blockDim2), block, n, pitchInt, d, p);
            gpu.Launch(KernelPhaseThree, new LaunchParam(gridDim3, blockDim3), block, n, pitchInt, d, p);
        }
    }
}
For the version with explicit memory management, you had better download the sample from http://www.quantalea.com/gallery and search for Floyd-Warshall.
Hope that answers the question.
The implementation is based on the following paper:
Ben Lund, Justin W. Smith, "A Multi-Stage CUDA Kernel for Floyd-Warshall", 2010.
https://arxiv.org/abs/1001.4108
Related
Draw a sphere using sectors and stack WebGL
I'm trying to draw a sphere using sectors and stacks algorithm but it output nothing and do not know where is the problem. Any help? I implemented the algorithm literally as written in: http://www.songho.ca/opengl/gl_sphere.html Everything is working fine except the coloredShpere function this is a photo of what appears to me when I run this function: and you can find the whole code in: https://drive.google.com/open?id=1dnnkk1w7oq4O7hPTMeGRkyELwi4tcl5X let mesh = createMesh(gl); const PI = 3.1415926; const r = 1.0; const stackCount = 16; const sectorCount = 16; let x : number; let y : number; let z : number; let xy : number; let vertices: number[] = new Array(); let normals : number[] = new Array(); let texCoords : number[] = new Array(); let nx: number; let ny: number; let nz: number; let lengthInv: number; lengthInv = 1.0 / r; let s: number; let t: number; let sectorStep = 2 * PI / sectorCount; let stackStep = PI / stackCount; let sectorAngle : number; let stackAngle : number; for(let i = 0; i<=stackCount; i++) { stackAngle = PI/2 - i*stackStep; //-90 to +90 xy = r*Math.cos(stackAngle); z = r*Math.sin(stackAngle); for(let j = 0; j<=sectorCount; j++) { sectorAngle = j*sectorAngle; //0 to 360 x = xy*Math.cos(sectorAngle); y = xy*Math.sin(sectorAngle); vertices.push(x); vertices.push(y); vertices.push(z); nx = x * lengthInv; ny = y * lengthInv; nz = z * lengthInv; normals.push(nx); normals.push(ny); normals.push(nz); // vertex tex coord (s, t) range between [0, 1] s = j / sectorCount; t = i / stackCount; texCoords.push(s); texCoords.push(t); } } // generate CCW index list of sphere triangles // indices // k1--k1+1 // | / | // | / | // k2--k2+1 let indices: number[] = new Array(); let k1 : number; let k2 : number; for(let i = 0; i<stackCount; i++) { k1 = i * (sectorCount + 1); //frist stack k2 = k1 + sectorCount + 1; //second stack for(let j = 0; j<sectorCount; j++) { //k1, k2, k1+1 if(i != 0) { indices.push(k1); indices.push(k2); indices.push(k1+1); } //k1+1, k2, 
k2+1 if(i != (stackCount-1)) { indices.push(k1+1); indices.push(k2); indices.push(k2+1); } } } mesh.setBufferData("positions", new Float32Array(vertices), gl.STATIC_DRAW); //mesh.setBufferData("colors", new Uint8Array(), gl.STATIC_DRAW); mesh.setElementsData(new Uint32Array(indices), gl.STATIC_DRAW); //mesh.setBufferData("colors", new Uint8Array(), gl.STATIC_DRAW); return mesh;
Suggestion: Learn how to use console.log and your browser's debugger I didn't check if your code actually works or not but I did add these lines at the bottom of what you posted above console.log(vertices); console.log(indices); and what I saw was All those NaN values are clearly wrong Stepping through the code comes to this line sectorAngle = j*sectorAngle; //0 to 360 which is where the NaN is generated which doesn't match the article you linked to sectorAngle = j * sectorStep; // starting from 0 to 2pi Whether or not that's the only issue I don't know but if there are more then use console.log and the debugger to help find the issue. One way to make the code easier to debug is set stackCount and sectorCount to something small like 4 and 2 respectively and then you should have some idea what all the values should be and you can compare with what values you are getting.
If someone interested to know the solution, this is the code after some improvements: ` let mesh = createMesh(gl); const PI = 3.1415926; const r = 1.0; let vertices = []; let colors = []; for(let i = 0; i<=verticalResolution; ++i) { let theta = i * Math.PI / verticalResolution; //-90 to 90 let sinTheta = Math.sin(theta); let cosTheta = Math.cos(theta); for(let j = 0; j<=horizontalResolution; ++j) { let phi = j * 2 * Math.PI / horizontalResolution; //0 to 360 let sinPhi = Math.sin(phi); let cosPhi = Math.cos(phi); let x = sinTheta*cosPhi; let y = cosTheta; let z = sinTheta*sinPhi; vertices.push(r*x); vertices.push(r*y); vertices.push(r*z); colors.push((x+1)/2*255); colors.push((y+1)/2*255); colors.push((z+1)/2*255); colors.push(255); } } // generate CCW index list of sphere triangles // indices // k1--k1+1 // | / | // | / | // k2--k2+1 let indices = []; for(let i = 0; i<verticalResolution; ++i) { for(let j = 0; j<horizontalResolution; ++j) { let first = (i * (horizontalResolution + 1)) + j; let second = first + horizontalResolution + 1; indices.push(first); indices.push(second); indices.push(first+1); indices.push(second); indices.push(second+1); indices.push(first+1); } } mesh.setBufferData("positions", new Float32Array(vertices), gl.STATIC_DRAW); mesh.setBufferData("colors", new Uint8Array(colors), gl.STATIC_DRAW); mesh.setElementsData(new Uint32Array(indices), gl.STATIC_DRAW); return mesh;` Output of the code
Is there a way to call 4 APIs and then create a list and draw a Pie Chart
I need to call 4 APIs on a same server and then use the result to create a list and want to pass the same list to create a pie chart. I have created a list but unable to pass that list in pie chart. main() async { // returned dataset example: // [{females: 1367341, country: Brazil, age: 18, males: 1368729, year: 1980, total: 2736070}] final age18data = await getJson( 'http://api.population.io:80/1.0/population/2019/India/18/'); final age30data = await getJson( 'http://api.population.io:80/1.0/population/2019/India/30/'); final age45data = await getJson( 'http://api.population.io:80/1.0/population/2019/India/45/'); final age60data = await getJson( 'http://api.population.io:80/1.0/population/2019/India/60/'); final values = [ age18data[0]["total"], age30data[0]["total"], age45data[0]["total"], age60data[0]["total"] ]; I have done till here now I want to use these values list to draw the pie chart using charts_flutter package
Here you go, 4 api calls to same server, and a pie chart: import "dart:math" as math; import "dart:io"; import "dart:convert"; main() async { // returned dataset example: // [{females: 1367341, country: Brazil, age: 18, males: 1368729, year: 1980, total: 2736070}] final age18data = await getJson( 'http://api.population.io:80/1.0/population/2019/India/18/'); final age30data = await getJson( 'http://api.population.io:80/1.0/population/2019/India/30/'); final age45data = await getJson( 'http://api.population.io:80/1.0/population/2019/India/45/'); final age60data = await getJson( 'http://api.population.io:80/1.0/population/2019/India/60/'); final values = [ age18data[0]["total"], age30data[0]["total"], age45data[0]["total"], age60data[0]["total"] ]; final allTotal = values[0] + values[1] + values[2] + values[3]; final proportion = values.map((v) => v / allTotal).toList(); print("Population of India:"); print("A - 18 y.o. ${values[0]} (${proportion[0]})"); print("B - 25 y.o. ${values[1]} (${proportion[1]})"); print("C - 45 y.o. ${values[2]} (${proportion[2]})"); print("D - 60 y.o. 
${values[3]} (${proportion[3]})"); final labels = ["A", "B", "C", "D"]; asciiPieChart(labels, proportion); } Future<dynamic> getJson(String url) async { var request = await HttpClient().getUrl(Uri.parse(url)); // produces a request object var response = await request.close(); // sends the request var body = await response.transform(Utf8Decoder()).join(""); return json.decode(body); } void asciiPieChart(dynamic k, dynamic v) { // adapted from javascript version: // https://codegolf.stackexchange.com/a/23351/18464 dynamic d, y, s, x, r, a, i, f, p, t, j; r = 10.0; d = r * 2; p = []; for (y = 0; y < d; y++) { p.add([]); for (x = 0; x < d; x++) p[y].add(" "); } t = 0; i = -1; for (f = 0; f < 1; f += 1 / (r * 20)) { if (f > t) t += v[++i]; a = math.pi * 2 * f; for (j = 0; j < r; j++) { int px = ((math.sin(a) * j).round() + r).toInt(); int py = ((math.cos(a) * j).round() + r).toInt(); p[px][py] = k[i < 0 ? k.length + i : i]; } } s = ""; for (y = 0; y < d; y++) { for (x = 0; x < d; x++) s += p[y][x]; s += "\n"; } print(s); } Run dart example.dart prints out: Population of India: A - 18 y.o. 25026690 (0.33671242865945705) B - 25 y.o. 22643410 (0.30464746133954734) C - 45 y.o. 16325200 (0.21964142043359983) A - 60 y.o. 10331300 (0.13899868956739578) CCCCCCC CCCCCCCCCCC CCCCCCCCCCCCD BBCCCCCCCCCCDDD BBBBCCCCCCCCDDDDD BBBBBCCCCCCDDDDDD BBBBBBBCCCCCDDDDDDD BBBBBBBCCCDDDDDDDDD BBBBBBBBCCDDDDDDDDD BBBBBBBBBDDDDDDDDDD BBBBBBBBBAAAAAAAAAA BBBBBBBBBAAAAAAAAAA BBBBBBBBAAAAAAAAAAA BBBBBBAAAAAAAAAAA BBBBBBAAAAAAAAAAA BBBBAAAAAAAAAAA BBBAAAAAAAAAA BAAAAAAAAAA AAAAAAA You can of-course apply same ideas and use different charting method for example as described in https://google.github.io/charts/flutter/example/pie_charts/donut.html Doing http requests also is easier with https://pub.dartlang.org/packages/http
Why at high circle division, the cone not complete itself?
I created in webgl a javascript file that needs to draw a cone. If I choose a low circle division, it work perfectly, and all the lines are displayed. But at high one, high from 255, it breaks. It seems not all of the vertices are linked. I can not understand why the difference. case 'cone':{ var CONE_DIV = 255; var angolo = 360 / CONE_DIV; var altezza = 1.0; // Coordinates var vertices = []; vertices.push(0.0); //v0t X vertices.push(0.0); //v0t Y vertices.push(0.0); //v0t Z vertices.push(0.0); //v0 X vertices.push(altezza); //v0 Y vertices.push(0.0); //v0 Z for (var i = 0; i < CONE_DIV; i++) { var ai = i * 2 * Math.PI / CONE_DIV; var si = Math.sin(ai); var ci = Math.cos(ai); vertices.push(ci); //coordinate X vertices.push(0.0); //coordinate Y vertices.push(si); //coordinate Z } // Colors var colors = []; for (var k = 0; k < CONE_DIV; k++) { for (var t = 0; t < 2; t++) { colors.push(a); colors.push(b); colors.push(c); } } // Indices of the vertices var indices = []; //index high vertex for (var j = 1; j <= CONE_DIV; j++) { indices.push(1); indices.push(j+1); var l = j + 2; //last vertex base - return to first if (l == CONE_DIV + 2) { indices.push(2); } else { indices.push(l); } } //index base for (var j = 1; j <= CONE_DIV; j++) { indices.push(0); indices.push(j+1); var l = j+2; if (l == CONE_DIV + 2) { //last vertex base - return to first indices.push(2); } else { indices.push(l); } }
Multi otsu(multi-thresholding) with openCV
I am trying to carry out multi-thresholding with otsu. The method I am using currently is actually via maximising the between class variance, I have managed to get the same threshold value given as that by the OpenCV library. However, that is just via running otsu method once. Documentation on how to do multi-level thresholding or rather recursive thresholding is rather limited. Where do I do after obtaining the original otsu's value? Would appreciate some hints, I been playing around with the code, adding one external for loop, but the next value calculated is always 254 for any given image:( My code if need be: //compute histogram first cv::Mat imageh; //image edited to grayscale for histogram purpose //imageh=image; //to delete and uncomment below; cv::cvtColor(image, imageh, CV_BGR2GRAY); int histSize[1] = {256}; // number of bins float hranges[2] = {0.0, 256.0}; // min andax pixel value const float* ranges[1] = {hranges}; int channels[1] = {0}; // only 1 channel used cv::MatND hist; // Compute histogram calcHist(&imageh, 1, channels, cv::Mat(), hist, 1, histSize, ranges); IplImage* im = new IplImage(imageh);//assign the image to an IplImage pointer IplImage* finalIm = cvCreateImage(cvSize(im->width, im->height), IPL_DEPTH_8U, 1); double otsuThreshold= cvThreshold(im, finalIm, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU ); cout<<"opencv otsu gives "<<otsuThreshold<<endl; int totalNumberOfPixels= imageh.total(); cout<<"total number of Pixels is " <<totalNumberOfPixels<< endl; float sum = 0; for (int t=0 ; t<256 ; t++) { sum += t * hist.at<float>(t); } cout<<"sum is "<<sum<<endl; float sumB = 0; //sum of background int wB = 0; // weight of background int wF = 0; //weight of foreground float varMax = 0; int threshold = 0; //run an iteration to find the maximum value of the between class variance(as between class variance shld be maximise) for (int t=0 ; t<256 ; t++) { wB += hist.at<float>(t); // Weight Background if (wB == 0) continue; wF = totalNumberOfPixels - 
wB; // Weight Foreground if (wF == 0) break; sumB += (float) (t * hist.at<float>(t)); float mB = sumB / wB; // Mean Background float mF = (sum - sumB) / wF; // Mean Foreground // Calculate Between Class Variance float varBetween = (float)wB * (float)wF * (mB - mF) * (mB - mF); // Check if new maximum found if (varBetween > varMax) { varMax = varBetween; threshold = t; } } cout<<"threshold value is: "<<threshold;
To extend Otsu's thresholding method to multi-level thresholding the between class variance equation becomes: Please check out Deng-Yuan Huang, Ta-Wei Lin, Wu-Chih Hu, Automatic Multilevel Thresholding Based on Two-Stage Otsu's Method with Cluster Determination by Valley Estimation, Int. Journal of Innovative Computing, 2011, 7:5631-5644 for more information. http://www.ijicic.org/ijicic-10-05033.pdf Here is my C# implementation of Otsu Multi for 2 thresholds: /* Otsu (1979) - multi */ Tuple < int, int > otsuMulti(object sender, EventArgs e) { //image histogram int[] histogram = new int[256]; //total number of pixels int N = 0; //accumulate image histogram and total number of pixels foreach(int intensity in image.Data) { if (intensity != 0) { histogram[intensity] += 1; N++; } } double W0K, W1K, W2K, M0, M1, M2, currVarB, optimalThresh1, optimalThresh2, maxBetweenVar, M0K, M1K, M2K, MT; optimalThresh1 = 0; optimalThresh2 = 0; W0K = 0; W1K = 0; M0K = 0; M1K = 0; MT = 0; maxBetweenVar = 0; for (int k = 0; k <= 255; k++) { MT += k * (histogram[k] / (double) N); } for (int t1 = 0; t1 <= 255; t1++) { W0K += histogram[t1] / (double) N; //Pi M0K += t1 * (histogram[t1] / (double) N); //i * Pi M0 = M0K / W0K; //(i * Pi)/Pi W1K = 0; M1K = 0; for (int t2 = t1 + 1; t2 <= 255; t2++) { W1K += histogram[t2] / (double) N; //Pi M1K += t2 * (histogram[t2] / (double) N); //i * Pi M1 = M1K / W1K; //(i * Pi)/Pi W2K = 1 - (W0K + W1K); M2K = MT - (M0K + M1K); if (W2K <= 0) break; M2 = M2K / W2K; currVarB = W0K * (M0 - MT) * (M0 - MT) + W1K * (M1 - MT) * (M1 - MT) + W2K * (M2 - MT) * (M2 - MT); if (maxBetweenVar < currVarB) { maxBetweenVar = currVarB; optimalThresh1 = t1; optimalThresh2 = t2; } } } return new Tuple(optimalThresh1, optimalThresh2); } And this is the result I got by thresholding an image scan of soil with the above code: (T1 = 110, T2 = 147). 
Otsu's original paper: "Nobuyuki Otsu, A Threshold Selection Method from Gray-Level Histogram, IEEE Transactions on Systems, Man, and Cybernetics, 1979, 9:62-66" also briefly mentions the extension to Multithresholding. https://engineering.purdue.edu/kak/computervision/ECE661.08/OTSU_paper.pdf Hope this helps.
Here is a simple general approach for 'n' thresholds in python (>3.0) : # developed by- SUJOY KUMAR GOSWAMI # source paper- https://people.ece.cornell.edu/acharya/papers/mlt_thr_img.pdf import cv2 import numpy as np import math img = cv2.imread('path-to-image') img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) a = 0 b = 255 n = 6 # number of thresholds (better choose even value) k = 0.7 # free variable to take any positive value T = [] # list which will contain 'n' thresholds def sujoy(img, a, b): if a>b: s=-1 m=-1 return m,s img = np.array(img) t1 = (img>=a) t2 = (img<=b) X = np.multiply(t1,t2) Y = np.multiply(img,X) s = np.sum(X) m = np.sum(Y)/s return m,s for i in range(int(n/2-1)): img = np.array(img) t1 = (img>=a) t2 = (img<=b) X = np.multiply(t1,t2) Y = np.multiply(img,X) mu = np.sum(Y)/np.sum(X) Z = Y - mu Z = np.multiply(Z,X) W = np.multiply(Z,Z) sigma = math.sqrt(np.sum(W)/np.sum(X)) T1 = mu - k*sigma T2 = mu + k*sigma x, y = sujoy(img, a, T1) w, z = sujoy(img, T2, b) T.append(x) T.append(w) a = T1+1 b = T2-1 k = k*(i+1) T1 = mu T2 = mu+1 x, y = sujoy(img, a, T1) w, z = sujoy(img, T2, b) T.append(x) T.append(w) T.sort() print(T) For full paper and more informations visit this link.
I've written an example on how otsu thresholding work in python before. You can see the source code here: https://github.com/subokita/Sandbox/blob/master/otsu.py In the example there's 2 variants, otsu2() which is the optimised version, as seen on Wikipedia page, and otsu() which is more naive implementation based on the algorithm description itself. If you are okay in reading python codes (in this case, they are pretty simple, almost pseudo code like), you might want to look at otsu() in the example and modify it. Porting it to C++ code is not hard either.
#Antoni4 gives the best answer in my opinion and it's very straight forward to increase the number of levels. This is for three-level thresholding: #include "Shadow01-1.cuh" void multiThresh(double &optimalThresh1, double &optimalThresh2, double &optimalThresh3, cv::Mat &imgHist, cv::Mat &src) { double W0K, W1K, W2K, W3K, M0, M1, M2, M3, currVarB, maxBetweenVar, M0K, M1K, M2K, M3K, MT; unsigned char *histogram = (unsigned char*)(imgHist.data); int N = src.rows*src.cols; W0K = 0; W1K = 0; M0K = 0; M1K = 0; MT = 0; maxBetweenVar = 0; for (int k = 0; k <= 255; k++) { MT += k * (histogram[k] / (double) N); } for (int t1 = 0; t1 <= 255; t1++) { W0K += histogram[t1] / (double) N; //Pi M0K += t1 * (histogram[t1] / (double) N); //i * Pi M0 = M0K / W0K; //(i * Pi)/Pi W1K = 0; M1K = 0; for (int t2 = t1 + 1; t2 <= 255; t2++) { W1K += histogram[t2] / (double) N; //Pi M1K += t2 * (histogram[t2] / (double) N); //i * Pi M1 = M1K / W1K; //(i * Pi)/Pi W2K = 1 - (W0K + W1K); M2K = MT - (M0K + M1K); if (W2K <= 0) break; M2 = M2K / W2K; W3K = 0; M3K = 0; for (int t3 = t2 + 1; t3 <= 255; t3++) { W2K += histogram[t3] / (double) N; //Pi M2K += t3 * (histogram[t3] / (double) N); // i*Pi M2 = M2K / W2K; //(i*Pi)/Pi W3K = 1 - (W1K + W2K); M3K = MT - (M1K + M2K); M3 = M3K / W3K; currVarB = W0K * (M0 - MT) * (M0 - MT) + W1K * (M1 - MT) * (M1 - MT) + W2K * (M2 - MT) * (M2 - MT) + W3K * (M3 - MT) * (M3 - MT); if (maxBetweenVar < currVarB) { maxBetweenVar = currVarB; optimalThresh1 = t1; optimalThresh2 = t2; optimalThresh3 = t3; } } } } }
#Guilherme Silva Your code has a BUG You Must Replace: W3K = 0; M3K = 0; with W2K = 0; M2K = 0; and W3K = 1 - (W1K + W2K); M3K = MT - (M1K + M2K); with W3K = 1 - (W0K + W1K + W2K); M3K = MT - (M0K + M1K + M2K); ;-) Regards EDIT(1): [Toby Speight] I discovered this bug by applying the effect to the same picture at different resoultions(Sizes) and seeing that the output results were to much different from each others (Even changing resolution a little bit) W3K and M3K must be the totals minus the Previous WKs and MKs. (I thought about this for Code-similarity with the one with one level less) At the moment due to my lacks of English I cannot explain Better How and Why To be honest I'm still not 100% sure that this way is correct, even thought from my outputs I could tell that it gives better results. (Even with 1 Level more (5 shades of gray)) You could try yourself ;-) Sorry My Outputs: 3 Thresholds 4 Thresholds
I found a useful piece of code in this thread while looking for a multi-level Otsu implementation for double/float images, so I tried to generalize the example to N levels with a double/float matrix as input. My code below uses the Armadillo library as a dependency, but it can easily be adapted to standard C++ arrays: just replace the vec and uvec objects with one-dimensional double and integer arrays, and mat and umat with two-dimensional ones. The two other Armadillo functions used here are vectorise and hist.

// Input parameters:
// map - input image (double matrix)
// mask - region of interest to be thresholded
// nBins - number of bins
// nLevels - number of Otsu thresholds

#include <armadillo>
#include <algorithm>
#include <iostream>
#include <vector>

using namespace arma;
using namespace std;

int nCombinations(int n, int r); // forward declaration; defined below

mat OtsuFilterMulti(mat map, int nBins, int nLevels) {

    mat mapr; // output thresholded image
    mapr = zeros<mat>(map.n_rows, map.n_cols);

    unsigned int numElem = 0;
    vec threshold = zeros<vec>(nLevels);
    vec q = zeros<vec>(nLevels + 1);
    vec mu = zeros<vec>(nLevels + 1);
    vec muk = zeros<vec>(nLevels + 1);
    uvec binv = zeros<uvec>(nLevels);

    if (nLevels <= 1) return mapr;

    numElem = map.n_rows * map.n_cols;
    uvec histogram = hist(vectorise(map), nBins);

    double maxval = map.max();
    double minval = map.min();
    double odelta = (maxval - abs(minval)) / nBins; // distance between histogram bins
    vec oval = zeros<vec>(nBins);
    double mt = 0, variance = 0.0, bestVariance = 0.0;

    for (int ii = 0; ii < nBins; ii++) {
        oval(ii) = (double)odelta*ii + (double)odelta*0.5; // centers of histogram bins
        mt += (double)ii*((double)histogram(ii)) / (double)numElem;
    }

    for (int ii = 0; ii < nLevels; ii++) {
        binv(ii) = ii;
    }

    double sq, smuk;
    int nComb = nCombinations(nBins, nLevels);

    std::vector<bool> v(nBins);
    std::fill(v.begin(), v.begin() + nLevels, true);

    umat ibin = zeros<umat>(nComb, nLevels); // indices from combinations will be stored here
    int cc = 0;
    int ci = 0;
    do {
        for (int i = 0; i < nBins; ++i) {
            if (ci == nLevels) ci = 0;
            if (v[i]) {
                ibin(cc, ci) = i;
                ci++;
            }
        }
        cc++;
    } while (std::prev_permutation(v.begin(), v.end()));

    uvec lastIndex = zeros<uvec>(nLevels);

    // Perform operations on pre-calculated indices
    for (int ii = 0; ii < nComb; ii++) {
        for (int jj = 0; jj < nLevels; jj++) {
            smuk = 0;
            sq = 0;
            if (lastIndex(jj) != ibin(ii, jj) || ii == 0) {
                q(jj) += double(histogram(ibin(ii, jj))) / (double)numElem;
                muk(jj) += ibin(ii, jj)*(double(histogram(ibin(ii, jj)))) / (double)numElem;
                mu(jj) = muk(jj) / q(jj);
                q(jj + 1) = 0.0;
                muk(jj + 1) = 0.0;
                if (jj > 0) {
                    for (int kk = 0; kk <= jj; kk++) {
                        sq += q(kk);
                        smuk += muk(kk);
                    }
                    q(jj + 1) = 1 - sq;
                    muk(jj + 1) = mt - smuk;
                    mu(jj + 1) = muk(jj + 1) / q(jj + 1);
                }
                if (jj > 0 && jj < (nLevels - 1)) {
                    q(jj + 1) = 0.0;
                    muk(jj + 1) = 0.0;
                }
                lastIndex(jj) = ibin(ii, jj);
            }
        }

        variance = 0.0;
        for (int jj = 0; jj <= nLevels; jj++) {
            variance += q(jj)*(mu(jj) - mt)*(mu(jj) - mt);
        }

        if (variance > bestVariance) {
            bestVariance = variance;
            for (int jj = 0; jj < nLevels; jj++) {
                threshold(jj) = oval(ibin(ii, jj));
            }
        }
    }

    cout << "Optimized thresholds: ";
    for (int jj = 0; jj < nLevels; jj++) {
        cout << threshold(jj) << " ";
    }
    cout << endl;

    for (unsigned int jj = 0; jj < map.n_rows; jj++) {
        for (unsigned int kk = 0; kk < map.n_cols; kk++) {
            for (int ll = 0; ll < nLevels; ll++) {
                if (map(jj, kk) >= threshold(ll)) {
                    mapr(jj, kk) = ll + 1;
                }
            }
        }
    }

    return mapr;
}

int nCombinations(int n, int r) {
    if (r > n) return 0;
    if (r*2 > n) r = n - r;
    if (r == 0) return 1;
    int ret = n;
    for (int i = 2; i <= r; ++i) {
        ret *= (n - i + 1);
        ret /= i;
    }
    return ret;
}
Multiple color tweens on Sprites
I'm trying to make a grid of squares where I can control each square's color parameters individually, i.e. make them flash one by one or all at the same time. I'm trying to do it with tweens, starting them from a for-loop. The code below tries to flash all of the squares at the same time, once every second, but for some reason not all of the squares tween: only some do, or they tween partly, and sometimes they don't tween at all. The pattern doesn't repeat itself, either. Are there too many tweens running at the same time? Is a for-loop the right way to do this? Should I use MovieClips instead of Sprites? If I want to control the colors of many different objects at a very fast pace, what would be the best way to do it?

import fl.transitions.Tween;
import fl.transitions.easing.*;
import fl.transitions.TweenEvent;
import flash.display.*;
import flash.events.*;
import flash.display.Sprite;
import flash.geom.Rectangle;
import flash.geom.ColorTransform;
import Math;
import flash.utils.Timer;
import flash.events.TimerEvent;
import resolumeCom.*;
import resolumeCom.parameters.*;
import resolumeCom.events.*;

public class LightGrid extends MovieClip {

    private var t1:Tween;
    private var resolume:Resolume = new Resolume();
    private var tempo:FloatParameter = resolume.addFloatParameter("Tempo", 0.6);
    private var pad = 3;
    private var dim = 20;
    private var posX = 0 + pad;
    private var posY = 0 + pad;
    private var a:Number = new Number();
    private var b:Number = new Number();
    private var blk:Number = new Number();
    var newCol:ColorTransform = new ColorTransform();

    public function LightGrid() {
        resolume.addParameterListener(parameterChanged);

        for (var b = 0; b < 16; b++) {
            posY = (b*dim) + (b*pad) + pad;
            trace("New row");
            for (var a = 0; a < 24; a++) {
                posX = (a*dim) + (a*pad) + pad;
                // l = line, f = fill
                var l:Sprite = new Sprite;
                l.graphics.lineStyle(2, 0xFFFFF, 1);
                l.graphics.drawRect(posX, posY, dim, dim);
                l.name = "line_Row" + b + "Col" + a;
                addChild(l);
                var f:Sprite = new Sprite;
                f.graphics.beginFill(0x990000, 1);
                f.graphics.drawRect(posX, posY, dim, dim);
                f.graphics.endFill();
                f.name = "fill_Row" + b + "Col" + a;
                addChild(f);
                trace(getChildByName("fill_Row" + b + "Col" + a).name);
            }
        }

        var myTimer:Timer = new Timer(1000, 100);
        myTimer.addEventListener("timer", timerHandler);
        myTimer.start();
    }

    public function timerHandler(event:TimerEvent):void {
        flashTheLights();
    }

    public function parameterChanged(e:ChangeEvent):void {
        if (e.object == tempo) {
        }
    }

    public function flashTheLights():void {
        blk = 0;
        for (var blk = 0; blk < (24/3); blk++) {
            for (var d = 0; d < 16; d++) {
                for (var c = (0+(3*blk)); c < (3+(3*blk)); c++) {
                    newCol.redOffset = 30 - (35*blk);
                    newCol.blueOffset = 200 + (7*blk);
                    newCol.greenOffset = 200;
                    trace(getChildByName("fill_Row" + d + "Col" + c).name);
                    var fill:Sprite = getChildByName("fill_Row" + d + "Col" + c) as Sprite;
                    fill.transform.colorTransform.alphaMultiplier = -255;
                    fill.transform.colorTransform = newCol;
                    trace("Run tween");
                    var myTween = new Tween(fill, 'alpha', Regular.easeIn, 1, 0, 0.3, true);
                }
            }
            trace("Done!" + blk);
        }
    }
}
I more or less solved the problem by stacking the Sprites under MovieClips, so that I only tween a couple of elements instead of tens or hundreds:

for (var k = 0; k < (grdX/ptrnSz); k++) {
    var ptrn:MovieClip = new MovieClip();
    ptrn.name = "ptrn" + k;
    addChild(ptrn);
    ptrn.alpha = 0.01;
    ptrnAm++;
    for (var d = 0; d < grdY; d++) {
        posY = (d*dim) + (d*pad) + top;
        for (var c = (0+(ptrnSz*k)); c < (ptrnSz+(ptrnSz*k)); c++) {
            posX = (c*dim) + (c*pad) + left;
            // l = line, f = fill
            var f:Sprite = new Sprite;
            f.graphics.beginFill(0xFFFFFF, 1);
            f.graphics.drawRect(posX - 0.5, posY, dim, dim);
            f.graphics.endFill();
            f.name = "fill_Block" + k + "Row" + d + "Col" + c;
            ptrn.addChild(f);
        }
    }
}

grdX = grid size along the X axis (how many columns)
ptrnSz = width of each MovieClip containing the Sprites (how many columns)

After that I just tween the MovieClips with TweenMax, accessing each one with getChildByName.