iOS slow image pixel iterating - ios

I am trying to implement RGB histogram computation for images in Swift (I am new to iOS).
However, the computation time for a 1500x1000 image is about 66 seconds, which I consider too slow.
Are there any ways to speed up image traversal?
P.S. The current code is the following:
func calcHistogram(image: UIImage) {
    let bins: Int = 20;
    let width = Int(image.size.width);
    let height = Int(image.size.height);
    let binStep: Double = Double(bins-1)/255.0
    var hist = Array(count:bins, repeatedValue:Array(count:bins, repeatedValue:Array(count:bins, repeatedValue:Int())))
    for i in 0..<bins {
        for j in 0..<bins {
            for k in 0..<bins {
                hist[i][j][k] = 0;
            }
        }
    }
    var pixelData = CGDataProviderCopyData(CGImageGetDataProvider(image.CGImage))
    var data: UnsafePointer<UInt8> = CFDataGetBytePtr(pixelData)
    for x in 0..<width {
        for y in 0..<height {
            var pixelInfo: Int = ((width * y) + x) * 4
            var r = Double(data[pixelInfo])
            var g = Double(data[pixelInfo+1])
            var b = Double(data[pixelInfo+2])
            let r_bin: Int = Int(floor(r*binStep));
            let g_bin: Int = Int(floor(g*binStep));
            let b_bin: Int = Int(floor(b*binStep));
            hist[r_bin][g_bin][b_bin] += 1;
        }
    }
}

As noted in my comment on the question, there are some things you might rethink before you even try to optimize this code.
But even if you do move to a better overall solution like GPU-based histogramming, a library, or both... There are some Swift pitfalls you're falling into here that are good to talk about so you don't run into them elsewhere.
First, this code:
var hist = Array(count:bins, repeatedValue:Array(count:bins, repeatedValue:Array(count:bins, repeatedValue:Int())))
for i in 0..<bins {
    for j in 0..<bins {
        for k in 0..<bins {
            hist[i][j][k] = 0;
        }
    }
}
... is initializing every member of your 3D array twice, with the same result. Int() produces a value of zero, so you could leave out the triple for loop. (And possibly change Int() to 0 in your innermost repeatedValue: parameter to make it more readable.)
Second, arrays in Swift are copy-on-write, but this optimization can break down in multidimensional arrays: changing an element of a nested array can cause the entire nested array to be rewritten instead of just the one element. Multiply that by the depth of nested arrays and number of element writes you have going on in a double for loop and... it's not pretty.
Unless there's a reason your bins need to be organized this way, I'd recommend finding a different data structure for them. Three separate arrays? One Int array where index i is red, i + 1 is green, and i + 2 is blue? One array of a custom struct you define that has separate r, g, and b members? See what conceptually fits with your tastes or the rest of your app, and profile to make sure it works well.
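One more option in the same spirit is to keep the joint 20x20x20 histogram from the question but store it in a single flat array, so each increment touches exactly one element. This is only a sketch: flatHistogram is a made-up name, and it assumes the pixels are already available as tightly packed RGBA bytes (4 bytes per pixel, no row padding), which real CGImage data does not guarantee. The binning is the question's r * binStep idea done in integer arithmetic.

```swift
// Hypothetical helper, not from the original post.
// Assumes `pixels` holds tightly packed RGBA bytes (4 per pixel, no padding).
func flatHistogram(pixels: [UInt8], bins: Int = 20) -> [Int] {
    // One contiguous buffer instead of [[[Int]]].
    var hist = [Int](repeating: 0, count: bins * bins * bins)
    let pixelCount = pixels.count / 4
    for p in 0..<pixelCount {
        let base = p * 4
        // Map 0...255 onto 0...(bins - 1) with integer math.
        let r = Int(pixels[base]) * (bins - 1) / 255
        let g = Int(pixels[base + 1]) * (bins - 1) / 255
        let b = Int(pixels[base + 2]) * (bins - 1) / 255
        // Flattened 3D index: (r, g, b) -> (r * bins + g) * bins + b.
        hist[(r * bins + g) * bins + b] += 1
    }
    return hist
}

// Two pure-red pixels and one black pixel.
let samplePixels: [UInt8] = [255, 0, 0, 255,  255, 0, 0, 255,  0, 0, 0, 255]
let sampleHist = flatHistogram(pixels: samplePixels)
```

As with any of the layouts above, profile it against your real image data before settling on it.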
Finally, some Swift style points:
pixelInfo, r, g, and b in your second loop don't change. Use let, not var, and the optimizer will thank you.
Declaring and initializing something like let foo: Int = Int(whatever) is redundant. Some people like having all their variables/constants explicitly typed, but it does make your code a tad less readable and harder to refactor.
Int(floor(x)) is redundant here: conversion to Int truncates toward zero, which is the same as taking the floor for the non-negative values you have.
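On that last point, a small illustration of how Int(_:) behaves next to floor. For non-negative inputs such as colour channel values the two agree; they only part ways below zero. The constant names are just for this example.

```swift
import Foundation

let positive = 7.9
let negative = -7.9

// Int(_:) drops the fractional part, i.e. it rounds toward zero.
let truncatedPositive = Int(positive)            // 7
let truncatedNegative = Int(negative)            // -7

// floor rounds toward negative infinity, so the results differ below zero.
let flooredPositive = Int(floor(positive))       // 7
let flooredNegative = Int(floor(negative))       // -8
```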

If you have performance problems in your code, first of all use the Time Profiler from Instruments. You can start it from the Xcode menu via Product > Profile; the Instruments app then opens, where you can choose Time Profiler.
Start recording and perform the relevant interactions in your app.
Stop recording and analyse where the bottleneck in your code is.
Also check the options "Invert Call Tree", "Hide Missing Symbols" and "Hide System Libraries" to make the profile results easier to read.
You can also double-click any listed function to view its source code together with the percentage of time spent in it.

Related

Dynamic Time Warping in Swift

I translated a DTW MATLAB function to Swift. The code looks as follows:
private func dtw(x1 : [Double], x2 : [Double]) -> Double {
    let n1 = x1.count;
    let n2 = x2.count;
    var table = [[Double]](repeating: [Double](repeating: 0, count: n2 + 1), count: 2);
    table[0][0] = 0;
    for i in 1...n2 { table[0][i] = Double.infinity }
    for i in 1 ... n1 {
        table[1][0] = Double.infinity;
        for j in 1 ... n2 {
            let cost = abs(x1[i - 1] - x2[j - 1]);
            var min = table[0][j - 1];
            if (min > table[0][j]) {
                min = table[0][j];
            }
            if (min > table[1][j - 1]) { min = table[1][j - 1]; }
            table[1][j] = cost + min;
        }
        let swap = table[0];
        table[0] = table[1];
        table[1] = swap;
    }
    return table[0][n2];
}
This function takes an average of 16 ms to complete on an iPhone 11. For my use case, this is very slow, so I want to investigate ways to improve its speed. I recently read these two articles: DTW in Swift (O'Reilly) and Parallel programming with Swift. In the first article, there is a good quote:
Our implementation of DTW is naïve, and can be accelerated using parallel computing. To calculate the new row/column in a distance matrix, you don't need to wait until the previous one is finished; you only need it to be filled one cell ahead of your row/column
This would make the inner for j in 1 ... n2 loop an ideal candidate (I think). Looking at the code, only these two accesses would need to be made thread-safe, because each iteration reads what the previous one wrote:
table[1][j - 1]
table[1][j]
The problem I am currently having in introducing parallel computing (from article 2) is that I cannot figure out how to tell Swift to run everything in parallel except the two lines below, which depend on their predecessor:
if (min > table[1][j - 1]) { min = table[1][j - 1]; }
table[1][j] = cost + min;
I suspect I could solve this issue with DispatchQueue.concurrentPerform and an NSLock, if I implemented it correctly (I have not). It could also be the wrong tool for the job, which brings me back to my question:
What can I do (parallelization, concurrency, etc.) to improve the speed of my DTW function, where the only constraint is that the previous element of the row must already have been computed? A code example would go a long way.
Your first problem is that you're creating an array of arrays. This is not an efficient data structure, and is not a "2 dimensional array" in the way most people mean (i.e., a matrix). It is an array made up of other arrays, all of which can have arbitrary sizes, and this can be very expensive to mutate. As a rule, if you want a matrix, you should back it with a flat array and use multiplication to find its offsets, particularly if you're mutating it. Instead of table[i][j] you would use table[i * width + j].
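As a rough sketch of that flat-array idea (Matrix is a hypothetical type made up for illustration, not something from the standard library), the offset arithmetic can be hidden behind a two-argument subscript:

```swift
// Hypothetical type for illustration only.
struct Matrix {
    let rows: Int
    let columns: Int
    private var storage: [Double]

    init(rows: Int, columns: Int, repeating value: Double = 0) {
        self.rows = rows
        self.columns = columns
        storage = [Double](repeating: value, count: rows * columns)
    }

    // Row-major addressing: element (i, j) lives at i * columns + j.
    subscript(i: Int, j: Int) -> Double {
        get { storage[i * columns + j] }
        set { storage[i * columns + j] = newValue }
    }
}

var table = Matrix(rows: 2, columns: 4, repeating: .infinity)
table[0, 0] = 0
table[1, 3] = 2.5
```

All elements sit in one contiguous buffer, so a write modifies a single Double rather than going through a nested array.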
But in your case it's even easier, since there are exactly two rows. So you don't need a multi-dimensional array at all. You can just use two variables, and it'll be much more efficient. (In my tests, just making this change is about 30% faster than the original code.)
The major thing that slows you down is contention. You read and write to the same array in the loop. That gets in the way of various reordering and caching optimizations. In particular, it happens here:
if (min > table[1][j - 1]) { min = table[1][j - 1]; }
table[1][j] = cost + min;
If you rewrite that using two row variables rather than an array, it still looks like this:
if (min > row1[j - 1]) { min = row1[j - 1] }
row1[j] = cost + min
This forces the previous write to row1 to be fully completed before the next minimum can be computed, and then requires an array lookup to get the value back. But that's not really necessary. You can just cache the previous value between loops. Doing that means the loop only performs reads on row0 and only performs writes on row1. That's good for memory contention.
Putting those together, I wrote it this way. I changed the offsets to run from 0 rather than 1; it just made the code a little simpler to understand IMO. In my tests, this is about 3x faster than the original code for two arrays of 10k elements each.
func dtw(x1 : [Double], x2 : [Double]) -> Double {
    let n1 = x1.count
    let n2 = x2.count
    var row0 = Array(repeating: Double.infinity, count: n2 + 1)
    row0[0] = 0
    var row1 = Array(repeating: 0.0, count: n2 + 1)
    for i in 0 ..< n1 {
        row1[0] = .infinity
        // Keep track of the last value so we never have to read from row1.
        var lastValue = Double.infinity
        for j in 0 ..< n2 {
            let cost = abs(x1[i] - x2[j])
            // Don't be tempted to use the 3-value version of `min` here. It's much slower.
            var minimum = min(row0[j], row0[j + 1])
            minimum = min(minimum, lastValue)
            lastValue = cost + minimum
            row1[j + 1] = lastValue
        }
        swap(&row0, &row1)
    }
    return row0[n2]
}
This code is somewhat hard to make parallel, because the operations are not independent. Each row depends on the other rows. The key to good queue-based parallelism is the ability to split up fairly large chunks of independent work, and then efficiently combine them at the end. The cost of coordination will eat your benefits if the work units are too small. In many cases, vectorization (SIMD) is much more efficient than dispatching to multiple queues.
The cost function is independent, and I explored computing it with Accelerate (the main vectorization framework), but this generally made things slower. The compiler is very good at optimizing simple math in loops, and will do quite a lot of vectorizing for you if you let it. Accelerate is best when you need to do an expensive, consistent, and independent computation on a lot of values. And this loop isn't expensive or independent.

16 bit logic/computer simulation in Swift

I’m trying to make a basic simulation of a 16 bit computer with Swift. The computer will feature
An ALU
2 registers
That’s all. I have enough knowledge to create these parts visually and understand how they work, but it has become increasingly difficult to make larger components with more inputs while using my current approach.
My current approach has been to wrap each component in a struct. This worked early on, but managing multiple inputs while staying true to the principles of computer science is becoming increasingly difficult.
The primary issue is that the components aren’t updating with the clock signal. I have the output of the component updating when get is called on the output variable, c. This, however, neglects the idea of a clock signal and will likely cause further problems later on.
It’s also difficult to make getters and setters for each variable without getting errors about mutability. Although I have worked through these errors, they are annoying and slow down the development process.
The last big issue is updating the output. The output doesn’t update when the inputs change; it updates when told to do so. This isn’t accurate to the qualities of real computers and is a fundamental error.
This is an example. It is the ALU I mentioned earlier. It takes two 16 bit inputs and outputs 16 bits. It has two unary ALUs, which can make a 16 bit number zero, negate it, or both. Lastly, it either adds the inputs or does a bitwise AND, depending on the f flag, and inverts the output if the no flag is set.
struct ALU {
    //Operations are done in the order listed. For example, if zx and nx are 1, it first makes input 1 zero and then inverts it.
    var x : [Int] //Input 1
    var y : [Int] //Input 2
    var zx : Int //Make input 1 zero
    var zy : Int //Make input 2 zero
    var nx : Int //Invert input 1
    var ny : Int //Invert input 2
    var f : Int //If 0, do a bitwise AND operation. If 1, add the inputs
    var no : Int //Invert the output
    public var c : [Int] { //Output
        get {
            //Numbers first go through unary ALUs. These can negate the input (and output the value), return 0, or return the inverse of 0. They then undergo the operation specified by f, either addition or a bitwise AND operation, and are negated if no is 1.
            var ux = UnaryALU(z: zx, n: nx, x: x).c //Unary ALU. See comments for more
            var uy = UnaryALU(z: zy, n: ny, x: y).c
            var fd = select16(s: f, d1: Add16(a: ux, b: uy).c, d0: and16(a: ux, b: uy).c).c //Adds a 16 bit number or does a bitwise AND operation. For more on select16, see the line below.
            var out = select16(s: no, d1: not16(a: fd).c, d0: fd).c //Selects a number. If s is 1, it returns d1. If s is 0, it returns d0. d0 is the value returned by fd, while d1 is the inverse.
            return out
        }
    }
    public init(x:[Int],y:[Int],zx:Int,zy:Int,nx:Int,ny:Int,f:Int,no:Int) {
        self.x = x
        self.y = y
        self.zx = zx
        self.zy = zy
        self.nx = nx
        self.ny = ny
        self.f = f
        self.no = no
    }
}
I use c for the output variable, store values with multiple bits in Int arrays, and store single bits in Int values.
I’m doing this on Swift Playgrounds 3.0 with Swift 5.0 on a 6th generation iPad. I’m storing each component or set of components in a separate file in a module, which is why some variables and all structs are marked public. I would greatly appreciate any help. Thanks in advance.
So, I’ve completely redone my approach and have found a way to bypass the issues I was facing. What I’ve done is make what I call “tracker variables” for each input. When get is called for a variable, it returns the value of the tracker assigned to it. When set is called, it updates the value of the tracker and then calls an update() function that updates the output of the circuit. This essentially creates a ‘copy’ of each variable. I did this to prevent any infinite loops.
Trackers are unfortunately necessary with this approach. I’ll demonstrate why:
var variable : Type {
    get {
        return variable //Calls the getter again, resulting in an infinite loop
    }
    set {
        //Do something
    }
}
In order to make a setter, Swift requires a getter to be made as well. In this example, calling variable simply calls get again, resulting in a never-ending cascade of calls to get. Tracker variables are a workaround that use minimal extra code.
Using an update method makes sure the output responds to a change in any input. This also works with a clock signal, due to the architecture of the components themselves. Although it appears to act as the clock, it does not.
For example, in data flip-flops, the clock signal is passed into gates. All a clock signal does is deactivate a component when the signal is off. So, I can implement that within update() while remaining faithful to reality.
Here’s an example of a half adder. Note that the tracker variables I mentioned are marked by an underscore in front of their name. It has two inputs, x and y, which are 1 bit each. It also has two outputs, high and low, also known as carry and sum. The outputs are also one bit.
struct halfAdder {
    private var _x : Bool //Tracker for x
    public var x: Bool { //Input 1
        get {
            return _x //Return the tracker’s value
        }
        set {
            _x = newValue //Set the tracker to the new value (reading x here would just return the old _x)
            update() //Update the output
        }
    }
    private var _y : Bool //Tracker for y
    public var y: Bool { //Input 2
        get {
            return _y
        }
        set {
            _y = newValue
            update()
        }
    }
    public var high : Bool //High output, or ‘carry’
    public var low : Bool //Low output, or ‘sum’
    internal mutating func update(){ //Updates the output
        high = x && y //AND gate, sets the high output
        low = (x || y) && !(x && y) //XOR gate, sets the low output
    }
    public init(x:Bool, y:Bool){ //Initializer
        self.high = false //This will change when the variables are set, ensuring a correct output.
        self.low = false //See above
        self._x = x //Setting trackers and variables
        self._y = y
        self.x = x
        self.y = y
    }
}
This is a very clean way, save for the trackers, to accomplish this task. It can trivially be expanded to fit any number of bits by using arrays of Bool instead of a single value. It respects the clock signal, updates the output when the inputs change, and is very similar to real computers.
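For comparison, Swift's didSet property observers can also re-run update() whenever an input changes, which removes the need for separate backing variables. The following is only a sketch of that alternative, not the approach used above; the HalfAdder name and the private(set) outputs are choices made for this example.

```swift
// Sketch only: same half adder, using didSet observers instead of trackers.
struct HalfAdder {
    var x: Bool { didSet { update() } }   // Input 1
    var y: Bool { didSet { update() } }   // Input 2
    private(set) var high = false         // High output, or carry
    private(set) var low = false          // Low output, or sum

    init(x: Bool, y: Bool) {
        self.x = x
        self.y = y
        // Observers do not fire while an initializer assigns the type's own
        // properties, so compute the initial outputs explicitly.
        update()
    }

    private mutating func update() {
        high = x && y    // AND gate
        low = x != y     // XOR gate
    }
}

var adder = HalfAdder(x: true, y: true)   // high is true, low is false
adder.y = false                            // high is false, low is true
```

Because update() only writes high and low, it never re-triggers the observers on x and y, so there is no infinite loop to guard against.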

Getting data from a table

Using Tiled I generated a Lua file which contains a table. So I figured that I'd write a for loop which cycles through the table, gets the tile id, checks whether collision is true, and adds a collider if it is. But I've been unable to get the tile ids or check their properties: it returns an error saying that I tried to index a nil value, tileData.
Here is the Map file
return {
  version = "1.1",
  luaversion = "5.1",
  -- more misc. data
  tilesets = {
    {
      name = "Tileset1",
      firstgid = 1,
      tilewidth = 16,
      tileheight = 16,
      tiles = {
        {
          id = 0,
          properties = {
            ["Collision"] = false
          }
        },
      }
    }
  },
  layers = {
    {
      type = "tilelayer",
      name = "Tile Layer 1",
      data = {
        -- array of tile id's
      }
    }
  }
}
And here is the for loop I wrote to cycle through the table
require("Protyping")
local map = love.filesystem.load("Protyping.lua")()
local tileset1 = map.tilesets
local tileData = tileset1.tiles
local colision_layer = map.layers[1].data
for y=1,16 do
  for x=1,16 do
    if tileData[colision_layer[x*y]].properties["Colision"] == true then
      world:add("collider "..x*y,x*map.tilewidth, y*tileheight,tilewidth,tileheight)
    end
  end
end
Try this:
tileset1 = map.tilesets[1]
instead of
tileset1 = map.tilesets
lhf's answer (map.tilesets[1] instead of map.tilesets) fixes the error you were getting, but there are at least two other things you'll need to fix for your code to work.
The first is consistent spelling: you have a Collision property in your map data and a Colision check in your code.
The second thing you'll need to fix is the way that the individual tiles are being referenced. Tiled's layer data is made of 2-dimensional tile data laid out in a 1-dimensional array from left-to-right, starting at the top, so with 1-based x and y the indices run from 1 to width across the first row, from width + 1 to 2 * width across the second row, and so on.
You would think you could just do x * y to get the index, but this doesn't work: for example, x = 2, y = 1 and x = 1, y = 2 would both give 2. Instead, you have to do x + (y - 1) * width.
Alternatively, you can use zero-based x and y. Personally, I prefer that (but as I get more comfortable with Lua, that may change, as Lua has 1-based arrays). If you do go with 0-based x and y, then the formula is x + 1 + y * width.
I happen to have just written a tutorial this morning that goes over the Tiled format and has some helper functions that do exactly this (using the 0-based formula). You may find it helpful: https://github.com/prust/sti-pg-example.
The tutorial uses Simple Tiled Implementation, which is a very nice library for working with Tiled lua files. Since you're trying to do collision, I should mention that STI has plugins for both the bump collision library and the box2d (physics) collision library.

How to add Tuples and apply a ceiling/clamp function in F#

So I am working on a project using F# for some SVG line manipulations.
I thought it would be good to represent a color's RGB value as a tuple (R,G,B). It just made sense to me. Since my project involves generating SVG lines in a loop, I decided to have a color offset, conveniently also represented as a tuple (Roffset, Goffset, Boffset).
An offset in this case represents how much each line differs from the previous.
I got to a point where I needed to add the tuples. I thought since they were of the same dimensions and types, it would be fine. But apparently not. I also checked the MSDN on tuples, but I did not find anything about how to add them or combine them.
Here is what I tried. Bear in mind I tried to omit as much irrelevant code as possible since this is a long class definition with LOTS of members.
type lineSet ( 10+ params omitted ,count, colorOff :byte*byte*byte, color :byte*byte*byte ,strokeWid , strokeWidthOff ) =
    member val Color = color with get, set
    member val ColorOffset = colorOff with get, set
    member val lineCount = count with get, set
    interface DrawingInterfaces.IRepresentable_SVG with
        member __.getSVGRepresenation() =
            let mutable currentColor = __.Color
            for i in 1..__.lineCount do
                currentColor <- currentColor + __.ColorOffset
That last line of code is what I wanted to do. However, it appears you cannot add tuples directly.
I also need a way to clamp the result so it cannot go over 255, but I suspect a simple try with block will do the trick. OR I could let the params take a type int*int*int and just use an if to reset it back to 255 each time.
As I mentioned in the comments, the clamping function in your code does not actually work - you need to convert the numbers to integers before doing the addition (and then you can check if the integer is greater than 255). You can do something like this:
let addClamp (a:byte) (b:byte) =
    let r = int a + int b
    if r > 255 then 255uy else byte r
Also, if you work with colors, then it might make sense to define a custom color type rather than passing colors around as tuples. That way, you can also define + on colors (with clamping) and it will make your code simpler (but still, 10 constructor arguments is a bit scary, so I'd try to think if there is a way to simplify that a bit). A color type might look like this:
type Color(r:byte, g:byte, b:byte) =
    static let addClamp (a:byte) (b:byte) =
        let r = int a + int b
        if r > 255 then 255uy else byte r
    member x.R = r
    member x.B = b
    member x.G = g
    static member (+) (c1:Color, c2:Color) =
        Color(addClamp c1.R c2.R, addClamp c1.G c2.G, addClamp c1.B c2.B)
Using the type, you can then add colors pretty easily and do not have to add clamping each time you need to do that. For example:
Color(255uy, 0uy, 0uy) + Color(1uy, 0uy, 0uy)
But I still think you could make the code more readable and more composable by refactoring some of the visual properties (like stroke & color) to a separate type and then just pass that to LineSet. This way you won't have 10+ parameters to a constructor and your code will probably be more flexible too.
Here is a modified version of your code which I think is a bit nicer:
let add3DbyteTuples (tuple1:byte*byte*byte , tuple2:byte*byte*byte) =
    let inline intify (a,b,c) = int a,int b,int c
    let inline tripleadd (a,b,c) (d,e,f) = a+d,b+e,c+f
    let clamp a = if a > 255 then 255 else a
    let R,G,B = tripleadd (intify tuple1) (intify tuple2)
    clamp R,clamp G,clamp B

F# lazy pixels reading

I want to load image pixels lazily into a 3-dimensional array of integers.
For example, the simple (eager) way looks like this:
for i=0 to Width do
    for j=0 to Height do
        let point=image.GetPixel(i,j)
        pixels.[0,i,j] <- point.R
        pixels.[1,i,j] <- point.G
        pixels.[2,i,j] <- point.B
How can this be done lazily?
What would be slow is the call to GetPixel. If you want to call it only as needed, you could use something like this:
open System.Drawing
let lazyPixels (image:Bitmap) =
    let Width = image.Width
    let Height = image.Height
    let pixels : Lazy<byte>[,,] = Array3D.zeroCreate 3 Width Height
    for i = 0 to Width-1 do
        for j = 0 to Height-1 do
            let point = lazy image.GetPixel(i,j)
            pixels.[0,i,j] <- lazy point.Value.R
            pixels.[1,i,j] <- lazy point.Value.G
            pixels.[2,i,j] <- lazy point.Value.B
    pixels
GetPixel will be called at most once for every pixel, and then reused for the other components.
Another way of approaching this problem would be to do a bulk-load of the entire image. This will be a lot quicker than calling GetPixel over and over again.
open System.Drawing
open System.Drawing.Imaging
let pixels (image:Bitmap) =
    let Width = image.Width
    let Height = image.Height
    let rect = new Rectangle(0,0,Width,Height)
    // Lock the image for access
    let data = image.LockBits(rect, ImageLockMode.ReadOnly, image.PixelFormat)
    // Copy the data
    let ptr = data.Scan0
    let stride = data.Stride
    let bytes = stride * data.Height
    let values : byte[] = Array.zeroCreate bytes
    System.Runtime.InteropServices.Marshal.Copy(ptr,values,0,bytes)
    // Unlock the image
    image.UnlockBits(data)
    let pixelSize = 4 // <-- calculate this from the PixelFormat
    // Create and return a 3D-array with the copied data
    Array3D.init 3 Width Height (fun i x y ->
        values.[stride * y + x * pixelSize + i])
(adapted from the C# sample on Bitmap.LockBits)
What do you mean by lazy?
An array is not a lazy data type, which means that if you want to use arrays, you need to load all pixels during the initialization. If we were using a single-dimensional array, an alternative would be to use seq<_>, which is lazy (but you can access elements only sequentially). There is nothing like seq<_> for multi-dimensional arrays, so you'll need to use something else.
Probably the closest option would be to use a three-dimensional array of lazy values (Lazy<int>[,,]). This is an array of delayed thunks that access pixels and are evaluated only when you actually read the value at the location. You could initialize it like this:
let pixels : Lazy<int>[,,] = Array3D.zeroCreate 3 Width Height
for i = 0 to Width - 1 do
    for j = 0 to Height - 1 do
        let point = lazy image.GetPixel(i,j)
        pixels.[0,i,j] <- lazy (int point.Value.R)
        pixels.[1,i,j] <- lazy (int point.Value.G)
        pixels.[2,i,j] <- lazy (int point.Value.B)
The snippet creates a lazy value that reads the pixel (point) and then three lazy values to get the individual color components. When accessing color component, the point value is evaluated (by accessing Value).
The only difference in the rest of your code is that you'll need to call Value (e.g. pixels.[0,10,10].Value) to get the actual color component of the pixel.
You could define more complex data structures (such as your own type that supports indexing and is lazy), but I think that array of lazy values should be a good starting point.
As other answers have already mentioned, you can load pixels lazily into the 3D array, but that only makes the GetPixel operation lazy, not the memory allocation: the array is already allocated when you call the create method of Array3D.
If you want to make the memory allocation lazy as well as GetPixel, you can use sequences, as shown in the code below:
let getPixels (bmp:Bitmap) =
    seq {
        for i = 0 to bmp.Height-1 do
            yield seq {
                for j = 0 to bmp.Width-1 do
                    let pixel = bmp.GetPixel(j,i)
                    yield (pixel.R,pixel.G,pixel.B)
            }
    }

Resources