I have an app that takes 2 UIImage instances as input, with the goal of producing as output a percentage value indicating how different (or similar) they are. Is there anything in UIKit or Core Graphics that I can use to do this? For example, 100% would indicate a perfect match.
Here's my input data:
(Two input images, not reproduced here.)
I would expect less than 100% for the above, since they are clearly different.
Also open to 3rd party suggestions.
A very simple way to achieve this is to iterate over both images and compare each pixel. The width and height of the image give you the total number of pixels, so you can express the difference between the pixels as a percentage value.
Here is an example implementation in Swift that shows how to iterate over the pixels of an image. It also shows how to get the R, G and B values for each pixel, which (I think) should be the basis for the comparison:
import Foundation
import CoreImage
import AppKit

// Note: AppKit is macOS-only; on iOS you would read the pixels from the UIImage's cgImage instead.
let imageURL = URL(fileURLWithPath: "/Users/fabi/Desktop/Test.JPG")
if let image = CIImage(contentsOf: imageURL) {
    let bitmapImage = NSBitmapImageRep(ciImage: image)
    // Take the dimensions from the bitmap itself so width and height cannot get mixed up.
    let imageWidth = bitmapImage.pixelsWide
    let imageHeight = bitmapImage.pixelsHigh
    print(imageWidth)
    print(imageHeight)
    // Pixel coordinates are zero-based, so the loops stop before width/height.
    for x in 0..<imageWidth {
        for y in 0..<imageHeight {
            if let pixelColor = bitmapImage.colorAt(x: x, y: y) {
                print("R: \(pixelColor.redComponent)")
                print("G: \(pixelColor.greenComponent)")
                print("B: \(pixelColor.blueComponent)")
            }
        }
    }
}
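The comparison step itself is not shown above. As a rough, language-neutral sketch of the idea (written in JavaScript rather than Swift), the per-pixel differences can be summed and normalised by the largest possible difference. The function name `similarityPercent` and the flat RGBA array layout are assumptions made for illustration; they are not part of UIKit or Core Graphics, and both images are assumed to have the same dimensions.

```javascript
// Sketch: similarity between two RGBA pixel buffers of identical dimensions.
// `pixelsA` and `pixelsB` are hypothetical flat arrays [r, g, b, a, r, g, b, a, ...]
// with channel values in the range 0-255.
function similarityPercent(pixelsA, pixelsB) {
  if (pixelsA.length !== pixelsB.length || pixelsA.length % 4 !== 0) {
    throw new Error("Buffers must have the same size and hold whole RGBA pixels");
  }
  if (pixelsA.length === 0) return 100;
  let totalDifference = 0;
  for (let i = 0; i < pixelsA.length; i += 4) {
    // Sum of absolute differences of R, G and B; alpha is ignored.
    totalDifference += Math.abs(pixelsA[i] - pixelsB[i])
      + Math.abs(pixelsA[i + 1] - pixelsB[i + 1])
      + Math.abs(pixelsA[i + 2] - pixelsB[i + 2]);
  }
  const pixelCount = pixelsA.length / 4;
  const maxDifference = pixelCount * 3 * 255; // every channel maximally different
  return 100 * (1 - totalDifference / maxDifference);
}

const black = [0, 0, 0, 255, 0, 0, 0, 255];
const white = [255, 255, 255, 255, 255, 255, 255, 255];
const half = [0, 0, 0, 255, 255, 255, 255, 255];
console.log(similarityPercent(black, black)); // 100
console.log(similarityPercent(black, white)); // 0
console.log(similarityPercent(black, half));  // 50
```

A plain per-pixel difference like this is sensitive to small shifts, scaling and compression noise, so it only gives meaningful numbers when the two images are already aligned and the same size.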
I've run into a big problem when using the number 0 (zero) as the midpoint of a color scale: the numbers close to 0 (zero) end up almost white, so it is impossible to see a difference.
The idea is that values above 0 (zero) start green and get stronger, and values below 0 (zero) start red and get stronger.
I really need any number to be visibly colored: even 0.000001 should already have a visible green, and -0.000001 a visible red.
Link to SpreadSheet:
https://docs.google.com/spreadsheets/d/1uN5rDEeR10m3EFw29vM_nVXGMqhLcNilYrFOQfcC97s/edit?usp=sharing
Note to help with image translation and visualization:
Número = Number
Nenhum = None
Valor Máx. = Max Value
Valor Min. = Min Value
Current Result / Expected Result
After reading your new comments I understand that these are the requisites:
The values above zero should be green (with increased intensity the further beyond zero).
The values below zero should be red (with increased intensity the further beyond zero).
Values near zero should be coloured (not almost white).
Given those requisites, I developed an Apps Script project that would be useful in your scenario. This is the full project:
function onOpen() {
var ui = SpreadsheetApp.getUi();
ui.createMenu("Extra").addItem("Generate gradient", "parseData").addToUi();
}
function parseData() {
var darkestGreen = "#009000";
var lighestGreen = "#B8F4B8";
var darkestRed = "#893F45";
var lighestRed = "#FEBFC4";
var range = SpreadsheetApp.getActiveRange();
var data = range.getValues();
var biggestPositive = Math.max.apply(null, data);
var biggestNegative = Math.min.apply(null, data);
var greenPalette = colourPalette(darkestGreen, lighestGreen, biggestPositive);
var redPalette = colourPalette(darkestRed, lighestRed, Math.abs(
biggestNegative) + 1);
var fullPalette = [];
for (var i = 0; i < data.length; i++) {
if (data[i] > 0) {
var cellColour = [];
cellColour[0] = greenPalette[data[i] - 1];
fullPalette.push(cellColour);
} else if (data[i] < 0) {
var cellColour = [];
cellColour[0] = redPalette[Math.abs(data[i]) - 1];
fullPalette.push(cellColour);
} else if (data[i] == 0) {
var cellColour = [];
cellColour[0] = null;
fullPalette.push(cellColour);
}
}
range.setBackgrounds(fullPalette);
}
function colourPalette(darkestColour, lightestColour, colourSteps) {
var firstColour = hexToRGB(darkestColour);
var lastColour = hexToRGB(lightestColour);
var blending = 0.0;
var gradientColours = [];
for (var i = 0; i < colourSteps; i++) {
var colour = [];
blending += (1.0 / colourSteps);
colour[0] = firstColour[0] * blending + (1 - blending) * lastColour[0];
colour[1] = firstColour[1] * blending + (1 - blending) * lastColour[1];
colour[2] = firstColour[2] * blending + (1 - blending) * lastColour[2];
gradientColours.push(rgbToHex(colour));
}
return gradientColours;
}
function hexToRGB(hex) {
var colour = [];
colour[0] = parseInt((removeNumeralSymbol(hex)).substring(0, 2), 16);
colour[1] = parseInt((removeNumeralSymbol(hex)).substring(2, 4), 16);
colour[2] = parseInt((removeNumeralSymbol(hex)).substring(4, 6), 16);
return colour;
}
function removeNumeralSymbol(hex) {
return (hex.charAt(0) == '#') ? hex.substring(1, 7) : hex
}
function rgbToHex(rgb) {
return "#" + hex(rgb[0]) + hex(rgb[1]) + hex(rgb[2]);
}
function hex(c) {
var pool = "0123456789abcdef";
var integer = parseInt(c);
if (integer == 0 || isNaN(c)) {
return "00";
}
integer = Math.round(Math.min(Math.max(0, integer), 255));
return pool.charAt((integer - integer % 16) / 16) + pool.charAt(integer % 16);
}
First, the script uses the Ui class to add a custom menu called Extra. That menu calls the main function, parseData, which reads the values of the current selection with getValues. parseData also holds the darkest/lightest green/red colours. I picked some colours for my example, but you can edit them as you wish.
Based on those colours, the function colourPalette performs a linear interpolation between the two colours (lightest and darkest). It returns an array of colours running from lightest to darkest, with as many steps as the largest integer in the column. Note how the function relies on several small helper functions for repetitive tasks (converting from hexadecimal to RGB, formatting, etc.).
When the palettes are ready, the main function builds an array with only the colours that are actually used (it skips unused colours, which gives sharp contrast between big and small numbers). Finally, it applies that array using the setBackgrounds method. Here you can see some sample results:
In that picture you can see one set of colours per column, varying between random small and big numbers, numerical series and mixed small/big numbers. Feel free to ask if anything about this approach is unclear.
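The script above indexes the palette by integer value, so fractional inputs such as 0.000001 have no palette entry. A possible variant, sketched below under my own assumptions, blends by relative magnitude instead: the smallest positive value gets the lightest green (already clearly coloured, not white) and the largest gets the darkest. The helper names `blend` and `colourFor` are hypothetical, and the hex colours are simply the ones from the example above.

```javascript
// Parse "#rrggbb" into [r, g, b].
function hexToRgb(hex) {
  const h = hex.replace("#", "");
  return [0, 2, 4].map(i => parseInt(h.slice(i, i + 2), 16));
}

// Format [r, g, b] as "#rrggbb".
function rgbToHex(rgb) {
  return "#" + rgb.map(c => Math.round(c).toString(16).padStart(2, "0")).join("");
}

// Linear blend: t = 0 gives `lightest`, t = 1 gives `darkest`.
function blend(lightest, darkest, t) {
  const from = hexToRgb(lightest);
  const to = hexToRgb(darkest);
  return rgbToHex(from.map((c, i) => c + (to[i] - c) * t));
}

// Hypothetical helper: picks a colour for `value`, given the largest positive
// and the most negative value in the range. Zero and non-numbers get null.
function colourFor(value, maxPositive, minNegative) {
  if (typeof value !== "number" || isNaN(value) || value === 0) return null;
  if (value > 0) {
    return blend("#B8F4B8", "#009000", value / maxPositive);
  }
  return blend("#FEBFC4", "#893F45", value / minNegative);
}

console.log(colourFor(0.000001, 100, -100)); // #b8f4b8 (lightest green)
console.log(colourFor(100, 100, -100));      // #009000 (darkest green)
console.log(colourFor(-100, 100, -100));     // #893f45 (darkest red)
```

In an Apps Script project, the strings returned by `colourFor` could be collected into the two-dimensional array passed to setBackgrounds, in the same way fullPalette is built above.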
A very small improvement to acques-Guzel Heron's answer:
It now skips all non-numeric values; previously it just errored out.
I added an option in the menu to use a custom range.
Thank you very much, acques-Guzel Heron.
function onOpen() {
const ui = SpreadsheetApp.getUi();
ui.createMenu('Extra')
.addItem('Generate gradient', 'parseData')
.addItem('Custom Range', 'customRange')
.addToUi();
}
function parseData(customRange = null) {
const darkestGreen = '#009000';
const lighestGreen = '#B8F4B8';
const darkestRed = '#893F45';
const lighestRed = '#FEBFC4';
let range = SpreadsheetApp.getActiveRange();
if (customRange) {
range = SpreadsheetApp.getActiveSpreadsheet().getRange(customRange);
}
const data = range.getValues();
const biggestPositive = Math.max.apply(null, data.filter(a => !isNaN([a])));
const biggestNegative = Math.min.apply(null, data.filter(a => !isNaN([a])));
const greenPalette = colorPalette(darkestGreen, lighestGreen, biggestPositive);
const redPalette = colorPalette(darkestRed, lighestRed, Math.abs(biggestNegative) + 1);
const fullPalette = [];
for (const datum of data) {
if (datum > 0) {
fullPalette.push([greenPalette[datum - 1]]);
} else if (datum < 0) {
fullPalette.push([redPalette[Math.abs(datum) - 1]]);
} else if (datum == 0 || isNaN(datum)) {
fullPalette.push(['#ffffff']);
}
}
range.setBackgrounds(fullPalette);
}
function customRange() {
const ui = SpreadsheetApp.getUi();
const result = ui.prompt("Please enter a range");
parseData(result.getResponseText());
}
function colorPalette(darkestColor, lightestColor, colorSteps) {
const firstColor = hexToRGB(darkestColor);
const lastColor = hexToRGB(lightestColor);
let blending = 0;
const gradientColors = [];
for (let i = 0; i < colorSteps; i++) {
const color = [];
blending += (1 / colorSteps);
color[0] = firstColor[0] * blending + (1 - blending) * lastColor[0];
color[1] = firstColor[1] * blending + (1 - blending) * lastColor[1];
color[2] = firstColor[2] * blending + (1 - blending) * lastColor[2];
gradientColors.push(rgbToHex(color));
}
return gradientColors;
}
function hexToRGB(hex) {
const color = [];
color[0] = Number.parseInt((removeNumeralSymbol(hex)).slice(0, 2), 16);
color[1] = Number.parseInt((removeNumeralSymbol(hex)).slice(2, 4), 16);
color[2] = Number.parseInt((removeNumeralSymbol(hex)).slice(4, 6), 16);
return color;
}
function removeNumeralSymbol(hex) {
return (hex.charAt(0) == '#') ? hex.slice(1, 7) : hex;
}
function rgbToHex(rgb) {
return '#' + hex(rgb[0]) + hex(rgb[1]) + hex(rgb[2]);
}
function hex(c) {
const pool = '0123456789abcdef';
let integer = Number.parseInt(c, 10);
if (integer === 0 || isNaN(c)) {
return '00';
}
integer = Math.round(Math.min(Math.max(0, integer), 255));
return pool.charAt((integer - integer % 16) / 16) + pool.charAt(integer % 16);
}
EDIT: Resolved, I answered the question below.
I am using the following to get metadata for PHAssets:
let data = NSData.init(contentsOf: url!)!
if let imageSource = CGImageSourceCreateWithData(data, nil) {
let metadata = CGImageSourceCopyPropertiesAtIndex(imageSource, 0, nil)! as NSDictionary
}
The metadata dictionary has all the values I am looking for. However a few fields like ShutterSpeedValue, ExposureTime which have fractions get printed as decimals:
ExposureTime = "0.05"
ShutterSpeedValue = "4.321956769055745"
When I look at this data on my Mac's preview app and exiftool, it shows:
ExposureTime = 1/20
ShutterSpeedValue = 1/20
How can I get the correct fraction string instead of the decimal string?
EDIT: I tried simply converting the decimal to a fraction string using this code from SO, but the result isn't correct:
func rationalApproximation(of x0 : Double, withPrecision eps : Double = 1.0E-6) -> String {
var x = x0
var a = x.rounded(.down)
var (h1, k1, h, k) = (1, 0, Int(a), 1)
while x - a > eps * Double(k) * Double(k) {
x = 1.0/(x - a)
a = x.rounded(.down)
(h1, k1, h, k) = (h, k, h1 + Int(a) * h, k1 + Int(a) * k)
}
return "\(h)/\(k)"
}
As you notice, the decimal value of ShutterSpeedValue printed as 4.321956769055745 isn't even equal to 1/20.
Resolved.
As per
https://www.dpreview.com/forums/post/54376235
ShutterSpeedValue is defined as APEX value, where:
ShutterSpeed = -log2(ExposureTime)
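So -log2(1/20) is about 4.3219, which matches what I observed.
To turn the ShutterSpeedValue back into a fraction string, I use the following. Note that 2^4.321956769055745 is roughly 20.0004, so the result is rounded to the nearest integer (ceil would give 21 here) and converted to Int so that it prints as 1/20 rather than 1/20.0:
"1/\(Int(pow(2, 4.321956769055745).rounded()))"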
I tested 3 different photos and 1/20, 1/15 and 1/1919 were all correctly calculated using your formula.
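The APEX relationship quoted above can be checked numerically. Here is a small sketch in JavaScript; the function names `exposureTimeFromApex` and `formatExposureTime` are made up for this illustration and do not belong to ImageIO or any EXIF library. It assumes the fraction should be rounded to the nearest whole denominator.

```javascript
// APEX definition from the answer: ShutterSpeedValue = -log2(ExposureTime),
// so ExposureTime = 2^(-ShutterSpeedValue), in seconds.
function exposureTimeFromApex(shutterSpeedValue) {
  return Math.pow(2, -shutterSpeedValue);
}

// Format an exposure time in seconds as "1/N" for sub-second exposures,
// or as a plain number of seconds (one decimal place) otherwise.
function formatExposureTime(seconds) {
  if (seconds >= 1) {
    return String(Math.round(seconds * 10) / 10);
  }
  return "1/" + Math.round(1 / seconds);
}

console.log(formatExposureTime(0.05)); // 1/20
console.log(formatExposureTime(exposureTimeFromApex(4.321956769055745))); // 1/20
```

Since the metadata dictionary already contains ExposureTime, formatting that value directly avoids the APEX round trip altogether.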
I'm writing some code to render camera preview using SkiaSharp. This is cross-platform but I came across a problem while writing the implementation for android.
I needed to convert YUV_420_888 to RGB8888 because that's what SkiaSharp supports and with the help of this thread, somehow managed to show decent quality images to my SkiaSharp canvas. The problem is the speed. At best I can get about 8 fps but usually it's just 4 or 5 fps. It turned out the biggest factor is the conversion. I now have about 3 versions of my ToRGB converter. I've even ended up trying "unsafe" code and parallel loops. I'll just show you my best one yet.
private unsafe byte[] ToRgb(byte[] yValuesArr, byte[] uValuesArr,
byte[] vValuesArr, int uvPixelStride, int uvRowStride)
{
var width = PixelSize.Width;
var height = PixelSize.Height;
var rgb = new byte[width * height * 4];
var partitions = Partitioner.Create(0, height);
Parallel.ForEach(partitions, range =>
{
var (item1, item2) = range;
Parallel.For(item1, item2, y =>
{
for (var x = 0; x < width; x++)
{
var yIndex = x + width * y;
var currentPosition = yIndex * 4;
var uvIndex = uvPixelStride * (x / 2) + uvRowStride * (y / 2);
fixed (byte* rgbFixed = rgb)
fixed (byte* yValuesFixed = yValuesArr)
fixed (byte* uValuesFixed = uValuesArr)
fixed (byte* vValuesFixed = vValuesArr)
{
var rgbPtr = rgbFixed;
var yValues = yValuesFixed;
var uValues = uValuesFixed;
var vValues = vValuesFixed;
var yy = *(yValues + yIndex);
var uu = *(uValues + uvIndex);
var vv = *(vValues + uvIndex);
var rTmp = yy + vv * 1436 / 1024 - 179;
var gTmp = yy - uu * 46549 / 131072 + 44 - vv * 93604 / 131072 + 91;
var bTmp = yy + uu * 1814 / 1024 - 227;
rgbPtr = rgbPtr + currentPosition;
*rgbPtr = (byte) (rTmp < 0 ? 0 : rTmp > 255 ? 255 : rTmp);
rgbPtr++;
*rgbPtr = (byte) (gTmp < 0 ? 0 : gTmp > 255 ? 255 : gTmp);
rgbPtr++;
*rgbPtr = (byte) (bTmp < 0 ? 0 : bTmp > 255 ? 255 : bTmp);
rgbPtr++;
*rgbPtr = 255;
}
}
});
});
return rgb;
}
You can also find it in my repo, along with the part where I render the output to SkiaSharp.
For a preview size of 1440x1080, running on my phone, this code takes about 120ms to finish. Even if all the other parts are optimized, the most I can get from that is 8fps. And no, it's not my hardware because the built-in camera app runs smoothly. By the way 1440x1080 is the output of my ChooseOptimalSize algorithm that I got from the mono-droid examples of android's Camera2 API. I don't know if it's the best way or if it lacks logic on detecting the fps and sizing down the preview to make it faster.
Does SkiaSharp support GPU drawing? If you connect the camera to a SurfaceTexture, you can use the preview frames as GL textures and render them efficiently into an OpenGL scene.
Even if it doesn't, you may still get faster results by sending the frames to the GPU and reading them back to the CPU with something like glReadPixels, since that performs the RGB conversion on the GPU.
The reason for this question is that I am trying to deal with multiple Konva shapes at a time. In the original project, shapes are selected by drawing a momentary rectangle around the shapes that you want selected (rectangular selection). I have seen some of the other posts about this, but they only seem to deal with the selection itself, and I have that part working.
Here is a codepen example that illustrates the problem.
link
Instructions:
Click the select button to have the two shapes put in a group and a transformer applied.
Rotate and scale the selected shapes.
Click the deselect button to have the shapes moved back onto the layer.
The interesting part is after line 92, where I am exploring different methods of moving the shapes back onto the layer.
children.toArray().forEach(e => {
// Need to apply transformations correctly before putting back on layer
//Method 1
if (method === 1) {
let newTransforms = e.getAbsoluteTransform();
let localTransforms = e.getTransform();
let m = newTransforms.getMatrix();
let matrices = getMatrix(e);
console.log("matrix before : ");
console.log(matrices);
e.rotation(selectionGroupRotation);
e.skew({ x: m[1], y: m[2] });
e.scale({ x: m[0], y: m[3] });
e.position({ x: m[4], y: m[5] })
m = newTransforms.getMatrix();
matrices = getMatrix(e);
console.log("matrix after : ");
// console.log(m);
console.log(matrices);
}
//Method 2
if (method === 2) {
let groupPos = selectionGroup.position();
let point = { x: groupPos.x, y: groupPos.y };
let groupScale = selectionGroup.scale();
let groupRotation = selectionGroup.rotation();
let configGroupMatrix = selectionGroup.getTransform();
let newpos = configGroupMatrix.point(point);
e.rotation(selectionGroupRotation + e.rotation());
e.scaleX(groupScale.x * e.scaleX());
e.scaleY(groupScale.y * e.scaleY());
let finalpos = {
x: groupPos.x + e.x(),
y: groupPos.y + e.y()
}
e.x(finalpos.x);
e.y(finalpos.y);
}
e.moveTo(layer);
})
The frustrating part is that the function getAbsoluteTransform() seems to give the transformed matrix, but you can't set the transformation matrix of a shape directly. The solution might be as simple as setting the shape's matrix to the one returned from getAbsoluteTransform().
Currently, there are no methods in Konva core to calculate attributes from a matrix, but you can easily find the math online:
https://math.stackexchange.com/questions/13150/extracting-rotation-scale-values-from-2d-transformation-matrix
extract rotation, scale values from 2d transformation matrix
From the answers, I made this function to get attrs:
function decompose(mat) {
var a = mat[0];
var b = mat[1];
var c = mat[2];
var d = mat[3];
var e = mat[4];
var f = mat[5];
var delta = a * d - b * c;
let result = {
x: e,
y: f,
rotation: 0,
scaleX: 0,
scaleY: 0,
skewX: 0,
skewY: 0,
};
// Apply the QR-like decomposition.
if (a != 0 || b != 0) {
var r = Math.sqrt(a * a + b * b);
result.rotation = b > 0 ? Math.acos(a / r) : -Math.acos(a / r);
result.scaleX = r;
result.scaleY = delta / r;
result.skewX = Math.atan((a * c + b * d) / (r * r));
result.skewY = 0;
} else if (c != 0 || d != 0) {
var s = Math.sqrt(c * c + d * d);
result.rotation =
Math.PI / 2 - (d > 0 ? Math.acos(-c / s) : -Math.acos(c / s));
result.scaleX = delta / s
result.scaleY = s;
result.skewX = 0
result.skewY = Math.atan((a * c + b * d) / (s * s));
} else {
// a = b = c = d = 0
}
result.rotation *= 180 / Math.PI;
return result;
}
Then you can use that function to calculate attributes from the absolute transform.
Demo: https://codepen.io/lavrton/pen/dwGPBz?editors=1010
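To sanity-check a decomposition like this, it can help to round-trip a known set of attributes through a matrix and back. The sketch below covers only the skew-free case and assumes the usual canvas matrix layout [a, b, c, d, e, f], with rotation applied before scale. The names `compose` and `decomposeNoSkew` are hypothetical helpers for this illustration, not Konva API, and scaleX is assumed to be positive.

```javascript
// Build the 2D affine matrix [a, b, c, d, e, f] for a shape with no skew.
// Rotation is given in degrees.
function compose(attrs) {
  const rad = attrs.rotation * Math.PI / 180;
  return [
    attrs.scaleX * Math.cos(rad),
    attrs.scaleX * Math.sin(rad),
    -attrs.scaleY * Math.sin(rad),
    attrs.scaleY * Math.cos(rad),
    attrs.x,
    attrs.y,
  ];
}

// Recover the attributes from such a matrix (skew-free case only).
function decomposeNoSkew(m) {
  const [a, b, c, d, e, f] = m;
  const scaleX = Math.hypot(a, b);          // length of the first column
  return {
    x: e,
    y: f,
    rotation: Math.atan2(b, a) * 180 / Math.PI,
    scaleX: scaleX,
    scaleY: (a * d - b * c) / scaleX,       // determinant divided by scaleX
  };
}

const original = { x: 10, y: 20, rotation: 30, scaleX: 2, scaleY: 0.5 };
console.log(decomposeNoSkew(compose(original)));
```

The recovered values should match the original attributes up to floating-point error; if they do not for a matrix taken from a real node, the node's transform probably includes skew and the full decomposition above is needed.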
Based on @Kametrixom's answer, I have made a test application for parallel calculation of the sum of an array.
My test application looks like this:
import UIKit
import Metal
class ViewController: UIViewController {
// Data type, has to be the same as in the shader
typealias DataType = CInt
override func viewDidLoad() {
super.viewDidLoad()
let data = (0..<10000000).map{ _ in DataType(200) } // Our data: ten million elements, each set to 200
var start, end : UInt64
var result:DataType = 0
start = mach_absolute_time()
data.withUnsafeBufferPointer { buffer in
for elem in buffer {
result += elem
}
}
end = mach_absolute_time()
print("CPU result: \(result), time: \(Double(end - start) / Double(NSEC_PER_SEC))")
result = 0
start = mach_absolute_time()
result = sumParallel4(data)
end = mach_absolute_time()
print("CPU (4 operations) result: \(result), time: \(Double(end - start) / Double(NSEC_PER_SEC))")
result = 0
start = mach_absolute_time()
result = sumParralel(data)
end = mach_absolute_time()
print("Metal result: \(result), time: \(Double(end - start) / Double(NSEC_PER_SEC))")
result = 0
start = mach_absolute_time()
result = sumParallel3(data)
end = mach_absolute_time()
print("Metal result: \(result), time: \(Double(end - start) / Double(NSEC_PER_SEC))")
}
func sumParralel(data : Array<DataType>) -> DataType {
let count = data.count
let elementsPerSum: Int = Int(sqrt(Double(count)))
let device = MTLCreateSystemDefaultDevice()!
let parsum = device.newDefaultLibrary()!.newFunctionWithName("parsum")!
let pipeline = try! device.newComputePipelineStateWithFunction(parsum)
var dataCount = CUnsignedInt(count)
var elementsPerSumC = CUnsignedInt(elementsPerSum)
let resultsCount = (count + elementsPerSum - 1) / elementsPerSum // Number of individual results = count / elementsPerSum (rounded up)
let dataBuffer = device.newBufferWithBytes(data, length: strideof(DataType) * count, options: []) // Our data in a buffer (copied)
let resultsBuffer = device.newBufferWithLength(strideof(DataType) * resultsCount, options: []) // A buffer for individual results (zero initialized)
let results = UnsafeBufferPointer<DataType>(start: UnsafePointer(resultsBuffer.contents()), count: resultsCount) // Our results in convenient form to compute the actual result later
let queue = device.newCommandQueue()
let cmds = queue.commandBuffer()
let encoder = cmds.computeCommandEncoder()
encoder.setComputePipelineState(pipeline)
encoder.setBuffer(dataBuffer, offset: 0, atIndex: 0)
encoder.setBytes(&dataCount, length: sizeofValue(dataCount), atIndex: 1)
encoder.setBuffer(resultsBuffer, offset: 0, atIndex: 2)
encoder.setBytes(&elementsPerSumC, length: sizeofValue(elementsPerSumC), atIndex: 3)
// We have to calculate the sum `resultCount` times => amount of threadgroups is `resultsCount` / `threadExecutionWidth` (rounded up) because each threadgroup will process `threadExecutionWidth` threads
let threadgroupsPerGrid = MTLSize(width: (resultsCount + pipeline.threadExecutionWidth - 1) / pipeline.threadExecutionWidth, height: 1, depth: 1)
// Here we set that each threadgroup should process `threadExecutionWidth` threads, the only important thing for performance is that this number is a multiple of `threadExecutionWidth` (here 1 times)
let threadsPerThreadgroup = MTLSize(width: pipeline.threadExecutionWidth, height: 1, depth: 1)
encoder.dispatchThreadgroups(threadgroupsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)
encoder.endEncoding()
var result : DataType = 0
cmds.commit()
cmds.waitUntilCompleted()
for elem in results {
result += elem
}
return result
}
func sumParralel1(data : Array<DataType>) -> UnsafeBufferPointer<DataType> {
let count = data.count
let elementsPerSum: Int = Int(sqrt(Double(count)))
let device = MTLCreateSystemDefaultDevice()!
let parsum = device.newDefaultLibrary()!.newFunctionWithName("parsum")!
let pipeline = try! device.newComputePipelineStateWithFunction(parsum)
var dataCount = CUnsignedInt(count)
var elementsPerSumC = CUnsignedInt(elementsPerSum)
let resultsCount = (count + elementsPerSum - 1) / elementsPerSum // Number of individual results = count / elementsPerSum (rounded up)
let dataBuffer = device.newBufferWithBytes(data, length: strideof(DataType) * count, options: []) // Our data in a buffer (copied)
let resultsBuffer = device.newBufferWithLength(strideof(DataType) * resultsCount, options: []) // A buffer for individual results (zero initialized)
let results = UnsafeBufferPointer<DataType>(start: UnsafePointer(resultsBuffer.contents()), count: resultsCount) // Our results in convenient form to compute the actual result later
let queue = device.newCommandQueue()
let cmds = queue.commandBuffer()
let encoder = cmds.computeCommandEncoder()
encoder.setComputePipelineState(pipeline)
encoder.setBuffer(dataBuffer, offset: 0, atIndex: 0)
encoder.setBytes(&dataCount, length: sizeofValue(dataCount), atIndex: 1)
encoder.setBuffer(resultsBuffer, offset: 0, atIndex: 2)
encoder.setBytes(&elementsPerSumC, length: sizeofValue(elementsPerSumC), atIndex: 3)
// We have to calculate the sum `resultCount` times => amount of threadgroups is `resultsCount` / `threadExecutionWidth` (rounded up) because each threadgroup will process `threadExecutionWidth` threads
let threadgroupsPerGrid = MTLSize(width: (resultsCount + pipeline.threadExecutionWidth - 1) / pipeline.threadExecutionWidth, height: 1, depth: 1)
// Here we set that each threadgroup should process `threadExecutionWidth` threads, the only important thing for performance is that this number is a multiple of `threadExecutionWidth` (here 1 times)
let threadsPerThreadgroup = MTLSize(width: pipeline.threadExecutionWidth, height: 1, depth: 1)
encoder.dispatchThreadgroups(threadgroupsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)
encoder.endEncoding()
cmds.commit()
cmds.waitUntilCompleted()
return results
}
func sumParallel3(data : Array<DataType>) -> DataType {
var results = sumParralel1(data)
repeat {
results = sumParralel1(Array(results))
} while results.count >= 100
var result : DataType = 0
for elem in results {
result += elem
}
return result
}
func sumParallel4(data : Array<DataType>) -> DataType {
let queue = NSOperationQueue()
queue.maxConcurrentOperationCount = 4
var a0 : DataType = 0
var a1 : DataType = 0
var a2 : DataType = 0
var a3 : DataType = 0
let op0 = NSBlockOperation( block : {
for i in 0..<(data.count/4) {
a0 = a0 + data[i]
}
})
let op1 = NSBlockOperation( block : {
for i in (data.count/4)..<(data.count/2) {
a1 = a1 + data[i]
}
})
let op2 = NSBlockOperation( block : {
for i in (data.count/2)..<(3 * data.count/4) {
a2 = a2 + data[i]
}
})
let op3 = NSBlockOperation( block : {
for i in (3 * data.count/4)..<(data.count) {
a3 = a3 + data[i]
}
})
queue.addOperation(op0)
queue.addOperation(op1)
queue.addOperation(op2)
queue.addOperation(op3)
queue.suspended = false
queue.waitUntilAllOperationsAreFinished()
let aaa: DataType = a0 + a1 + a2 + a3
return aaa
}
}
And I have a shader that looks like this:
kernel void parsum(const device DataType* data [[ buffer(0) ]],
const device uint& dataLength [[ buffer(1) ]],
device DataType* sums [[ buffer(2) ]],
const device uint& elementsPerSum [[ buffer(3) ]],
const uint tgPos [[ threadgroup_position_in_grid ]],
const uint tPerTg [[ threads_per_threadgroup ]],
const uint tPos [[ thread_position_in_threadgroup ]]) {
uint resultIndex = tgPos * tPerTg + tPos; // This is the index of the individual result, this var is unique to this thread
uint dataIndex = resultIndex * elementsPerSum; // Where the summation should begin
uint endIndex = dataIndex + elementsPerSum < dataLength ? dataIndex + elementsPerSum : dataLength; // The index where summation should end
for (; dataIndex < endIndex; dataIndex++)
sums[resultIndex] += data[dataIndex];
}
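To make the index arithmetic in the shader easier to follow, here is a CPU-side sketch of the same two-stage reduction in JavaScript. It is only an illustration of the chunking logic; the function names `partialSums` and `total` are made up, and nothing here runs on the GPU.

```javascript
// Stage 1: each "thread" (one iteration of the outer loop) sums one chunk of
// `elementsPerSum` consecutive elements, like one invocation of the kernel.
function partialSums(data, elementsPerSum) {
  const resultsCount = Math.ceil(data.length / elementsPerSum); // rounded up
  const sums = new Array(resultsCount).fill(0);
  for (let resultIndex = 0; resultIndex < resultsCount; resultIndex++) {
    const start = resultIndex * elementsPerSum;
    // The last chunk may be shorter, so clamp the end to the data length.
    const end = Math.min(start + elementsPerSum, data.length);
    for (let i = start; i < end; i++) {
      sums[resultIndex] += data[i];
    }
  }
  return sums;
}

// Stage 2: add up the partial results, as the Swift code does after
// waitUntilCompleted().
function total(data) {
  const elementsPerSum = Math.max(1, Math.floor(Math.sqrt(data.length)));
  return partialSums(data, elementsPerSum).reduce((acc, x) => acc + x, 0);
}

const data = new Array(10).fill(200);
console.log(partialSums(data, 3)); // [ 600, 600, 600, 200 ]
console.log(total(data));          // 2000
```

Choosing the chunk size as the square root of the element count, as the Swift code does, makes the number of chunks and the chunk length roughly equal.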
To my surprise, the function sumParallel4 is the fastest, which I thought it shouldn't be. I also noticed that when I call the functions sumParralel and sumParallel3, whichever of the two is called first is always slower, even if I change the order of the calls.
Why is this? Why is sumParallel3 not a lot faster than sumParralel? Why is sumParallel4 the fastest, although it is calculated on the CPU?
How can I update my GPU function to use posix_memalign? I know it should work faster because the memory would be shared between the GPU and CPU, but I don't know which array should be allocated this way (data or result), or how I can allocate data with posix_memalign if data is a parameter passed into the function.
In running these tests on an iPhone 6, I saw the Metal version run between 3x slower and 2x faster than the naive CPU summation. With the modifications I describe below, it was consistently faster.
I found that a lot of the cost in running the Metal version could be attributed not merely to the allocation of the buffers, though that was significant, but also to the first-time creation of the device and compute pipeline state. These are actions you'd normally perform once at application initialization, so it's not entirely fair to include them in the timing.
It should also be noted that if you're running these tests through Xcode with the Metal validation layer and GPU frame capture enabled, that has a significant run-time cost and will skew the results in the CPU's favor.
With those caveats, here's how you might use posix_memalign to allocate memory that can be used to back a MTLBuffer. The trick is to ensure that the memory you request is in fact page-aligned (i.e. its address is a multiple of getpagesize()), which may entail rounding up the amount of memory beyond how much you actually need to store your data:
let dataCount = 1_000_000
let dataSize = dataCount * strideof(DataType)
let pageSize = Int(getpagesize())
let pageCount = (dataSize + (pageSize - 1)) / pageSize
var dataPointer: UnsafeMutablePointer<Void> = nil
posix_memalign(&dataPointer, pageSize, pageCount * pageSize)
let data = UnsafeMutableBufferPointer(start: UnsafeMutablePointer<DataType>(dataPointer),
count: (pageCount * pageSize) / strideof(DataType))
for i in 0..<dataCount {
data[i] = 200
}
This does require making data an UnsafeMutableBufferPointer<DataType>, rather than an [DataType], since Swift's Array allocates its own backing store. You'll also need to pass along the count of data items to operate on, since the count of the mutable buffer pointer has been rounded up to make the buffer page-aligned.
To actually create a MTLBuffer backed with this data, use the newBufferWithBytesNoCopy(_:length:options:deallocator:) API. It's crucial that, once again, the length you provide is a multiple of the page size; otherwise this method returns nil:
let roundedUpDataSize = strideof(DataType) * data.count
let dataBuffer = device.newBufferWithBytesNoCopy(data.baseAddress, length: roundedUpDataSize, options: [], deallocator: nil)
Here, we don't provide a deallocator, but you should free the memory when you're done using it, by passing the baseAddress of the buffer pointer to free().