Error evaluating CoreML custom layer "----" on GPU? - metal

I get this error without any other details.
This is the Metal code:
#include <metal_stdlib>
using namespace metal;
kernel void copy(texture2d_array<half, access::read> in_texture [[texture(0)]],
                 texture2d_array<half, access::write> out_texture [[texture(1)]],
                 ushort3 gid [[thread_position_in_grid]])
{
    // Skip threads that fall outside the texture (the grid is rounded up to whole threadgroups)
    if (gid.x >= out_texture.get_width() || gid.y >= out_texture.get_height()) {
        return;
    }
    const float4 x = float4(in_texture.read(gid.xy, gid.z));
    out_texture.write(half4(x), gid.xy, gid.z);
}
This is the encodeToCommandBuffer implementation:
- (BOOL)encodeToCommandBuffer:(id<MTLCommandBuffer>)commandBuffer inputs:(NSArray<id<MTLTexture>> *)inputs outputs:(NSArray<id<MTLTexture>> *)outputs error:(NSError *__autoreleasing _Nullable *)error
{
    id<MTLTexture> input = inputs[0];
    id<MTLTexture> output = outputs[0];
    id<MTLComputeCommandEncoder> encoder = [commandBuffer computeCommandEncoder];
    [encoder setComputePipelineState:_pipeline];
    [encoder setTexture:input atIndex:0];
    [encoder setTexture:output atIndex:1];
    // Set the compute kernel's threadgroup size of 16x16
    MTLSize thread_group_size = MTLSizeMake(16, 16, 1);
    MTLSize thread_group_count = MTLSizeMake(0, 0, 1);
    // Calculate the number of rows and columns of threadgroups given the width of the input image
    // Ensure that you cover the entire image (or more) so you process every pixel
    thread_group_count.width = (input.width + thread_group_size.width - 1) / thread_group_size.width;
    thread_group_count.height = (input.height + thread_group_size.height - 1) / thread_group_size.height;
    // The kernel uses gid.z as the array slice, so dispatch one layer of threadgroups per slice
    thread_group_count.depth = output.arrayLength;
    [encoder dispatchThreadgroups:thread_group_count threadsPerThreadgroup:thread_group_size];
    [encoder endEncoding];
    // The method reports success through its return value, so return YES explicitly
    return YES;
}
Based off of :
What can I use to pin down the error?
I tried using an empty encodeToCommandBuffer and I still get the same error.
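The dispatch arithmetic can be checked on its own, away from Core ML. This is a minimal C++ sketch, not Metal API code: `GridSize` is a hypothetical stand-in for `MTLSize`, and it assumes the output is a texture array whose slices the kernel addresses through `gid.z`, so the grid needs one layer of threadgroups per slice rather than a fixed depth of 1.

```cpp
#include <cassert>
#include <cstddef>

// Plain-integer stand-in for MTLSize (hypothetical type for illustration).
struct GridSize {
    std::size_t width, height, depth;
};

// Round up so partially covered edges still get a threadgroup.
inline std::size_t ceilDiv(std::size_t n, std::size_t d) {
    assert(d > 0);
    return (n + d - 1) / d;
}

// Threadgroups needed to cover a width x height texture array with `slices`
// slices, given the threadgroup shape `group`.
GridSize threadgroupsFor(std::size_t width, std::size_t height,
                         std::size_t slices, GridSize group) {
    return GridSize{ceilDiv(width, group.width),
                    ceilDiv(height, group.height),
                    ceilDiv(slices, group.depth)};  // covers every gid.z slice
}
```

With a 16x16x1 threadgroup, a 100x33 texture with 3 slices needs a 7x3x3 grid; a depth of 1 would leave slices 1 and 2 untouched.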


Compute Kernel Metal - How to retrieve results and debug?

I've downloaded Apple's TrueDepth Streamer example and am trying to add a compute pipeline. I think I'm retrieving the results of the computation, but I'm not sure, as they all seem to be zero.
I'm a beginner at iOS development, so there may be quite a few mistakes; please bear with me!
The pipeline setup (I wasn't quite sure how to create the results buffer, since the kernel outputs a float3):
int resultsCount = CVPixelBufferGetWidth(depthFrame) * CVPixelBufferGetHeight(depthFrame);
//because I will output 3 floats for each pixel in depthFrame
id<MTLBuffer> resultsBuffer = [self.device newBufferWithLength:(sizeof(float) * 3 * resultsCount) options:MTLResourceOptionCPUCacheModeDefault];
_threadgroupSize = MTLSizeMake(16, 16, 1);
// Calculate the number of rows and columns of threadgroups given the width of the input image
// Ensure that you cover the entire image (or more) so you process every pixel
_threadgroupCount.width = (inTexture.width + _threadgroupSize.width - 1) / _threadgroupSize.width;
_threadgroupCount.height = (inTexture.height + _threadgroupSize.height - 1) / _threadgroupSize.height;
// Since we're only dealing with a 2D data set, set depth to 1
_threadgroupCount.depth = 1;
id<MTLComputeCommandEncoder> computeEncoder = [commandBuffer computeCommandEncoder];
[computeEncoder setComputePipelineState:_computePipelineState];
[computeEncoder setTexture: inTexture atIndex:0];
[computeEncoder setBuffer:resultsBuffer offset:0 atIndex:1];
[computeEncoder setBytes:&intrinsics length:sizeof(intrinsics) atIndex:0];
[computeEncoder dispatchThreadgroups:_threadgroupCount threadsPerThreadgroup:_threadgroupSize];
[computeEncoder endEncoding];
// Finalize rendering here & push the command buffer to the GPU
[commandBuffer commit];
//for testing
[commandBuffer waitUntilCompleted];
I have added the following compute kernel:
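kernel void
calc(texture2d<float, access::read> inTexture [[texture(0)]],
     // packed_float3 is 12 bytes, matching the 3-floats-per-pixel host buffer;
     // a plain float3 occupies 16 bytes in MSL and would overrun it
     device packed_float3 *resultsBuffer [[buffer(1)]],
     constant float3x3& cameraIntrinsics [[ buffer(0) ]],
     uint2 gid [[thread_position_in_grid]])
{
    // The grid is rounded up to whole threadgroups, so skip out-of-range threads
    if (gid.x >= inTexture.get_width() || gid.y >= inTexture.get_height()) {
        return;
    }
    float val = inTexture.read(gid).x * 1000.0f;
    float xrw = (gid.x - cameraIntrinsics[2][0]) * val / cameraIntrinsics[0][0];
    float yrw = (gid.y - cameraIntrinsics[2][1]) * val / cameraIntrinsics[1][1];
    uint vertex_id = (gid.y * inTexture.get_width()) + gid.x;
    resultsBuffer[vertex_id] = float3(xrw, yrw, val);
}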
Code for reading the buffer results (I tried two different ways, and both output all zeroes at the moment):
void *output = [resultsBuffer contents];
for (int i = 0; i < 10; ++i) {
    NSLog(@"value is %f", *(float *)(output)); //= *(float *)(output + 4 * i);
}
NSData *data = [NSData dataWithBytesNoCopy:resultsBuffer.contents length:(sizeof(float) * 3 * resultsCount) freeWhenDone:NO];
float *finalArray = new float [resultsCount * 3];
[data getBytes:&finalArray[0] length:sizeof(finalArray)];
for (int i = 0; i < 10; ++i) {
    NSLog(@"here is output %f", finalArray[i]);
}
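A practical way to debug a kernel like this is to compute a few expected values on the CPU and compare them with what the GPU wrote. The sketch below mirrors the kernel's math in C++. It assumes, as the kernel's indexing implies, that the intrinsics are column-major (Metal's `float3x3` is indexed `[column][row]`, so `[2][0]` is cx and `[0][0]` is fx) and that the depth texture stores meters; `Intrinsics`, `Point3`, `backProject`, and `vertexIndex` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Column-major 3x3 matrix, indexed m[column][row] like Metal's float3x3.
struct Intrinsics {
    float m[3][3];
};

struct Point3 {
    float x, y, z;
};

// CPU reference for the kernel: pixel (u, v) with depth in meters
// becomes a camera-space point in millimeters.
Point3 backProject(unsigned u, unsigned v, float depthMeters, const Intrinsics& K) {
    const float fx = K.m[0][0], fy = K.m[1][1];
    const float cx = K.m[2][0], cy = K.m[2][1];
    assert(fx != 0.0f && fy != 0.0f);
    const float z = depthMeters * 1000.0f;
    return Point3{(static_cast<float>(u) - cx) * z / fx,
                  (static_cast<float>(v) - cy) * z / fy,
                  z};
}

// Same row-major linear index the kernel uses for resultsBuffer.
std::size_t vertexIndex(unsigned u, unsigned v, unsigned width) {
    return static_cast<std::size_t>(v) * width + u;
}
```

Comparing `resultsBuffer` element `vertexIndex(u, v, width)` against `backProject(u, v, depth, K)` for a handful of pixels separates "the kernel computes the wrong thing" from "the results are read back incorrectly".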
I see a couple of problems here, but neither of them is related to your Metal code per se.
In your first output loop, as written, you're just printing the first element of the results buffer 10 times. The first element may legitimately be 0, leading you to believe all of the results are zero. But when I changed the first log line to
NSLog(@"value is %f", ((float *)output)[i]);
I saw different values printed when running your kernel on a test image.
The other issue is related to your getBytes:length: call. You want to pass the number of bytes to copy, but sizeof(finalArray) is actually the size of the finalArray pointer, i.e., 8 bytes on a 64-bit device (4 bytes on a 32-bit one), not the total size of the buffer it points to. This is an extremely common error in C and C++ code.
Instead, you can use the same byte count as the one you used when allocating space:
[data getBytes:&finalArray[0] length:(sizeof(float) * 3 * resultsCount)];
You should then find that you get the same (non-zero) values printed as in the previous step.
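The pointer-size mistake is easy to reproduce in isolation. In this C++ sketch (`copyFloats` is a hypothetical helper), the byte count is derived from the element count, as in the corrected `getBytes:length:` call, and the usage shows that `sizeof` applied to a pointer differs from the size of the array it points to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copies `count` floats out of a raw buffer. The byte count comes from the
// element count, never from sizeof(pointer).
std::vector<float> copyFloats(const void* contents, std::size_t count) {
    assert(contents != nullptr || count == 0);
    std::vector<float> out(count);
    if (count > 0) {
        std::memcpy(out.data(), contents, count * sizeof(float));
    }
    return out;
}
```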

Metal compute pipeline absurdly slow

I saw an opportunity to improve my app's performance by using a Metal compute pipeline. However, my initial testing revealed that the compute pipeline was absurdly slow (at least on older devices).
So I made a sample project to compare the performance of the compute and render pipelines. The program takes a 2048 x 2048 source texture and converts it to grayscale in a destination texture.
On an iPhone 5S, it took 3 ms for the fragment shader to do the conversion. However, it took 177 ms for the compute kernel to do the same thing. That is 59 times longer!!!
What is your experience with the compute pipeline on older devices? Isn't it absurdly slow?
Here are my fragment and compute functions:
// kRec709Luma and RasterizerData are assumed to be defined elsewhere in the project,
// e.g. constant half3 kRec709Luma = half3(0.2126, 0.7152, 0.0722);
// Grayscale Fragment Function
fragment half4 grayscaleFragment(RasterizerData in [[stage_in]],
                                 texture2d<half> inTexture [[texture(0)]])
{
    constexpr sampler textureSampler;
    half4 inColor = inTexture.sample(textureSampler, in.textureCoordinate);
    half gray = dot(inColor.rgb, kRec709Luma);
    return half4(gray, gray, gray, 1.0);
}
// Grayscale Kernel Function
kernel void grayscaleKernel(uint2 gid [[thread_position_in_grid]],
                            texture2d<half, access::read> inTexture [[texture(0)]],
                            texture2d<half, access::write> outTexture [[texture(1)]])
{
    half4 inColor = inTexture.read(gid);
    half gray = dot(inColor.rgb, kRec709Luma);
    outTexture.write(half4(gray, gray, gray, 1.0), gid);
}
Compute and render methods:
- (void)compute {
    id<MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];
    // Compute encoder
    id<MTLComputeCommandEncoder> computeEncoder = [commandBuffer computeCommandEncoder];
    [computeEncoder setComputePipelineState:_computePipelineState];
    [computeEncoder setTexture:_srcTexture atIndex:0];
    [computeEncoder setTexture:_dstTexture atIndex:1];
    [computeEncoder dispatchThreadgroups:_threadgroupCount threadsPerThreadgroup:_threadgroupSize];
    [computeEncoder endEncoding];
    [commandBuffer commit];
    [commandBuffer waitUntilCompleted];
}
- (void)render {
    id<MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];
    // Render pass descriptor
    MTLRenderPassDescriptor *renderPassDescriptor = [MTLRenderPassDescriptor renderPassDescriptor];
    renderPassDescriptor.colorAttachments[0].loadAction = MTLLoadActionDontCare;
    renderPassDescriptor.colorAttachments[0].texture = _dstTexture;
    renderPassDescriptor.colorAttachments[0].storeAction = MTLStoreActionStore;
    // Render encoder
    id<MTLRenderCommandEncoder> renderEncoder = [commandBuffer renderCommandEncoderWithDescriptor:renderPassDescriptor];
    [renderEncoder setRenderPipelineState:_renderPipelineState];
    [renderEncoder setFragmentTexture:_srcTexture atIndex:0];
    [renderEncoder drawPrimitives:MTLPrimitiveTypeTriangleStrip vertexStart:0 vertexCount:4];
    [renderEncoder endEncoding];
    [commandBuffer commit];
    [commandBuffer waitUntilCompleted];
}
And Metal setup:
- (void)setupMetal
{
    // Get metal device
    _device = MTLCreateSystemDefaultDevice();
    // Create the command queue
    _commandQueue = [_device newCommandQueue];
    id<MTLLibrary> defaultLibrary = [_device newDefaultLibrary];
    // Create compute pipeline state
    _computePipelineState = [_device newComputePipelineStateWithFunction:[defaultLibrary newFunctionWithName:@"grayscaleKernel"] error:nil];
    // Create render pipeline state
    MTLRenderPipelineDescriptor *pipelineStateDescriptor = [[MTLRenderPipelineDescriptor alloc] init];
    pipelineStateDescriptor.vertexFunction = [defaultLibrary newFunctionWithName:@"vertexShader"];
    pipelineStateDescriptor.fragmentFunction = [defaultLibrary newFunctionWithName:@"grayscaleFragment"];
    pipelineStateDescriptor.colorAttachments[0].pixelFormat = MTLPixelFormatBGRA8Unorm;
    _renderPipelineState = [_device newRenderPipelineStateWithDescriptor:pipelineStateDescriptor error:nil];
    // Create source and destination texture descriptor
    // Since the compute kernel function doesn't check if pixels are within the bounds of the destination texture, make sure texture width
    // and height are multiples of the pipeline threadExecutionWidth and (maxTotalThreadsPerThreadgroup / threadExecutionWidth) respectively.
    MTLTextureDescriptor *textureDescriptor = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm
                                                                                                 width:2048
                                                                                                height:2048
                                                                                             mipmapped:NO];
    // Create source texture
    textureDescriptor.usage = MTLTextureUsageShaderRead;
    _srcTexture = [_device newTextureWithDescriptor:textureDescriptor];
    // Create destination texture
    textureDescriptor.usage = MTLTextureUsageShaderWrite | MTLTextureUsageRenderTarget;
    _dstTexture = [_device newTextureWithDescriptor:textureDescriptor];
    // Set the compute kernel's threadgroup size
    NSUInteger threadWidth = _computePipelineState.threadExecutionWidth;
    NSUInteger threadMax = _computePipelineState.maxTotalThreadsPerThreadgroup;
    _threadgroupSize = MTLSizeMake(threadWidth, threadMax / threadWidth, 1);
    // Set the compute kernel's threadgroup count
    _threadgroupCount.width = (_srcTexture.width + _threadgroupSize.width - 1) / _threadgroupSize.width;
    _threadgroupCount.height = (_srcTexture.height + _threadgroupSize.height - 1) / _threadgroupSize.height;
    _threadgroupCount.depth = 1;
}
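The comment in setupMetal depends on the threadgroup shape chosen from the pipeline's `threadExecutionWidth` and `maxTotalThreadsPerThreadgroup`. This C++ sketch reproduces that arithmetic and checks whether a texture is an exact multiple of it, which is what lets the kernel skip a bounds check. The values 32 and 512 in the usage are illustrative only, not measured on any particular device; `Size2`, `threadgroupShape`, and `coversExactly` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

struct Size2 {
    std::size_t w, h;
};

// Threadgroup shape as in setupMetal: one SIMD-width row, and as many rows
// as the pipeline's thread limit allows.
Size2 threadgroupShape(std::size_t threadExecutionWidth, std::size_t maxTotalThreads) {
    assert(threadExecutionWidth > 0 && maxTotalThreads >= threadExecutionWidth);
    return Size2{threadExecutionWidth, maxTotalThreads / threadExecutionWidth};
}

// True when whole threadgroups tile the texture exactly, so a kernel without
// a bounds check never touches pixels outside the texture.
bool coversExactly(std::size_t texW, std::size_t texH, Size2 group) {
    return texW % group.w == 0 && texH % group.h == 0;
}
```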
The Metal compute pipeline is unusable on A7-class CPU/GPU devices, while the same compute pipeline performs very well on A8 and newer devices. Your options for dealing with this are to write fragment shader implementations for A7 devices and use compute logic on all newer devices, or to offload the computation to the CPU on A7 (devices in this class have at least two CPU cores). You could also just use fragment shaders on all devices, but compute kernels make much better performance possible for complex code, so it is something to think about.
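For a CPU fallback on A7-class devices, the grayscale conversion is a simple per-pixel loop. This C++ sketch assumes BGRA8 pixel data (matching `MTLPixelFormatBGRA8Unorm` in the setup) and that `kRec709Luma` holds the standard Rec. 709 weights 0.2126, 0.7152, 0.0722; `grayscaleBGRA8` is a hypothetical name. Splitting the pixel range between the CPU cores would be a straightforward extension that is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Converts BGRA8 pixels to gray using Rec. 709 luma weights; alpha becomes 255.
void grayscaleBGRA8(const std::uint8_t* src, std::uint8_t* dst, std::size_t pixelCount) {
    assert(src != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < pixelCount; ++i) {
        const std::uint8_t* p = src + 4 * i;  // byte order: B, G, R, A
        const float gray = 0.0722f * p[0] + 0.7152f * p[1] + 0.2126f * p[2];
        const std::uint8_t g = static_cast<std::uint8_t>(gray + 0.5f);  // round to nearest
        std::uint8_t* q = dst + 4 * i;
        q[0] = q[1] = q[2] = g;
        q[3] = 255;
    }
}
```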

Passing a real-time FFT setup to an execution function

I'm trying to do a real-time FFT on microphone data in iOS using Novocaine, and I can't seem to pass the FFT setup and ring buffer to my FFT function.
For testing, my plan was to do the setup in the viewWillAppear method, then execute my FFT function when a button is pressed. I'll then deallocate the memory for my buffers and destroy the FFT setup when the window closes.
Here's what I have so far. I've tried several variations of passing arguments to viewWillAppear, but nothing seems to work. Any suggestions are appreciated.
- (void)viewWillAppear:(BOOL)animated
{
    [super viewWillAppear:animated];
    ringBuffer = new RingBuffer(8192, 2);
    audioManager = [Novocaine audioManager];
    [audioManager setInputBlock:^(float *data, UInt32 numFrames, UInt32 numChannels) {
        // Setup FFT here
        int numSamples = 8192;
        // Setup the length
        vDSP_Length log2n = log2f(numSamples);
        // Calculate the weights array. This is a one-off operation.
        FFTSetup fftSetup = vDSP_create_fftsetup(log2n, FFT_RADIX2);
        // For an FFT, numSamples must be a power of 2, i.e. is always even
        int nOver2 = numSamples/2;
        // Populate *window with the values for a hamming window function
        float *window = (float *)malloc(sizeof(float) * numSamples);
        vDSP_hamm_window(window, numSamples, 0);
        // Window the samples
        vDSP_vmul(data, 1, window, 1, data, 1, numSamples);
        // Define complex buffer
        A.realp = (float *) malloc(nOver2*sizeof(float));
        A.imagp = (float *) malloc(nOver2*sizeof(float));
    }];
}
- (IBAction)buttonPressed:(id)sender {
    // do myFFT here
    // Pack samples:
    // C(re) -> A[n], C(im) -> A[n+1]
    vDSP_ctoz((COMPLEX*)data, 2, &A, 1, numSamples/2);
    //Perform a forward FFT using fftSetup and A
    //Results are returned in A
    vDSP_fft_zrip(fftSetup, &A, 1, log2n, FFT_FORWARD);
    //Convert COMPLEX_SPLIT A result to magnitudes
    float *amp = (float *)malloc(sizeof(float) * numSamples);
    amp[0] = A.realp[0]/(numSamples*2);
    float max = 0;
    int indexOfMax = -1;
    for (int i = 1; i < numSamples; i++) {
        //printf("i[%ld]: %.1f %ldHz \n", (long)i, amp[i], (long)22000 * i/numSamples);
        if (amp[i] > max) {
            max = amp[i];
            indexOfMax = i;
        }
    }
    long fmax = ((long)indexOfMax - numSamples/2)*44100/4096;
    printf("max frequency is %ld\n", fmax);
}
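For reference, with an N-point FFT of data sampled at fs Hz, bin k corresponds to k * fs / N Hz, and for real input only bins 0 through N/2 carry distinct information. Below is a C++ sketch of peak picking under that rule, assuming the magnitudes for bins 0 to N/2 - 1 have already been computed from the split-complex result; `peakBin` and `binFrequency` are hypothetical names. Compare it with the `fmax` expression above, which subtracts numSamples/2 and divides by 4096.

```cpp
#include <cassert>
#include <cstddef>

// Index of the largest magnitude among bins 1..count-1 (bin 0 is DC and skipped).
std::size_t peakBin(const float* magnitudes, std::size_t count) {
    assert(magnitudes != nullptr && count >= 2);
    std::size_t best = 1;
    for (std::size_t k = 2; k < count; ++k) {
        if (magnitudes[k] > magnitudes[best]) best = k;
    }
    return best;
}

// Center frequency in Hz of bin k for an fftSize-point FFT at sampleRate Hz.
double binFrequency(std::size_t k, std::size_t fftSize, double sampleRate) {
    return static_cast<double>(k) * sampleRate / static_cast<double>(fftSize);
}
```

For an 8192-point FFT at 44100 Hz, a peak in bin 100 lands at about 538.33 Hz.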
The real-time audio input block isn't meant for heavy computation such as an FFT. Instead, the input block should just (quickly!) copy the data into a large enough ring buffer.
Later, during a displayLink timer callback or when the button is pressed, you can check the ring buffer to see whether it has enough new data, and then do the FFT.
Your code also seems to confuse the FFT size, numSamples, with the (usually much smaller) number of samples actually received by each callback, which is numFrames. You can't do the FFT until enough callbacks have arrived for their numFrames to add up to at least the FFT size.
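A minimal sketch of that pattern in C++, assuming a single producer (the audio input block) and a single consumer (the button handler or display-link callback). `SampleRing` is a hypothetical class, not Novocaine's RingBuffer. For mono input, the input block would only call `write(data, numFrames)`, and the FFT would run only after `read(buffer, fftSize)` succeeds.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Single-producer/single-consumer ring buffer. Storage is allocated once in the
// constructor, so write() never allocates on the audio thread.
class SampleRing {
public:
    explicit SampleRing(std::size_t capacity) : buf_(capacity) { assert(capacity > 0); }

    // Producer: copies as many samples as fit and returns how many were written.
    std::size_t write(const float* data, std::size_t n) {
        const std::size_t w = writePos_.load(std::memory_order_relaxed);
        const std::size_t r = readPos_.load(std::memory_order_acquire);
        const std::size_t space = buf_.size() - (w - r);
        if (n > space) n = space;
        for (std::size_t i = 0; i < n; ++i) buf_[(w + i) % buf_.size()] = data[i];
        writePos_.store(w + n, std::memory_order_release);  // publish the new samples
        return n;
    }

    // Consumer: number of samples waiting to be read.
    std::size_t available() const {
        return writePos_.load(std::memory_order_acquire) -
               readPos_.load(std::memory_order_relaxed);
    }

    // Consumer: copies exactly n samples if that many are available, else returns false.
    bool read(float* out, std::size_t n) {
        const std::size_t r = readPos_.load(std::memory_order_relaxed);
        const std::size_t w = writePos_.load(std::memory_order_acquire);
        if (w - r < n) return false;
        for (std::size_t i = 0; i < n; ++i) out[i] = buf_[(r + i) % buf_.size()];
        readPos_.store(r + n, std::memory_order_release);  // free the slots for the producer
        return true;
    }

private:
    std::vector<float> buf_;
    std::atomic<std::size_t> writePos_{0};  // total samples ever written
    std::atomic<std::size_t> readPos_{0};   // total samples ever read
};
```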

Compute the histogram of an image using vImageHistogramCalculation

I'm trying to compute the histogram of an image using vImage's vImageHistogramCalculation_ARGBFFFF, but I'm getting a vImage_Error of type kvImageNullPointerArgument (error code -21772).
Here's my code:
- (void)histogramForImage:(UIImage *)image {
    //setup inBuffer
    vImage_Buffer inBuffer;
    //Get CGImage from UIImage
    CGImageRef img = image.CGImage;
    //create vImage_Buffer with data from CGImageRef
    CGDataProviderRef inProvider = CGImageGetDataProvider(img);
    CFDataRef inBitmapData = CGDataProviderCopyData(inProvider);
    //The next three lines set up the inBuffer object
    inBuffer.width = CGImageGetWidth(img);
    inBuffer.height = CGImageGetHeight(img);
    inBuffer.rowBytes = CGImageGetBytesPerRow(img);
    //This sets the pointer to the data for the inBuffer object
    inBuffer.data = (void*)CFDataGetBytePtr(inBitmapData);
    //Prepare the parameters to pass to vImageHistogramCalculation_ARGBFFFF
    vImagePixelCount *histogram[4] = {0};
    unsigned int histogram_entries = 4;
    Pixel_F minVal = 0;
    Pixel_F maxVal = 255;
    vImage_Flags flags = kvImageNoFlags;
    vImage_Error error = vImageHistogramCalculation_ARGBFFFF(&inBuffer,
                                                             histogram,
                                                             histogram_entries,
                                                             minVal,
                                                             maxVal,
                                                             flags);
    if (error) {
        NSLog(@"error %ld", error);
    }
    //clean up
    CFRelease(inBitmapData);
}
I suspect it has something to do with my histogram parameter, which, according to the docs, is supposed to be "a pointer to an array of four histograms". Am I declaring it correctly?
The trouble is that you’re not allocating space to hold the computed histograms. If you are only using the histograms locally, you can put them on the stack like so [note that I’m using eight bins instead of four, to make the example more clear]:
// create an array of four histograms with eight entries each.
vImagePixelCount histogram[4][8] = {{0}};
// vImageHistogramCalculation requires an array of pointers to the histograms.
vImagePixelCount *histogramPointers[4] = { &histogram[0][0], &histogram[1][0], &histogram[2][0], &histogram[3][0] };
vImage_Error error = vImageHistogramCalculation_ARGBFFFF(&inBuffer, histogramPointers, 8, 0, 255, kvImageNoFlags);
// You can now access bin j of the histogram for channel i as histogram[i][j].
// The storage for the histogram will be cleaned up when execution leaves the
// current lexical block.
If you need the histograms to stick around outside the scope of your function, you’ll need to allocate space for them on the heap instead:
vImagePixelCount *histogram[4];
unsigned int histogramEntries = 8;
histogram[0] = malloc(4*histogramEntries*sizeof histogram[0][0]);
if (!histogram[0]) {
    // handle error however is appropriate
}
for (int i=1; i<4; ++i) { histogram[i] = &histogram[0][i*histogramEntries]; }
vImage_Error error = vImageHistogramCalculation_ARGBFFFF(&inBuffer, histogram, histogramEntries, 0, 255, kvImageNoFlags);
// You can now access bin j of the histogram for channel i as histogram[i][j].
// Eventually you will need to free(histogram[0]) to release the storage.
Hope this helps.
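In an Objective-C++ (.mm) file, the same single-allocation layout can be owned by a `std::unique_ptr`, so the `free(histogram[0])` step cannot be forgotten. A C++ sketch: `FourHistograms` is a hypothetical type, `unsigned long` stands in for `vImagePixelCount`, and `channel` is the array of four pointers that would be passed as the histogram argument.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Four per-channel histograms in one contiguous, zeroed block; channel i
// starts at offset i * entries, like the heap version above.
struct FourHistograms {
    std::size_t entries;
    std::unique_ptr<unsigned long[]> storage;
    unsigned long* channel[4];  // the "array of four histogram pointers"

    explicit FourHistograms(std::size_t n)
        : entries(n), storage(new unsigned long[4 * n]()) {  // () zero-initializes
        assert(n > 0);
        for (int i = 0; i < 4; ++i) channel[i] = &storage[i * n];
    }
};
```

Because `unique_ptr` is move-only, the struct cannot be accidentally copied into an object whose `channel` pointers would alias someone else's storage.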

iOS: Retrieve a rectangle-shaped image from a background image

I am working on an implementation where I have a rectangle-shaped image inside a big background image. I am trying to programmatically retrieve the rectangle-shaped image from the big image and read the text information in that particular rectangle. I am trying to use the OpenCV third-party framework, but I haven't been able to retrieve the rectangle image from the big background image. Could someone please guide me on how I can achieve this?
I found a link about finding square shapes using OpenCV. Can I modify it to find rectangle shapes? Can someone guide me on this?
I finally got the code working; here it is.
- (cv::Mat)cvMatWithImage:(UIImage *)image
{
    CGColorSpaceRef colorSpace = CGImageGetColorSpace(image.CGImage);
    CGFloat cols = image.size.width;
    CGFloat rows = image.size.height;
    cv::Mat cvMat(rows, cols, CV_8UC4); // 8 bits per component, 4 channels
    CGContextRef contextRef = CGBitmapContextCreate(cvMat.data,    // Pointer to backing data
                                                    cols,          // Width of bitmap
                                                    rows,          // Height of bitmap
                                                    8,             // Bits per component
                                                    cvMat.step[0], // Bytes per row
                                                    colorSpace,    // Colorspace
                                                    kCGImageAlphaNoneSkipLast |
                                                    kCGBitmapByteOrderDefault); // Bitmap info flags
    CGContextDrawImage(contextRef, CGRectMake(0, 0, cols, rows), image.CGImage);
    CGContextRelease(contextRef);
    return cvMat;
}
-(UIImage *)UIImageFromCVMat:(cv::Mat)cvMat
{
    NSData *data = [NSData dataWithBytes:cvMat.data length:cvMat.elemSize() * cvMat.total()];
    CGColorSpaceRef colorSpace;
    if (cvMat.elemSize() == 1) {
        colorSpace = CGColorSpaceCreateDeviceGray();
    } else {
        colorSpace = CGColorSpaceCreateDeviceRGB();
    }
    // Under ARC the NSData must be bridged to CFDataRef
    CGDataProviderRef provider = CGDataProviderCreateWithCFData((__bridge CFDataRef)data);
    CGImageRef imageRef = CGImageCreate( cvMat.cols, cvMat.rows, 8, 8 * cvMat.elemSize(), cvMat.step[0], colorSpace, kCGImageAlphaNone|kCGBitmapByteOrderDefault, provider, NULL, false, kCGRenderingIntentDefault );
    UIImage *finalImage = [UIImage imageWithCGImage:imageRef];
    CGImageRelease( imageRef );
    CGDataProviderRelease( provider );
    CGColorSpaceRelease( colorSpace );
    return finalImage;
}
imageView = [UIImage imageNamed:@"myimage.jpg"];
if( imageView != nil )
{
    cv::Mat tempMat = [imageView CVMat];
    cv::Mat greyMat = [self cvMatWithImage:imageView];
    cv::vector<cv::vector<cv::Point> > squares;
    cv::Mat img= [self debugSquares: squares: greyMat];
    imageView = [self UIImageFromCVMat: img];
    self.imageView.image = imageView;
}
double angle( cv::Point pt1, cv::Point pt2, cv::Point pt0 ) {
    double dx1 = pt1.x - pt0.x;
    double dy1 = pt1.y - pt0.y;
    double dx2 = pt2.x - pt0.x;
    double dy2 = pt2.y - pt0.y;
    return (dx1*dx2 + dy1*dy2)/sqrt((dx1*dx1 + dy1*dy1)*(dx2*dx2 + dy2*dy2) + 1e-10);
}
- (cv::Mat) debugSquares: (std::vector<std::vector<cv::Point> >) squares : (cv::Mat &)image
{
    // blur will enhance edge detection
    //cv::Mat blurred(image);
    cv::Mat blurred = image.clone();
    medianBlur(image, blurred, 9);
    cv::Mat gray0(image.size(), CV_8U), gray;
    cv::vector<cv::vector<cv::Point> > contours;
    // find squares in every color plane of the image
    for (int c = 0; c < 3; c++)
    {
        int ch[] = {c, 0};
        // use the blurred copy so the median blur actually takes effect
        mixChannels(&blurred, 1, &gray0, 1, ch, 1);
        // try several threshold levels
        const int threshold_level = 2;
        for (int l = 0; l < threshold_level; l++)
        {
            // Use Canny instead of zero threshold level!
            // Canny helps to catch squares with gradient shading
            if (l == 0)
            {
                Canny(gray0, gray, 10, 20, 3);
                // Dilate helps to remove potential holes between edge segments
                dilate(gray, gray, cv::Mat(), cv::Point(-1,-1));
            }
            else
            {
                gray = gray0 >= (l+1) * 255 / threshold_level;
            }
            // Find contours and store them in a list
            findContours(gray, contours, CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE);
            // Test contours
            cv::vector<cv::Point> approx;
            for (size_t i = 0; i < contours.size(); i++)
            {
                // approximate contour with accuracy proportional
                // to the contour perimeter
                approxPolyDP(cv::Mat(contours[i]), approx, arcLength(cv::Mat(contours[i]), true)*0.02, true);
                // Note: absolute value of an area is used because
                // area may be positive or negative - in accordance with the
                // contour orientation
                if (approx.size() == 4 &&
                    fabs(contourArea(cv::Mat(approx))) > 1000 &&
                    isContourConvex(cv::Mat(approx)))
                {
                    double maxCosine = 0;
                    for (int j = 2; j < 5; j++)
                    {
                        double cosine = fabs(angle(approx[j%4], approx[j-2], approx[j-1]));
                        maxCosine = MAX(maxCosine, cosine);
                    }
                    if (maxCosine < 0.3)
                        squares.push_back(approx);
                }
            }
        }
    }
    NSLog(@"squares.size(): %lu", squares.size());
    for( size_t i = 0; i < squares.size(); i++ )
    {
        cv::Rect rectangle = boundingRect(cv::Mat(squares[i]));
        NSLog(@"rectangle.x: %d", rectangle.x);
        NSLog(@"rectangle.y: %d", rectangle.y);
        if (i == squares.size() - 1) //// Detecting Rectangle here
        {
            const cv::Point* p = &squares[i][0];
            int n = (int)squares[i].size();
            line(image, cv::Point(507,418), cv::Point(507+1776,418+1372), cv::Scalar(255,0,0),2,8);
            polylines(image, &p, &n, 1, true, cv::Scalar(255,255,0), 5, CV_AA);
            int fx1 = rectangle.x;
            NSLog(@"X: %d", fx1);
            int fy1 = rectangle.y;
            NSLog(@"Y: %d", fy1);
            int fx2 = rectangle.x + rectangle.width;
            NSLog(@"X + width: %d", fx2);
            int fy2 = rectangle.y + rectangle.height;
            NSLog(@"Y + height: %d", fy2);
            line(image, cv::Point(fx1,fy1), cv::Point(fx2,fy2), cv::Scalar(0,0,255),2,8);
        }
    }
    return image;
}
Thank you.
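The rectangle test in debugSquares (taken from squares.cpp) constrains only corner angles: a quadrilateral passes when each checked corner has |cos θ| < 0.3, roughly 72.5° to 107.5°, so rectangles pass as well as squares. Below is a standalone C++ sketch of that check, with hypothetical `Pt`, `cornerCosine`, and `maxCornerCosine` names, for trying the threshold on hand-made corner points.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Pt {
    double x, y;
};

// Cosine of the angle at pt0 between edges pt0->pt1 and pt0->pt2
// (same formula as the angle() helper in the question).
double cornerCosine(Pt pt1, Pt pt2, Pt pt0) {
    const double dx1 = pt1.x - pt0.x, dy1 = pt1.y - pt0.y;
    const double dx2 = pt2.x - pt0.x, dy2 = pt2.y - pt0.y;
    return (dx1 * dx2 + dy1 * dy2) /
           std::sqrt((dx1 * dx1 + dy1 * dy1) * (dx2 * dx2 + dy2 * dy2) + 1e-10);
}

// Largest |cos| over the corners the j = 2..4 loop visits (corners 1, 2 and 3).
double maxCornerCosine(const Pt q[4]) {
    assert(q != nullptr);
    double m = 0;
    for (int j = 2; j < 5; ++j)
        m = std::max(m, std::fabs(cornerCosine(q[j % 4], q[j - 2], q[j - 1])));
    return m;
}
```

An axis-aligned 10x10 square scores 0, while a parallelogram with corners (0,0), (10,0), (15,10), (5,10) scores about 0.447 and would be rejected.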
Here is a full answer using a small wrapper class to separate the C++ from the Objective-C code.
I had to raise another question on Stack Overflow to deal with my poor C++ knowledge, but I have worked out everything we need to interface C++ cleanly with Objective-C code, using the squares.cpp sample code as an example. The aim is to keep the original C++ code as pristine as possible, and to keep the bulk of the OpenCV work in pure C++ files for (im)portability.
I have left my original answer in place, as this seems to go beyond an edit. The complete demo project is on GitHub.
CVViewController.h / CVViewController.m
pure Objective-C
communicates with the OpenCV C++ code via a WRAPPER... it neither knows nor cares that C++ is processing these method calls behind the wrapper.
CVWrapper.h /
does as little as possible, really only two things...
calls UIImage Objective-C++ categories to convert between UIImage and cv::Mat
mediates between CVViewController's Objective-C methods and CVSquares C++ (class) function calls
CVSquares.h / CVSquares.cpp
pure C++
CVSquares.cpp declares public functions inside a class definition (in this case, one static function).
This replaces the work of main{} in the original file.
We try to keep CVSquares.cpp as close as possible to the C++ original for portability.
//remove 'magic numbers' from original C++ source so we can manipulate them from obj-C
#define TOLERANCE 0.01
#define THRESHOLD 50
#define LEVELS 9
UIImage* image =
    [CVSquaresWrapper detectedSquaresInImage:self.image
                                   tolerance:TOLERANCE
                                   threshold:THRESHOLD
                                      levels:LEVELS];
// CVSquaresWrapper.h
#import <UIKit/UIKit.h>
@interface CVSquaresWrapper : NSObject
+ (UIImage*) detectedSquaresInImage:(UIImage*)image
                          tolerance:(float)tolerance
                          threshold:(int)threshold
                             levels:(int)levels;
@end
// CVSquaresWrapper.mm (Objective-C++)
// wrapper that talks to c++ and to obj-c classes
#import "CVSquaresWrapper.h"
#import "CVSquares.h"
#import "UIImage+OpenCV.h"
@implementation CVSquaresWrapper
+ (UIImage*) detectedSquaresInImage:(UIImage*)image
                          tolerance:(float)tolerance
                          threshold:(int)threshold
                             levels:(int)levels
{
    UIImage* result = nil;
    //convert from UIImage to cv::Mat openCV image format
    //this is a category on UIImage
    cv::Mat matImage = [image CVMat];
    //call the c++ class static member function
    //we want this function signature to exactly
    //mirror the form of the calling method
    matImage = CVSquares::detectedSquaresInImage (matImage, tolerance, threshold, levels);
    //convert back from cv::Mat openCV image format
    //to UIImage image format (category on UIImage)
    result = [UIImage imageFromCVMat:matImage];
    return result;
}
@end
// CVSquares.h
#ifndef __OpenCVClient__CVSquares__
#define __OpenCVClient__CVSquares__
#include <opencv2/core/core.hpp>
//class definition
//in this example we do not need a class
//as we have no instance variables and just one static function.
//We could instead just declare the function but this form seems clearer
class CVSquares
{
public:
    static cv::Mat detectedSquaresInImage (cv::Mat image, float tol, int threshold, int levels);
};
#endif /* defined(__OpenCVClient__CVSquares__) */
// CVSquares.cpp
#include "CVSquares.h"
#include <iostream>
#include <opencv2/imgproc/imgproc.hpp>
using namespace std;
using namespace cv;
static int thresh = 50, N = 11;
static float tolerance = 0.01;
//declarations added so that we can move our
//public function to the top of the file
//(they must match the definitions further down, including const)
static void findSquares( const Mat& image, vector<vector<Point> >& squares );
static void drawSquares( Mat& image, const vector<vector<Point> >& squares );
//this public function performs the role of
//main{} in the original file (main{} is deleted)
cv::Mat CVSquares::detectedSquaresInImage (cv::Mat image, float tol, int threshold, int levels)
{
    vector<vector<Point> > squares;
    if( image.empty() )
    {
        cout << "Couldn't load " << endl;
        return image;
    }
    tolerance = tol;
    thresh = threshold;
    N = levels;
    findSquares(image, squares);
    drawSquares(image, squares);
    return image;
}
// the rest of this file is identical to the original squares.cpp except:
// main{} is removed
// this line is removed from drawSquares:
// imshow(wndname, image);
// (obj-c will do the drawing)
The UIImage category is an Objective-C++ file containing the code to convert between the UIImage and cv::Mat image formats. This is where you move your two methods, -(UIImage *)UIImageFromCVMat:(cv::Mat)cvMat and - (cv::Mat)cvMatWithImage:(UIImage *)image.
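// UIImage+OpenCV.h (contains C++ types, so only import it from .mm files)
#import <opencv2/core/core.hpp>
#import <UIKit/UIKit.h>
@interface UIImage (UIImage_OpenCV)
//cv::Mat to UIImage
+ (UIImage *)imageFromCVMat:(cv::Mat&)cvMat;
//UIImage to cv::Mat
- (cv::Mat)CVMat;
@end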
The method implementations here are unchanged from your code (although we don't pass in a UIImage to convert; instead, we refer to self).
Here is a partial answer. It is not complete because I am attempting to do the exact same thing and am experiencing huge difficulties every step of the way. My knowledge of Objective-C is quite strong, but my C++ is really weak.
You should read this guide to wrapping C++.
And read everything on Ievgen Khvedchenia's Computer Vision Talks blog, especially the OpenCV tutorial. Ievgen has also posted an amazingly complete project on GitHub to go with the tutorial.
Having said that, I am still having a lot of trouble getting OpenCV to compile and run smoothly.
For example, Ievgen's tutorial runs fine as a finished project, but if I try to recreate it from scratch I get the same OpenCV compile errors that have been plaguing me all along. It's probably my poor understanding of C++ and its integration with Objective-C.
Regarding squares.cpp
What you need to do
remove int main(int /*argc*/, char** /*argv*/) from squares.cpp
remove imshow(wndname, image); from drawSquares (obj-c will do the drawing)
create a header file squares.h
make one or two public functions in the header file which you can call from obj-c (or from an obj-c/c++ wrapper)
Here is what I have so far...
class squares
{
public:
    static cv::Mat& findSquares( const cv::Mat& image, cv::vector<cv::vector<cv::Point> >& squares );
    static cv::Mat& drawSquares( cv::Mat& image, const cv::vector<cv::vector<cv::Point> >& squares );
};
You should be able to reduce this to a single method, say processSquares, with one input cv::Mat& image and one returned cv::Mat& image. That method would declare squares and call findSquares and drawSquares within the .cpp file.
The wrapper will take an input UIImage, convert it to a cv::Mat image, call processSquares with that input, and get a result cv::Mat image. It will convert that result back to a UIImage and pass it back to the Objective-C calling function.
So that's a neat sketch of what we need to do; I will try to expand this answer once I've actually managed to do any of it!
