I am trying to implement something similar to this using OpenCV:
https://mathematica.stackexchange.com/questions/19546/image-processing-floor-plan-detecting-rooms-borders-area-and-room-names-t
However, I am running into some walls (probably due to my own ignorance in working with OpenCV).
When I try to perform a distance transform on my image, I am not getting the expected result at all.
This is the original image I am working with
This is the image I get after doing some cleanup with OpenCV
This is the weirdness I get after trying to run a distance transform on the above image. My understanding is that this should look more like a blurry heatmap. If I follow the OpenCV example past this point and try to run a threshold to find the distance peaks, I get nothing but a black image.
This is the code I have cobbled together so far from a few different OpenCV examples:
cv::Mat outerBox = cv::Mat(matImage.size(), CV_8UC1);
cv::Mat kernel = (cv::Mat_<uchar>(3,3) << 0,1,0,1,1,1,0,1,0);
for(int x = 0; x < 3; x++) {
cv::GaussianBlur(matImage, matImage, cv::Size(11,11), 0);
cv::adaptiveThreshold(matImage, outerBox, 255, cv::ADAPTIVE_THRESH_MEAN_C, cv::THRESH_BINARY, 5, 2);
cv::bitwise_not(outerBox, outerBox);
cv::dilate(outerBox, outerBox, kernel);
cv::dilate(outerBox, outerBox, kernel);
removeBlobs(outerBox, 1);
erode(outerBox, outerBox, kernel);
}
cv::Mat dist;
cv::bitwise_not(outerBox, outerBox);
distanceTransform(outerBox, dist, cv::DIST_L2, 5);
// Normalize the distance image for range = {0.0, 1.0}
// so we can visualize and threshold it
normalize(dist, dist, 0, 1., cv::NORM_MINMAX);
//using a threshold at this point like the opencv example shows to find peaks just returns a black image right now
//threshold(dist, dist, .4, 1., CV_THRESH_BINARY);
//cv::Mat kernel1 = cv::Mat::ones(3, 3, CV_8UC1);
//dilate(dist, dist, kernel1);
self.mainImage.image = [UIImage fromCVMat:outerBox];
void removeBlobs(cv::Mat &outerBox, int iterations) {
int count=0;
int max=-1;
cv::Point maxPt;
for(int iteration = 0; iteration < iterations; iteration++) {
for(int y=0;y<outerBox.size().height;y++)
{
uchar *row = outerBox.ptr(y);
for(int x=0;x<outerBox.size().width;x++)
{
if(row[x]>=128)
{
int area = floodFill(outerBox, cv::Point(x,y), CV_RGB(0,0,64));
if(area>max)
{
maxPt = cv::Point(x,y);
max = area;
}
}
}
}
floodFill(outerBox, maxPt, CV_RGB(255,255,255));
for(int y=0;y<outerBox.size().height;y++)
{
uchar *row = outerBox.ptr(y);
for(int x=0;x<outerBox.size().width;x++)
{
if(row[x]==64 && x!=maxPt.x && y!=maxPt.y)
{
int area = floodFill(outerBox, cv::Point(x,y), CV_RGB(0,0,0));
}
}
}
}
}
I've been banging my head on this for a few hours and I am totally stuck in the mud on it, so any help would be greatly appreciated. This is a little bit out of my depth, and I feel like I am probably just making some basic mistake somewhere without realizing it.
EDIT:
Using the same code as above running OpenCV for Mac (not iOS) I get the expected results
This seems to indicate that the issue is with the Mat -> UIImage bridging that OpenCV suggests using. I am going to push forward using the Mac library to test my code, but it would sure be nice to be able to get consistent results from the UIImage bridging as well.
+ (UIImage*)fromCVMat:(const cv::Mat&)cvMat
{
// (1) Construct the correct color space
CGColorSpaceRef colorSpace;
if ( cvMat.channels() == 1 ) {
colorSpace = CGColorSpaceCreateDeviceGray();
} else {
colorSpace = CGColorSpaceCreateDeviceRGB();
}
// (2) Create image data reference
CFDataRef data = CFDataCreate(kCFAllocatorDefault, cvMat.data, (cvMat.elemSize() * cvMat.total()));
// (3) Create CGImage from cv::Mat container
CGDataProviderRef provider = CGDataProviderCreateWithCFData(data);
CGImageRef imageRef = CGImageCreate(cvMat.cols,
cvMat.rows,
8,
8 * cvMat.elemSize(),
cvMat.step[0],
colorSpace,
kCGImageAlphaNone | kCGBitmapByteOrderDefault,
provider,
NULL,
false,
kCGRenderingIntentDefault);
// (4) Create UIImage from CGImage
UIImage * finalImage = [UIImage imageWithCGImage:imageRef];
// (5) Release the references
CGImageRelease(imageRef);
CGDataProviderRelease(provider);
CFRelease(data);
CGColorSpaceRelease(colorSpace);
// (6) Return the UIImage instance
return finalImage;
}
I worked out distance transform in OpenCV using python and I was able to obtain this:
You stated "I get nothing but a black image". I faced the same problem initially, until I converted the distance-transformed image to 8-bit unsigned integers using: np.uint8(dist_transform)
I also did something extra (you may or may not need it). To segment the rooms to a certain extent, I thresholded the distance-transformed image. I got this as a result:
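For the C++ code in the question, the same idea can be sketched without any library calls. This is only an illustration under assumptions: the function name `toDisplayable8Bit` is hypothetical, and the distance map is modelled as a flat `std::vector<float>` rather than a `cv::Mat`. With OpenCV itself, calling `dist.convertTo(dist8u, CV_8U, 255.0)` after the `normalize` step should have a comparable effect. The point is that a bridge which assumes 8 bits per component cannot display 32-bit float pixels directly.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: min-max normalize a float distance map and scale it
// to 0..255 so that an 8-bit-per-component image bridge can display it.
std::vector<std::uint8_t> toDisplayable8Bit(const std::vector<float>& dist) {
    std::vector<std::uint8_t> out(dist.size(), 0);
    if (dist.empty()) return out;

    const auto extremes = std::minmax_element(dist.begin(), dist.end());
    const float lo = *extremes.first;
    const float range = *extremes.second - lo;
    if (range <= 0.0f) return out;  // flat image: nothing to stretch

    for (std::size_t i = 0; i < dist.size(); ++i) {
        const float scaled = (dist[i] - lo) / range * 255.0f;
        out[i] = static_cast<std::uint8_t>(scaled + 0.5f);  // round to nearest
    }
    return out;
}
```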
Predefined: My A4 sheet will always be white.
I need to detect an A4 sheet in an image. I am able to detect rectangles, but the problem is that I get multiple rectangles from my image, so I extracted the image regions from the detected rectangle points.
Now I want to check whether the color of each extracted image matches white.
I am using the method below to extract the image from the detected contours:
- (cv::Mat) getPaperAreaFromImage: (std::vector<cv::Point>) square, cv::Mat image
{
// declare used vars
int paperWidth = 210; // in mm, because scale factor is taken into account
int paperHeight = 297; // in mm, because scale factor is taken into account
cv::Point2f imageVertices[4];
float distanceP1P2;
float distanceP1P3;
BOOL isLandscape = true;
int scaleFactor;
cv::Mat paperImage;
cv::Mat paperImageCorrected;
cv::Point2f paperVertices[4];
// sort square corners for further operations
square = sortSquarePointsClockwise( square );
// rearrange to get proper order for getPerspectiveTransform()
imageVertices[0] = square[0];
imageVertices[1] = square[1];
imageVertices[2] = square[3];
imageVertices[3] = square[2];
// get distance between corner points for further operations
distanceP1P2 = distanceBetweenPoints( imageVertices[0], imageVertices[1] );
distanceP1P3 = distanceBetweenPoints( imageVertices[0], imageVertices[2] );
// calc paper, paperVertices; take orientation into account
if ( distanceP1P2 > distanceP1P3 ) {
scaleFactor = ceil( lroundf(distanceP1P2/paperHeight) ); // we always want to scale the image down to maintain the best quality possible
paperImage = cv::Mat( paperWidth*scaleFactor, paperHeight*scaleFactor, CV_8UC3 );
paperVertices[0] = cv::Point( 0, 0 );
paperVertices[1] = cv::Point( paperHeight*scaleFactor, 0 );
paperVertices[2] = cv::Point( 0, paperWidth*scaleFactor );
paperVertices[3] = cv::Point( paperHeight*scaleFactor, paperWidth*scaleFactor );
}
else {
isLandscape = false;
scaleFactor = ceil( lroundf(distanceP1P3/paperHeight) ); // we always want to scale the image down to maintain the best quality possible
paperImage = cv::Mat( paperHeight*scaleFactor, paperWidth*scaleFactor, CV_8UC3 );
paperVertices[0] = cv::Point( 0, 0 );
paperVertices[1] = cv::Point( paperWidth*scaleFactor, 0 );
paperVertices[2] = cv::Point( 0, paperHeight*scaleFactor );
paperVertices[3] = cv::Point( paperWidth*scaleFactor, paperHeight*scaleFactor );
}
cv::Mat warpMatrix = getPerspectiveTransform( imageVertices, paperVertices );
cv::warpPerspective(image, paperImage, warpMatrix, paperImage.size(), cv::INTER_LINEAR, cv::BORDER_CONSTANT );
if (true) {
cv::Rect rect = boundingRect(cv::Mat(square));
cv::rectangle(image, rect.tl(), rect.br(), cv::Scalar(0,255,0), 5, 8, 0);
UIImage *object = [self UIImageFromCVMat:paperImage];
}
// we want portrait output
if ( isLandscape ) {
cv::transpose(paperImage, paperImageCorrected);
cv::flip(paperImageCorrected, paperImageCorrected, 1);
return paperImageCorrected;
}
return paperImage;
}
EDITED: I used the method below to get the color from the image. My problem now is that after converting my original image to cv::Mat and cropping it, there is already a transparent grey overlay on the image, so I always get the same color.
Is there any direct way to get the original color from a cv::Mat image?
- (UIColor *)averageColor: (UIImage *) image {
CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
unsigned char rgba[4];
CGContextRef context = CGBitmapContextCreate(rgba, 1, 1, 8, 4, colorSpace, kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);
CGContextDrawImage(context, CGRectMake(0, 0, 1, 1), image.CGImage);
CGColorSpaceRelease(colorSpace);
CGContextRelease(context);
if(rgba[3] > 0) {
CGFloat alpha = ((CGFloat)rgba[3])/255.0;
CGFloat multiplier = alpha/255.0;
return [UIColor colorWithRed:((CGFloat)rgba[0])*multiplier
green:((CGFloat)rgba[1])*multiplier
blue:((CGFloat)rgba[2])*multiplier
alpha:alpha];
}
else {
return [UIColor colorWithRed:((CGFloat)rgba[0])/255.0
green:((CGFloat)rgba[1])/255.0
blue:((CGFloat)rgba[2])/255.0
alpha:((CGFloat)rgba[3])/255.0];
}
}
EDIT 2 :
Input Image
Getting this output
Need to detect only A4 sheet of white color.
I just resolved it using the Google Vision API.
My objective was to measure cracks in an image for builders. In my case, the user places an A4 sheet as a reference in the image where the crack is; I capture the A4 sheet and calculate the real-world size covered by each pixel. The builder then taps on two points on the crack, and I calculate the distance between them.
In Google Vision I used the document text detection API and printed my app name on the A4 sheet so that it covers the sheet fully, vertically or horizontally. The Google Vision API detects that text and gives me its coordinates.
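The scale calculation described above can be sketched as follows. This is a rough illustration, not the code from my app: the names `PixelPoint`, `millimetresPerPixel` and `distanceMillimetres` are made up for the example, and it assumes the sheet and the crack lie in the same plane and that the image has little perspective distortion. The only fixed fact is that the long edge of an A4 sheet is 297 mm.

```cpp
#include <cmath>

// Hypothetical type for a tapped point in image coordinates.
struct PixelPoint {
    double x;
    double y;
};

// Millimetres covered by one pixel, given the length in pixels of the
// detected A4 long edge (297 mm in the real world).
double millimetresPerPixel(double a4LongEdgePixels) {
    return 297.0 / a4LongEdgePixels;
}

// Real-world distance between two tapped points, assuming they lie in the
// same plane as the reference sheet.
double distanceMillimetres(PixelPoint a, PixelPoint b, double mmPerPixel) {
    return std::hypot(b.x - a.x, b.y - a.y) * mmPerPixel;
}
```

For example, if the long edge spans 594 pixels, each pixel covers 0.5 mm, so two taps 50 pixels apart correspond to 25 mm.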
I have been working on a background subtraction capture demo recently, but I have run into difficulties. I already have the pixels of the extracted silhouette, and I intend to draw them into a buffer created with createGraphics(). I set the new background to 100% transparent so that I only get the foreground extraction. Then I use the saveFrame() function to get a PNG file of each frame. However, it doesn't work as I expected. I intend to get a series of PNGs of the silhouette extraction with a 100% transparent background, but right now I only get ordinary PNGs of the frames from the camera feed. Could anyone help me see what the problem with this code is? Thanks a lot in advance. Any help will be appreciated.
import processing.video.*;
Capture video;
PGraphics pg;
PImage backgroundImage;
float threshold = 30;
void setup() {
size(320, 240);
video = new Capture(this, width, height);
video.start();
backgroundImage = createImage(video.width, video.height, RGB);
pg = createGraphics(320, 240);
}
void captureEvent(Capture video) {
video.read();
}
void draw() {
pg.beginDraw();
loadPixels();
video.loadPixels();
backgroundImage.loadPixels();
image(video, 0, 0);
for (int x = 0; x < video.width; x++) {
for (int y = 0; y < video.height; y++) {
int loc = x + y * video.width;
color fgColor = video.pixels[loc];
color bgColor = backgroundImage.pixels[loc];
float r1 = red(fgColor); float g1 = green(fgColor); float b1 = blue(fgColor);
float r2 = red(bgColor); float g2 = green(bgColor); float b2 = blue(bgColor);
float diff = dist(r1, g1, b1, r2, g2, b2);
if (diff > threshold) {
pixels[loc] = fgColor;
} else {
pixels[loc] = color(0, 0);
}
}}
pg.updatePixels();
pg.endDraw();
saveFrame("line-######.png");
}
void mousePressed() {
backgroundImage.copy(video, 0, 0, video.width, video.height, 0, 0, video.width, video.height);
backgroundImage.updatePixels();
}
Re:
Then I use saveFrame() function in order to get png file of each frame. However, it doesn't work as I expected. I intend to get a series of png of the silhouette extraction with 100% transparent background but now I only get the general png of frames from the camera feed.
This won't work, because saveFrame() saves the canvas, and the canvas doesn't support transparency. For example, from the reference:
It is not possible to use the transparency alpha parameter with background colors on the main drawing surface. It can only be used along with a PGraphics object and createGraphics(). https://processing.org/reference/background_.html
If you want to dump a frame with transparency you need to use .save() to dump it directly from a PImage / PGraphics.
https://processing.org/reference/PImage_save_.html
If you need to clear your PImage / PGraphics and reuse it each frame, either use pg.clear() or pg.background(0,0,0,0) (set all pixels to transparent black).
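The per-pixel step that has to end up in the offscreen buffer (rather than on the canvas) can be sketched in a language-neutral way. The example below is written in C++ purely as an illustration, not as Processing code: `Rgba` and `extractForeground` are hypothetical names, and both inputs are assumed to be the same size. In the Processing sketch, the equivalent would be to write these values into the PGraphics pixels and then call its save() method.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pixel type with an explicit alpha channel.
struct Rgba {
    std::uint8_t r, g, b, a;
};

// Keep a frame pixel (fully opaque) when it differs from the stored
// background by more than `threshold`; otherwise emit transparent black.
std::vector<Rgba> extractForeground(const std::vector<Rgba>& frame,
                                    const std::vector<Rgba>& background,
                                    float threshold) {
    std::vector<Rgba> out(frame.size(), Rgba{0, 0, 0, 0});
    for (std::size_t i = 0; i < frame.size() && i < background.size(); ++i) {
        const float dr = static_cast<float>(frame[i].r) - background[i].r;
        const float dg = static_cast<float>(frame[i].g) - background[i].g;
        const float db = static_cast<float>(frame[i].b) - background[i].b;
        const float diff = std::sqrt(dr * dr + dg * dg + db * db);
        if (diff > threshold) {
            out[i] = Rgba{frame[i].r, frame[i].g, frame[i].b, 255};
        }
    }
    return out;
}
```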
I just started to get my hands dirty with the Tesseract library, but the results are really really bad.
I followed the instructions in the Git repository ( https://github.com/gali8/Tesseract-OCR-iOS ). My ViewController uses the following method to start recognizing:
Tesseract *t = [[Tesseract alloc] initWithLanguage:@"deu"];
t.delegate = self;
[t setVariableValue:@"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" forKey:@"tessedit_char_whitelist"];
[t setImage:img];
[t recognize];
NSLog( @"Recognized text: %@", [t recognizedText] );
labelRecognizedText.text = [t recognizedText];
t = nil;
The sample image from the project template
works well (which tells me that the project itself is setup correctly), but whenever I try to use other images, the recognized text is a complete mess. For example, I tried to take a picture of my finder displaying the sample image:
https://dl.dropboxusercontent.com/u/607872/tesseract.jpg (1,5 MB)
But Tesseract recognizes:
Recognized text: s f l TO if v Ysssifss f
ssqxizg ss sfzzlj z
s N T IYIOGY Z I l EY s s
k Es ETL ZHE s UEY
z xhks Fsjs Es z VIII c
s I XFTZT c s h V Ijzs
L s sk sisijk J
s f s ssj Jss sssHss H VI
s s H
i s H st xzs
s s k 4 is x2 IV
Illlsiqss sssnsiisfjlisszxiij s
K
Even when the character whitelist only contains numbers, I don't get a result even close to what the image looks like:
Recognized text: 3 74 211
1
1 1 1
3 53 379 1
3 1 33 5 3 2
3 9 73
1 61 2 2
3 1 6 5 212 7
1
4 9 4
1 17
111 11 1 1 11 1 1 1 1
I assume there's something wrong with the way photos are taken with the iPad mini's camera I currently use, but I can't figure out what and why.
Any hints?
Update #1
In response to Tomas:
I followed the tutorial in your post but encountered several errors along the way...
the UIImage+OpenCV category cannot be used in my ARC project
I cannot import <opencv2/...> in my controllers, auto-completion does not offer it (and therefore [UIImage CVMat] is not defined)
I think there's something wrong with my integration of OpenCV, even though I followed the Hello-tutorial and added the framework. Am I required to build OpenCV on my Mac as well or is it sufficient to just include the framework in my Xcode project?
Since I don't really know what you might consider as "important" at this point (I've already read several posts and tutorials and tried different steps), feel free to ask :)
Update #2
@Tomas: thanks, the ARC-part was essential. My ViewController already has been renamed to .mm. Forget the part about "cannot import opencv2/" since I already included it in my TestApp-Prefix.pch (as stated in the Hello-tutorial).
On to the next challenge ;)
I noticed that when I use images taken with the camera, the bounds for the roi object aren't calculated successfully. I played around with the device orientation and put a UIImage in my view to see the image processing steps, but sometimes (even when the image is correctly aligned) the values are negative because the if-condition in the bounds.size()-for-loop isn't met. The worst case I had: minX/Y and maxX/Y were never touched. Long story short: the line starting with Mat roi = inranged(cv::Rect( throws an exception (assertion failed because the values were < 0 ). I don't know if the number of contours matters, but I assume so, because the bigger the image, the more likely the assertion failure is.
To be perfectly honest: I haven't had the time to read OpenCV's documentation and understand what your code does, but as of now, I don't think there's a way around. Seems like, unfortunately for me, my initial task (scan receipt, run OCR, show items in a table) requires more resources (=time) than I thought.
There's nothing wrong with the way you're taking the pictures with your iPad per se. But you can't just throw in such a complex image and expect Tesseract to magically determine which text to extract. Take a closer look at the image and you'll notice that the lighting is not uniform and that it's extremely noisy, so it may not be the best sample to start playing with.
In such scenarios it is mandatory to preprocess the image in order to provide the Tesseract library with something simpler to recognise.
Below is a very naive preprocessing example that uses OpenCV (http://www.opencv.org), a popular image processing framework. It should give you an idea to get you started.
#import <TesseractOCR/TesseractOCR.h>
#import <opencv2/opencv.hpp>
#import "UIImage+OpenCV.h"
using namespace cv;
...
// load source image
UIImage *img = [UIImage imageNamed:@"tesseract.jpg"];
Mat mat = [img CVMat];
Mat hsv;
// convert to HSV (better than RGB for this task)
cvtColor(mat, hsv, CV_RGB2HSV_FULL);
// blur slightly to reduce noise impact (the kernel size must be at least 1)
const int blurRadius = MAX(1, (int)(img.size.width / 250));
blur(hsv, hsv, cv::Size(blurRadius, blurRadius));
// in range = extract pixels within a specified range
// here we work only on the V channel extracting pixels with 0 < V < 120
Mat inranged;
inRange(hsv, cv::Scalar(0, 0, 0), cv::Scalar(255, 255, 120), inranged);
Mat inrangedforcontours;
inranged.copyTo(inrangedforcontours); // findContours alters src mat
// now find contours to find where characters are approximately located
vector<vector<cv::Point> > contours;
vector<Vec4i> hierarchy;
findContours(inrangedforcontours, contours, hierarchy, CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE, cv::Point(0, 0));
int minX = INT_MAX;
int minY = INT_MAX;
int maxX = 0;
int maxY = 0;
// find all contours that match expected character size
for (size_t i = 0; i < contours.size(); i++)
{
cv::Rect brect = cv::boundingRect(contours[i]);
float ratio = (float)brect.height / brect.width;
if (brect.height > 250 && ratio > 1.2 && ratio < 2.0)
{
minX = MIN(minX, brect.x);
minY = MIN(minY, brect.y);
maxX = MAX(maxX, brect.x + brect.width);
maxY = MAX(maxY, brect.y + brect.height);
}
}
// Now we know where our characters are located
// extract relevant part of the image adding a margin that enlarges area
const int margin = img.size.width / 50;
Mat roi = inranged(cv::Rect(minX - margin, minY - margin, maxX - minX + 2 * margin, maxY - minY + 2 * margin));
cvtColor(roi, roi, CV_GRAY2BGRA);
img = [UIImage imageWithCVMat:roi];
Tesseract *t = [[Tesseract alloc] initWithLanguage:@"eng"];
[t setVariableValue:@"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" forKey:@"tessedit_char_whitelist"];
[t setImage:img];
[t recognize];
NSString *recognizedText = [[t recognizedText] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
if ([recognizedText isEqualToString:@"1234567890"])
NSLog(@"Yeah!");
else
NSLog(@"Epic fail...");
Notes
The UIImage+OpenCV category can be found here. If you're under ARC check this.
Take a look at this to get you started with OpenCV in Xcode. Note that OpenCV is a C++ framework which can't be imported in plain C (or Objective-C) source files. The easiest workaround is to rename your view controller from .m to .mm (Objective-C++) and reimport it in your project.
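Regarding the assertion failure reported in Update #2: it can happen when no contour passes the size filter, because minX/minY stay at INT_MAX and the computed width and height become negative. It can also happen when the margin pushes the rectangle outside the image. A minimal sketch of a guard is shown below, using plain integers instead of OpenCV types; the names `Box` and `clampedRoi` are hypothetical and not part of the code above.

```cpp
#include <algorithm>

// Hypothetical stand-in for a rectangle (x, y, width, height).
struct Box {
    int x, y, width, height;
};

// Builds a region of interest from accumulated min/max bounds plus a margin,
// clamped to the image. Returns false when no contour matched (the bounds
// were never updated) or when the clamped region is empty.
bool clampedRoi(int minX, int minY, int maxX, int maxY, int margin,
                int imageWidth, int imageHeight, Box& roi) {
    if (minX > maxX || minY > maxY) return false;  // bounds never touched

    const int left = std::max(0, minX - margin);
    const int top = std::max(0, minY - margin);
    const int right = std::min(imageWidth, maxX + margin);
    const int bottom = std::min(imageHeight, maxY + margin);
    if (right <= left || bottom <= top) return false;

    roi = Box{left, top, right - left, bottom - top};
    return true;
}
```

When the function returns false, it is probably better to skip the OCR step or fall back to the whole image than to construct the cv::Rect.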
Tesseract results can vary considerably.
It requires a good-quality picture, meaning the text must be clearly visible.
Large pictures take a long time to process, so it is a good idea to scale them down before processing.
It helps to apply some color adjustments to the image before sending it to Tesseract. Use effects that enhance the legibility of the image.
Processing sometimes behaves differently depending on whether the photo comes directly from the camera or from the photo album.
If you take the photo directly with the camera, try the function below.
- (UIImage *) getImageForTexture:(UIImage *)src_img{
CGColorSpaceRef d_colorSpace = CGColorSpaceCreateDeviceRGB();
/*
* Note we specify 4 bytes per pixel here even though we ignore the
* alpha value; you can't specify 3 bytes per-pixel.
*/
size_t d_bytesPerRow = src_img.size.width * 4;
unsigned char * imgData = (unsigned char*)malloc(src_img.size.height*d_bytesPerRow);
CGContextRef context = CGBitmapContextCreate(imgData, src_img.size.width,
src_img.size.height,
8, d_bytesPerRow,
d_colorSpace,
kCGImageAlphaNoneSkipFirst);
UIGraphicsPushContext(context);
// These next two lines 'flip' the drawing so it doesn't appear upside-down.
CGContextTranslateCTM(context, 0.0, src_img.size.height);
CGContextScaleCTM(context, 1.0, -1.0);
// Use UIImage's drawInRect: instead of the CGContextDrawImage function, otherwise you'll have issues when the source image is in portrait orientation.
[src_img drawInRect:CGRectMake(0.0, 0.0, src_img.size.width, src_img.size.height)];
UIGraphicsPopContext();
/*
* At this point, we have the raw ARGB pixel data in the imgData buffer, so
* we can perform whatever image processing here.
*/
// After we've processed the raw data, turn it back into a UIImage instance.
CGImageRef new_img = CGBitmapContextCreateImage(context);
UIImage * convertedImage = [[UIImage alloc] initWithCGImage:
new_img];
CGImageRelease(new_img);
CGContextRelease(context);
CGColorSpaceRelease(d_colorSpace);
free(imgData);
return convertedImage;
}
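For the point about scaling large pictures down before processing, the target size can be computed as sketched here. This is an illustration under assumptions: `ImageSize` and `fitWithin` are hypothetical names, and the maximum side length is a value you would have to tune for your own images, since too small a size can hurt recognition.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical size type in pixels.
struct ImageSize {
    int width;
    int height;
};

// Scales `src` down so its longest side is at most `maxSide`, preserving the
// aspect ratio. Images that already fit are returned unchanged.
ImageSize fitWithin(ImageSize src, int maxSide) {
    const int longest = std::max(src.width, src.height);
    if (longest <= 0 || longest <= maxSide) return src;

    const double scale = static_cast<double>(maxSide) / longest;
    const int w = std::max(1, static_cast<int>(std::lround(src.width * scale)));
    const int h = std::max(1, static_cast<int>(std::lround(src.height * scale)));
    return ImageSize{w, h};
}
```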
I have been struggling with Tesseract character recognition for weeks. Here are two things I learned to get it to work better...
If you know what font you will be reading, clear the training and retrain it for only that font. Multiple fonts slow the OCR processing down and also increase the ambiguity in the Tesseract decision process. Training for a single font leads to greater accuracy and speed.
Post-OCR processing is really needed. You will end up with a matrix of characters that Tesseract recognizes, and you will need to process those characters further to narrow down what you are trying to read. For instance, if your application is reading food labels, knowing the rules for the words and sentences that make up a food label will help you recognize the series of characters that make up that label.
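As a small illustration of rule-based post-processing, suppose a field is known to contain only digits. The sketch below maps a few letters that are commonly confused with digits and drops everything else. The function name `normalizeNumericField` and the confusion table are assumptions made for this example, not something Tesseract provides; a real application would derive its own rules from the text it expects.

```cpp
#include <string>

// Hypothetical cleanup for a field that is known to be numeric: map letters
// that look like digits to those digits and discard any other character.
std::string normalizeNumericField(const std::string& raw) {
    std::string out;
    for (char c : raw) {
        switch (c) {
            case 'O': case 'o': out += '0'; break;
            case 'I': case 'l': out += '1'; break;
            case 'S':           out += '5'; break;
            case 'B':           out += '8'; break;
            default:
                if (c >= '0' && c <= '9') out += c;  // keep real digits only
                break;
        }
    }
    return out;
}
```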
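Convert your UIImage from RGBA to ARGB byte order before passing it to Tesseract.
If you are using iOS 5.0 or above, add
#import <Accelerate/Accelerate.h>
otherwise, uncomment the block marked //IOS 3.0-5.0 and use it in place of the vImagePermuteChannels_ARGB8888 call.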
-(UIImage *) createARGBImageFromRGBAImage: (UIImage*)image
{ //CGSize size = CGSizeMake(320, 480);
CGSize dimensions = CGSizeMake(320, 480);
NSUInteger bytesPerPixel = 4;
NSUInteger bytesPerRow = bytesPerPixel * dimensions.width;
NSUInteger bitsPerComponent = 8;
unsigned char *rgba = malloc(bytesPerPixel * dimensions.width * dimensions.height);
unsigned char *argb = malloc(bytesPerPixel * dimensions.width * dimensions.height);
CGColorSpaceRef colorSpace = NULL;
CGContextRef context = NULL;
colorSpace = CGColorSpaceCreateDeviceRGB();
context = CGBitmapContextCreate(rgba, dimensions.width, dimensions.height, bitsPerComponent, bytesPerRow, colorSpace, kCGImageAlphaPremultipliedLast | kCGBitmapByteOrderDefault); // kCGBitmapByteOrder32Big
CGContextDrawImage(context, CGRectMake(0, 0, dimensions.width, dimensions.height), [image CGImage]);
CGContextRelease(context);
CGColorSpaceRelease(colorSpace);
const vImage_Buffer src = { rgba, dimensions.height, dimensions.width, bytesPerRow };
const vImage_Buffer dis = { rgba, dimensions.height, dimensions.width, bytesPerRow };
const uint8_t map[4] = {3,0,1,2};
vImagePermuteChannels_ARGB8888(&src, &dis, map, kvImageNoFlags);
//IOS 3.0-5.0
/*for (int x = 0; x < dimensions.width; x++) {
for (int y = 0; y < dimensions.height; y++) {
NSUInteger offset = ((dimensions.width * y) + x) * bytesPerPixel;
argb[offset + 0] = rgba[offset + 3];
argb[offset + 1] = rgba[offset + 0];
argb[offset + 2] = rgba[offset + 1];
argb[offset + 3] = rgba[offset + 2];
}
}*/
colorSpace = CGColorSpaceCreateDeviceRGB();
context = CGBitmapContextCreate(dis.data, dimensions.width, dimensions.height, bitsPerComponent, bytesPerRow, colorSpace, kCGImageAlphaPremultipliedFirst | kCGBitmapByteOrderDefault); // kCGBitmapByteOrder32Big
CGImageRef imageRef = CGBitmapContextCreateImage(context);
image = [UIImage imageWithCGImage: imageRef];
CGImageRelease(imageRef);
CGContextRelease(context);
CGColorSpaceRelease(colorSpace);
free(rgba);
free(argb);
return image;
}
Tesseract *t = [[Tesseract alloc] initWithLanguage:@"eng"];
[t setVariableValue:@"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" forKey:@"tessedit_char_whitelist"];
[t setImage:[self createARGBImageFromRGBAImage:img]];
[t recognize];
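To make the channel reordering easier to follow, here is a plain C++ sketch of what the map {3,0,1,2} does, mirroring the commented-out loop in the function above: destination channel i is copied from source channel map[i], so an RGBA pixel becomes ARGB. The function name `permuteChannels` is hypothetical, and the buffer is assumed to be tightly packed with 4 bytes per pixel.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reorders the channels of a tightly packed 4-bytes-per-pixel buffer.
// Destination channel c of every pixel is taken from source channel map[c].
// Trailing bytes that do not form a whole pixel are left as zero.
std::vector<std::uint8_t> permuteChannels(const std::vector<std::uint8_t>& pixels,
                                          const std::array<std::uint8_t, 4>& map) {
    std::vector<std::uint8_t> out(pixels.size(), 0);
    for (std::size_t p = 0; p + 3 < pixels.size(); p += 4) {
        for (std::size_t c = 0; c < 4; ++c) {
            out[p + c] = pixels[p + map[c]];
        }
    }
    return out;
}
```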
The Swift equivalent of @FARAZ's answer
func getImageForTexture(srcImage: UIImage) -> UIImage{
let d_colorSpace = CGColorSpaceCreateDeviceRGB()
let d_bytesPerRow: size_t = Int(srcImage.size.width) * 4
/*
* Note we specify 4 bytes per pixel here even though we ignore the
* alpha value; you can't specify 3 bytes per-pixel.
*/
let imgData = malloc(Int(srcImage.size.height) * Int(d_bytesPerRow))
let context = CGBitmapContextCreate(imgData, Int(srcImage.size.width), Int(srcImage.size.height), 8, Int(d_bytesPerRow), d_colorSpace,CGImageAlphaInfo.NoneSkipFirst.rawValue)
UIGraphicsPushContext(context!)
// These next two lines 'flip' the drawing so it doesn't appear upside-down.
CGContextTranslateCTM(context, 0.0, srcImage.size.height)
CGContextScaleCTM(context, 1.0, -1.0)
// Use UIImage's drawInRect: instead of the CGContextDrawImage function, otherwise you'll have issues when the source image is in portrait orientation.
srcImage.drawInRect(CGRectMake(0.0, 0.0, srcImage.size.width, srcImage.size.height))
UIGraphicsPopContext()
/*
* At this point, we have the raw ARGB pixel data in the imgData buffer, so
* we can perform whatever image processing here.
*/
// After we've processed the raw data, turn it back into a UIImage instance.
let new_img = CGBitmapContextCreateImage(context)
let convertedImage = UIImage(CGImage: new_img!)
return convertedImage
}
Currently I have the following:
[cameraFeed] -> [gaussianBlur] -> [sobel] -> [luminanceThreshold] -> [rawDataOutput]
I would like to pass the rawDataOutput to the OpenCV findContours function. Unfortunately, I can't figure out the right way to do this. The following is the callback block that gets the rawDataOutput and should pass it to the OpenCV function, but it does not work. I assume there are a few things involved, such as converting the BGRA image given by GPUImage to CV_8UC1 (single channel), but I am not able to figure them out. Some help would be much appreciated, thanks!
// Callback on raw data output
__weak GPUImageRawDataOutput *weakOutput = rawDataOutput;
[rawDataOutput setNewFrameAvailableBlock:^{
[weakOutput lockFramebufferForReading];
GLubyte *outputBytes = [weakOutput rawBytesForImage];
NSInteger bytesPerRow = [weakOutput bytesPerRowInOutput];
// OpenCV stuff
int width = videoSize.width;
int height = videoSize.height;
size_t step = bytesPerRow;
cv::Mat mat(height, width, CV_8UC1, outputBytes, step); // outputBytes should be converted to type CV_8UC1
cv::Mat workingCopy = mat.clone();
// PASS mat TO OPENCV FUNCTION!!!
[weakOutput unlockFramebufferAfterReading];
// Update rawDataInput if we want to display the result
[rawDataInput updateDataFromBytes:outputBytes size:videoSize];
[rawDataInput processData];
}];
Try changing CV_8UC1 to CV_8UC4 and then converting to gray.
In the code, replace the lines
cv::Mat mat(height, width, CV_8UC1, outputBytes, step);
cv::Mat workingCopy = mat.clone();
with
cv::Mat mat(height, width, CV_8UC4, outputBytes, step);
cv::Mat workingCopy;
cv::cvtColor(mat, workingCopy, CV_RGBA2GRAY);
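To show what the conversion involves, here is a library-free sketch that reads a 4-channel buffer row by row using the row stride (the role `bytesPerRow` plays as `step` above) and produces a single-channel image. The function name `bgraToGray` is hypothetical, and it assumes the bytes are ordered B, G, R, A as the question describes; if that is the case, `CV_BGRA2GRAY` would be the matching OpenCV constant, although after a luminance threshold the channel order makes little practical difference.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts a BGRA buffer with a possibly padded row stride into a tightly
// packed 8-bit grayscale image, using integer luma weights 0.299/0.587/0.114.
std::vector<std::uint8_t> bgraToGray(const std::uint8_t* bytes, int width,
                                     int height, std::size_t bytesPerRow) {
    std::vector<std::uint8_t> gray(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* row = bytes + static_cast<std::size_t>(y) * bytesPerRow;
        for (int x = 0; x < width; ++x) {
            const unsigned b = row[4 * x + 0];
            const unsigned g = row[4 * x + 1];
            const unsigned r = row[4 * x + 2];
            const unsigned luma = (299 * r + 587 * g + 114 * b + 500) / 1000;
            gray[static_cast<std::size_t>(y) * width + x] =
                static_cast<std::uint8_t>(luma);
        }
    }
    return gray;
}
```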
I found a pixel perfect collision algorithm developed by Daniel Vilchez and included in a project shared in this cocos2d-iphone.org forum topic.
Below is the part of the algorithm I am interested in. I am trying to modify it because whenever I used CCRenderTexture, as in the original code, the app crashed.
I am thinking of alternative methods based on circle collision, but those are not pixel perfect, and they wouldn't work well when my bullet is a wave with this shape.
**How can I get the algorithm working with sprites batched in a CCSpriteBatchNode? And does this strictly require the use of CCRenderTexture?**
To be precise, this question is partially related to this other question of mine, about creating an instance of CCRenderTexture that causes my app to crash. I posted two separate questions because here I am asking about the algorithm, while in the other one I only ask why CCRenderTexture causes my app to crash (without using Daniel's pixel perfect algorithm, just creating an instance of CCRenderTexture).
Adapted code (the CCRenderTexture usage is missing because it made my app crash, so I commented out the use of _rt, the CCRenderTexture instance). The code does not work properly, so I guess I need CCRenderTexture, hence the question:
-(BOOL) isPixelPerfectCollisionBetweenSpriteA:(CCSprite*)spr1 spriteB:(CCSprite*) spr2
{
BOOL isCollision = NO;
CGRect intersection = CGRectIntersection([spr1 boundingBox], [spr2 boundingBox]);
// Look for simple bounding box collision
if (!CGRectIsEmpty(intersection))
{
// Get intersection info
unsigned int x = intersection.origin.x;
unsigned int y = intersection.origin.y;
unsigned int w = intersection.size.width;
unsigned int h = intersection.size.height;
unsigned int numPixels = w * h;
//NSLog(@"\nintersection = (%u,%u,%u,%u), area = %u",x,y,w,h,numPixels);
// Draw into the RenderTexture
//[_rt beginWithClear:0 g:0 b:0 a:0];
// Render both sprites: first one in RED and second one in GREEN
glColorMask(1, 0, 0, 1);
[spr1 visit];
glColorMask(0, 1, 0, 1);
[spr2 visit];
glColorMask(1, 1, 1, 1);
// Get color values of intersection area
ccColor4B *buffer = malloc( sizeof(ccColor4B) * numPixels );
glReadPixels(x, y, w, h, GL_RGBA, GL_UNSIGNED_BYTE, buffer);
//[_rt end];
// Read buffer
unsigned int step = 1;
for(unsigned int i=0; i<numPixels; i+=step)
{
ccColor4B color = buffer[i];
if (color.r > 0 && color.g > 0)
{
isCollision = YES;
break;
}
}
// Free buffer memory
free(buffer);
}
return isCollision;
}
EDIT: I found also KKPixelMaskSprite but it doesn't seem to work for high resolution sprites batched in CCSpriteBatchNodes (see comment here).
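For comparison, one alternative that avoids rendering altogether is to precompute an opacity mask per sprite and test the masks on the CPU. The sketch below only illustrates that idea and is not Daniel's algorithm or the KKPixelMaskSprite implementation: `AlphaMask` and `masksCollide` are hypothetical names, and it assumes axis-aligned, unscaled, unrotated sprites whose masks are stored row by row.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-sprite mask: position of the mask origin in world pixels,
// its size, and one flag per pixel (row-major) telling whether it is opaque.
struct AlphaMask {
    int x, y, width, height;
    std::vector<bool> opaque;
};

// Returns true when at least one world pixel is opaque in both masks.
bool masksCollide(const AlphaMask& a, const AlphaMask& b) {
    // Intersection of the two bounding boxes in world coordinates.
    const int left = std::max(a.x, b.x);
    const int right = std::min(a.x + a.width, b.x + b.width);
    const int top = std::max(a.y, b.y);
    const int bottom = std::min(a.y + a.height, b.y + b.height);

    for (int y = top; y < bottom; ++y) {
        for (int x = left; x < right; ++x) {
            const std::size_t ia =
                static_cast<std::size_t>(y - a.y) * a.width + (x - a.x);
            const std::size_t ib =
                static_cast<std::size_t>(y - b.y) * b.width + (x - b.x);
            if (a.opaque[ia] && b.opaque[ib]) return true;
        }
    }
    return false;
}
```

Because the masks are independent of how the sprites are drawn, this kind of check does not care whether the sprites are batched, at the cost of extra memory per sprite frame.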