Swift 3 - How do I improve image quality for Tesseract? - ios

I am using Swift 3 to build a mobile app that allows the user to take a picture and run Tesseract OCR over the resulting image.
However, my attempts to improve the scan quality have not helped much. I've cropped the photo to a more "zoomed in" region that I want to recognize, and even tried making it black and white. Are there any strategies for "enhancing" or optimizing the picture quality/size so that Tesseract can recognize it better? Thanks!
tesseract.image = // the camera photo here
tesseract.recognize()
print(tesseract.recognizedText)
I got these errors and have no idea what to do:
Error in pixCreateHeader: depth must be {1, 2, 4, 8, 16, 24, 32}
Error in pixCreateNoInit: pixd not made
Error in pixCreate: pixd not made
Error in pixGetData: pix not defined
Error in pixGetWpl: pix not defined
2017-03-11 22:22:30.019717 ProjectName[34247:8754102] Cannot convert image to Pix with bpp = 64
Error in pixSetYRes: pix not defined
Error in pixGetDimensions: pix not defined
Error in pixGetColormap: pix not defined
Error in pixClone: pixs not defined
Error in pixGetDepth: pix not defined
Error in pixGetWpl: pix not defined
Error in pixGetYRes: pix not defined
Please call SetImage before attempting recognition.
Please call SetImage before attempting recognition.
2017-03-11 22:22:30.026605 EOB-Reader[34247:8754102] No recognized text. Check that -[Tesseract setImage:] is passed an image bigger than 0x0.
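The key line in that log is "Cannot convert image to Pix with bpp = 64": the first error lists the only pixel depths that are accepted, and 64 is not among them. A minimal sketch of that check, assuming the bits-per-pixel value comes from something like `image.cgImage?.bitsPerPixel` (the name `isAcceptedPixDepth` is made up for illustration):

```swift
// Depths listed in the "pixCreateHeader" error message above.
let acceptedPixDepths: Set<Int> = [1, 2, 4, 8, 16, 24, 32]

// Hypothetical helper: returns true when an image with this many
// bits per pixel would pass the depth check reported in the log.
func isAcceptedPixDepth(_ bitsPerPixel: Int) -> Bool {
    return acceptedPixDepths.contains(bitsPerPixel)
}

print(isAcceptedPixDepth(32)) // true: e.g. 8 bits x 4 channels
print(isAcceptedPixDepth(64)) // false: the depth reported in the log
```

If the check fails, redrawing the photo into a standard 8-bit-per-channel context (which the UIGraphicsBeginImageContext-based scaling code in this thread does as a side effect) is one plausible way to bring the depth down.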

I've been using Tesseract fairly successfully in Swift 3 with the following:
func performImageRecognition(_ image: UIImage) {
    guard let tesseract = G8Tesseract(language: "eng") else { return }
    tesseract.engineMode = .tesseractCubeCombined
    tesseract.pageSegmentationMode = .singleBlock
    // Use the image passed in rather than reading it back from the image view
    tesseract.image = image
    tesseract.recognize()
    // recognizedText may be nil, so avoid force-unwrapping it
    print(tesseract.recognizedText ?? "No text recognized")
}
Pre-processing the image also helped. I added the following extension to UIImage:
import UIKit
import CoreImage
extension UIImage {
func toGrayScale() -> UIImage {
    // Filter the image directly; no intermediate UIImageView is needed
    guard let input = CIImage(image: self),
        let filter = CIFilter(name: "CIPhotoEffectNoir") else { return self }
    filter.setValue(input, forKey: kCIInputImageKey)
    let context = CIContext(options: nil)
    // Fall back to the original image if rendering fails
    guard let output = filter.outputImage,
        let cgimg = context.createCGImage(output, from: output.extent) else { return self }
    return UIImage(cgImage: cgimg)
}
func binarise() -> UIImage {
let glContext = EAGLContext(api: .openGLES2)!
let ciContext = CIContext(eaglContext: glContext, options: [kCIContextOutputColorSpace : NSNull()])
let filter = CIFilter(name: "CIPhotoEffectMono")
filter!.setValue(CIImage(image: self), forKey: "inputImage")
let outputImage = filter!.outputImage
let cgimg = ciContext.createCGImage(outputImage!, from: (outputImage?.extent)!)
return UIImage(cgImage: cgimg!)
}
func scaleImage() -> UIImage {
let maxDimension: CGFloat = 640
var scaledSize = CGSize(width: maxDimension, height: maxDimension)
var scaleFactor: CGFloat
if self.size.width > self.size.height {
scaleFactor = self.size.height / self.size.width
scaledSize.width = maxDimension
scaledSize.height = scaledSize.width * scaleFactor
} else {
scaleFactor = self.size.width / self.size.height
scaledSize.height = maxDimension
scaledSize.width = scaledSize.height * scaleFactor
}
UIGraphicsBeginImageContext(scaledSize)
self.draw(in: CGRect(x: 0, y: 0, width: scaledSize.width, height: scaledSize.height))
let scaledImage = UIGraphicsGetImageFromCurrentImageContext()
UIGraphicsEndImageContext()
return scaledImage!
}
func orientate(img: UIImage) -> UIImage {
if (img.imageOrientation == UIImageOrientation.up) {
return img;
}
UIGraphicsBeginImageContextWithOptions(img.size, false, img.scale)
let rect = CGRect(x: 0, y: 0, width: img.size.width, height: img.size.height)
img.draw(in: rect)
let normalizedImage : UIImage = UIGraphicsGetImageFromCurrentImageContext()!
UIGraphicsEndImageContext()
return normalizedImage
}
}
And then called this before passing the image to performImageRecognition
func processImage() {
self.imageView.image! = self.imageView.image!.toGrayScale()
self.imageView.image! = self.imageView.image!.binarise()
self.imageView.image! = self.imageView.image!.scaleImage()
}
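The size calculation inside scaleImage() can be tried on its own. This is a sketch using plain Doubles instead of CGSize, so it runs without UIKit; `fittedSize` is an illustrative name, not part of the answer's code:

```swift
// Resizes (width, height) so the longer side equals maxDimension,
// keeping the aspect ratio. Mirrors the arithmetic in scaleImage().
func fittedSize(width: Double, height: Double, maxDimension: Double) -> (width: Double, height: Double) {
    if width > height {
        let scaleFactor = height / width
        return (maxDimension, maxDimension * scaleFactor)
    } else {
        let scaleFactor = width / height
        return (maxDimension * scaleFactor, maxDimension)
    }
}

let fitted = fittedSize(width: 1280, height: 960, maxDimension: 640)
print(fitted.width, fitted.height) // 640.0 480.0
```

Like the original, this scales small images up as well as large images down, since the longer side is always set to maxDimension.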
Hope this helps

Related

My custom metal image filter is slow. How can I make it faster?

I've seen a lot of online tutorials that manage to filter an image in 0.0X seconds. Meanwhile, my code below takes 1.09 seconds to filter an image (just to reduce brightness by half).
Edit after first comment:
The time was measured with two methods:
Date() time interval, from when the “apply filter” button is tapped until the apply filter function has finished running
Building it on an iPhone and timing it manually with the timer on my watch
Since I'm new to Metal and kernels, I don't really know the difference between my code and the tutorials that achieve faster results. Which part of my code can be improved, or should use a different approach, to make it a lot faster?
Here's my kernel code:
#include <metal_stdlib>
using namespace metal;
kernel void black(
texture2d<float, access::write> outTexture [[texture(0)]],
texture2d<float, access::read> inTexture [[texture(1)]],
uint2 id [[thread_position_in_grid]]) {
float3 val = inTexture.read(id).rgb;
float r = val.r / 4;
float g = val.g / 4;
float b = val.b / 2;
float4 out = float4(r, g, b, 1.0);
outTexture.write(out.rgba, id);
}
This is my Swift code:
import Metal
import MetalKit
// UIImage -> CGImage -> MTLTexture -> COMPUTE HAPPENS |
// UIImage <- CGImage <- MTLTexture <--
class Filter {
var device: MTLDevice
var defaultLib: MTLLibrary?
var grayscaleShader: MTLFunction?
var commandQueue: MTLCommandQueue?
var commandBuffer: MTLCommandBuffer?
var commandEncoder: MTLComputeCommandEncoder?
var pipelineState: MTLComputePipelineState?
var inputImage: UIImage
var height, width: Int
// most devices have a limit of 512 threads per group
let threadsPerBlock = MTLSize(width: 32, height: 32, depth: 1)
init(){
print("initialized")
self.device = MTLCreateSystemDefaultDevice()!
print(device)
//changes: I did do catch try, and use bundle parameter when making make default library
let frameworkBundle = Bundle(for: type(of: self))
print(frameworkBundle)
self.defaultLib = device.makeDefaultLibrary()
self.grayscaleShader = defaultLib?.makeFunction(name: "black")
self.commandQueue = self.device.makeCommandQueue()
self.commandBuffer = self.commandQueue?.makeCommandBuffer()
self.commandEncoder = self.commandBuffer?.makeComputeCommandEncoder()
//ERROR HERE
if let shader = grayscaleShader {
print("in")
self.pipelineState = try? self.device.makeComputePipelineState(function: shader)
} else { fatalError("unable to make compute pipeline") }
self.inputImage = UIImage(named: "stockImage")!
self.height = Int(self.inputImage.size.height)
self.width = Int(self.inputImage.size.width)
}
func getCGImage(from uiimg: UIImage) -> CGImage? {
UIGraphicsBeginImageContext(uiimg.size)
uiimg.draw(in: CGRect(origin: .zero, size: uiimg.size))
let contextImage = UIGraphicsGetImageFromCurrentImageContext()
UIGraphicsEndImageContext()
return contextImage?.cgImage
}
func getMTLTexture(from cgimg: CGImage) -> MTLTexture {
let textureLoader = MTKTextureLoader(device: self.device)
do{
let texture = try textureLoader.newTexture(cgImage: cgimg, options: nil)
let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: texture.pixelFormat, width: width, height: height, mipmapped: false)
textureDescriptor.usage = [.shaderRead, .shaderWrite]
return texture
} catch {
fatalError("Couldn't convert CGImage to MTLtexture")
}
}
func getCGImage(from mtlTexture: MTLTexture) -> CGImage? {
var data = Array<UInt8>(repeatElement(0, count: 4*width*height))
mtlTexture.getBytes(&data,
bytesPerRow: 4*width,
from: MTLRegionMake2D(0, 0, width, height),
mipmapLevel: 0)
let bitmapInfo = CGBitmapInfo(rawValue: (CGBitmapInfo.byteOrder32Big.rawValue | CGImageAlphaInfo.premultipliedLast.rawValue))
let colorSpace = CGColorSpaceCreateDeviceRGB()
let context = CGContext(data: &data,
width: width,
height: height,
bitsPerComponent: 8,
bytesPerRow: 4*width,
space: colorSpace,
bitmapInfo: bitmapInfo.rawValue)
return context?.makeImage()
}
func getUIImage(from cgimg: CGImage) -> UIImage? {
return UIImage(cgImage: cgimg)
}
func getEmptyMTLTexture() -> MTLTexture? {
let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
pixelFormat: MTLPixelFormat.rgba8Unorm,
width: width,
height: height,
mipmapped: false)
textureDescriptor.usage = [.shaderRead, .shaderWrite]
return self.device.makeTexture(descriptor: textureDescriptor)
}
func getInputMTLTexture() -> MTLTexture? {
if let inputImage = getCGImage(from: self.inputImage) {
return getMTLTexture(from: inputImage)
}
else { fatalError("Unable to convert Input image to MTLTexture") }
}
func getBlockDimensions() -> MTLSize {
let blockWidth = width / self.threadsPerBlock.width
let blockHeight = height / self.threadsPerBlock.height
return MTLSizeMake(blockWidth, blockHeight, 1)
}
func applyFilter() -> UIImage? {
print("start")
let date = Date()
print(date)
if let encoder = self.commandEncoder, let buffer = self.commandBuffer,
let outputTexture = getEmptyMTLTexture(), let inputTexture = getInputMTLTexture() {
encoder.setTextures([outputTexture, inputTexture], range: 0..<2)
encoder.setComputePipelineState(self.pipelineState!)
encoder.dispatchThreadgroups(self.getBlockDimensions(), threadsPerThreadgroup: threadsPerBlock)
encoder.endEncoding()
buffer.commit()
buffer.waitUntilCompleted()
guard let outputImage = getCGImage(from: outputTexture) else { fatalError("Couldn't obtain CGImage from MTLTexture") }
print("stop")
let date2 = Date()
print(date2.timeIntervalSince(date))
return getUIImage(from: outputImage)
} else { fatalError("optional unwrapping failed") }
}
}
In case someone still needs an answer: I found a different approach, which is to implement it as a custom CIFilter. It runs fast and is easy to understand!
You are using UIImage and CGImage. These objects are stored in CPU memory.
Implement the code using only CIImage or MTLTexture instead.
These objects are stored in GPU memory and give the best performance.
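Separately from speed, getBlockDimensions() in the question uses integer division, so when the image width or height is not a multiple of 32, the last partial row or column of threadgroups is never dispatched and those pixels are left unwritten. A sketch of the usual rounding-up calculation, kept free of Metal types so it runs anywhere (`threadgroupCount` is an illustrative name):

```swift
// Number of threadgroups needed to cover `length` pixels when each
// group handles `threadsPerGroup` of them, rounding up.
func threadgroupCount(length: Int, threadsPerGroup: Int) -> Int {
    precondition(length >= 0 && threadsPerGroup > 0)
    return (length + threadsPerGroup - 1) / threadsPerGroup
}

print(1000 / 32)                                           // 31: truncation leaves 8 pixels uncovered
print(threadgroupCount(length: 1000, threadsPerGroup: 32)) // 32
```

With rounding up, some threads fall outside the image, so the kernel would also need to return early when id.x or id.y is beyond the texture's width or height.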

Pixellating a UIImage returns UIImage with a different size

I'm using an extension to pixellate my images like the following:
func pixellated(scale: Int = 8) -> UIImage? {
guard let ciImage = CIImage(image: self), let filter = CIFilter(name: "CIPixellate") else { return nil }
filter.setValue(ciImage, forKey: kCIInputImageKey)
filter.setValue(scale, forKey: kCIInputScaleKey)
guard let output = filter.outputImage else { return nil }
return UIImage(ciImage: output)
}
The problem is that the image represented by self here does not have the same size as the one I create using UIImage(ciImage: output).
For example, using that code:
print("image.size BEFORE : \(image.size)")
if let imagePixellated = image.pixellated(scale: 48) {
image = imagePixellated
print("image.size AFTER : \(image.size)")
}
will print:
image.size BEFORE : (400.0, 298.0)
image.size AFTER : (848.0, 644.0)
Not the same size and not the same ratio.
Any idea why?
EDIT:
I added some prints in the extension as follows:
func pixellated(scale: Int = 8) -> UIImage? {
guard let ciImage = CIImage(image: self), let filter = CIFilter(name: "CIPixellate") else { return nil }
print("UIIMAGE : \(self.size)")
print("ciImage.extent.size : \(ciImage.extent.size)")
filter.setValue(ciImage, forKey: kCIInputImageKey)
filter.setValue(scale, forKey: kCIInputScaleKey)
guard let output = filter.outputImage else { return nil }
print("output : \(output.extent.size)")
return UIImage(ciImage: output)
}
And here are the outputs:
UIIMAGE : (250.0, 166.5)
ciImage.extent.size : (500.0, 333.0)
output : (548.0, 381.0)
You have two problems:
self.size is measured in points. self's size in pixels is actually self.size multiplied by self.scale.
The CIPixellate filter changes the bounds of its image.
To fix problem one, you can simply set the scale property of the returned UIImage to be the same as self.scale:
return UIImage(ciImage: output, scale: self.scale, orientation: imageOrientation)
But you'll find this still isn't quite right. That's because of problem two. For problem two, the simplest solution is to crop the output CIImage:
// Must use self.scale, to disambiguate from the scale parameter
let floatScale = CGFloat(self.scale)
let pixelSize = CGSize(width: size.width * floatScale, height: size.height * floatScale)
let cropRect = CGRect(origin: CGPoint.zero, size: pixelSize)
guard let output = filter.outputImage?.cropping(to: cropRect) else { return nil }
This will give you an image of the size you want.
Now, your next question may be, "why is there a thin, dark border around my pixellated images?" Good question! But ask a new question for that.
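To make the two problems concrete, the numbers printed in the question can be reproduced with plain arithmetic. The sketch below assumes a screen scale of 2 (which matches 250.0 x 166.5 points becoming 500 x 333 pixels). The observation that the filter output grew by 48 pixels in each dimension is read off the question's output, not taken from documentation, and `pixelSize` is an illustrative name:

```swift
// Problem one: convert a size in points to pixels.
func pixelSize(pointWidth: Double, pointHeight: Double, scale: Double) -> (width: Double, height: Double) {
    return (pointWidth * scale, pointHeight * scale)
}

let pixels = pixelSize(pointWidth: 250.0, pointHeight: 166.5, scale: 2.0)
print(pixels.width, pixels.height) // 500.0 333.0

// Problem two: the filter output in the question was 548 x 381,
// so it exceeds the input by this much in each dimension.
let extraWidth = 548.0 - pixels.width
let extraHeight = 381.0 - pixels.height
print(extraWidth, extraHeight) // 48.0 48.0
```

Cropping the output to a rectangle at the origin with the 500 x 333 pixel size removes that extra margin, which is what the cropRect in the answer does.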

Scaling Images: how can the accelerate be the slowest method?

I am testing several methods to rescale a UIImage.
I have tested all these methods posted here and measured the time they take to resize an image.
1) UIGraphicsBeginImageContextWithOptions & UIImage -drawInRect:
let image = UIImage(contentsOfFile: self.URL.path!)
let size = CGSizeApplyAffineTransform(image.size, CGAffineTransformMakeScale(0.5, 0.5))
let hasAlpha = false
let scale: CGFloat = 0.0 // Automatically use scale factor of main screen
UIGraphicsBeginImageContextWithOptions(size, !hasAlpha, scale)
image.drawInRect(CGRect(origin: CGPointZero, size: size))
let scaledImage = UIGraphicsGetImageFromCurrentImageContext()
UIGraphicsEndImageContext()
2) CGBitmapContextCreate & CGContextDrawImage
let cgImage = UIImage(contentsOfFile: self.URL.path!).CGImage
let width = CGImageGetWidth(cgImage) / 2
let height = CGImageGetHeight(cgImage) / 2
let bitsPerComponent = CGImageGetBitsPerComponent(cgImage)
let bytesPerRow = CGImageGetBytesPerRow(cgImage)
let colorSpace = CGImageGetColorSpace(cgImage)
let bitmapInfo = CGImageGetBitmapInfo(cgImage)
let context = CGBitmapContextCreate(nil, width, height, bitsPerComponent, bytesPerRow, colorSpace, bitmapInfo.rawValue)
CGContextSetInterpolationQuality(context, kCGInterpolationHigh)
CGContextDrawImage(context, CGRect(origin: CGPointZero, size: CGSize(width: CGFloat(width), height: CGFloat(height))), cgImage)
let scaledImage = CGBitmapContextCreateImage(context).flatMap { UIImage(CGImage: $0) }
3) CGImageSourceCreateThumbnailAtIndex
import ImageIO
if let imageSource = CGImageSourceCreateWithURL(self.URL, nil) {
let options: [NSString: NSObject] = [
kCGImageSourceThumbnailMaxPixelSize: max(size.width, size.height) / 2.0,
kCGImageSourceCreateThumbnailFromImageAlways: true
]
let scaledImage = CGImageSourceCreateThumbnailAtIndex(imageSource, 0, options).flatMap { UIImage(CGImage: $0) }
}
4) Lanczos Resampling with Core Image
let image = CIImage(contentsOfURL: self.URL)
let filter = CIFilter(name: "CILanczosScaleTransform")!
filter.setValue(image, forKey: "inputImage")
filter.setValue(0.5, forKey: "inputScale")
filter.setValue(1.0, forKey: "inputAspectRatio")
let outputImage = filter.valueForKey("outputImage") as! CIImage
let context = CIContext(options: [kCIContextUseSoftwareRenderer: false])
let scaledImage = UIImage(CGImage: self.context.createCGImage(outputImage, fromRect: outputImage.extent()))
5) vImage in Accelerate
let cgImage = UIImage(contentsOfFile: self.URL.path!).CGImage
// create a source buffer
var format = vImage_CGImageFormat(bitsPerComponent: 8, bitsPerPixel: 32, colorSpace: nil,
bitmapInfo: CGBitmapInfo(rawValue: CGImageAlphaInfo.First.rawValue),
version: 0, decode: nil, renderingIntent: CGColorRenderingIntent.RenderingIntentDefault)
var sourceBuffer = vImage_Buffer()
defer {
sourceBuffer.data.dealloc(Int(sourceBuffer.height) * Int(sourceBuffer.height) * 4)
}
var error = vImageBuffer_InitWithCGImage(&sourceBuffer, &format, nil, cgImage, numericCast(kvImageNoFlags))
guard error == kvImageNoError else { return nil }
// create a destination buffer
let scale = UIScreen.mainScreen().scale
let destWidth = Int(image.size.width * 0.5 * scale)
let destHeight = Int(image.size.height * 0.5 * scale)
let bytesPerPixel = CGImageGetBitsPerPixel(image.CGImage) / 8
let destBytesPerRow = destWidth * bytesPerPixel
let destData = UnsafeMutablePointer<UInt8>.alloc(destHeight * destBytesPerRow)
defer {
destData.dealloc(destHeight * destBytesPerRow)
}
var destBuffer = vImage_Buffer(data: destData, height: vImagePixelCount(destHeight), width: vImagePixelCount(destWidth), rowBytes: destBytesPerRow)
// scale the image
error = vImageScale_ARGB8888(&sourceBuffer, &destBuffer, nil, numericCast(kvImageHighQualityResampling))
guard error == kvImageNoError else { return nil }
// create a CGImage from vImage_Buffer
let destCGImage = vImageCreateCGImageFromBuffer(&destBuffer, &format, nil, nil, numericCast(kvImageNoFlags), &error)?.takeRetainedValue()
guard error == kvImageNoError else { return nil }
// create a UIImage
let scaledImage = destCGImage.flatMap { UIImage(CGImage: $0, scale: 0.0, orientation: image.imageOrientation) }
After testing this for hours and measuring the time each method took to rescale the images to 100x100, my conclusions are completely different from NSHipster's. First of all, vImage in Accelerate is 200 times slower than the first method, which in my opinion is the poor cousin of the others. The Core Image method is also slow. But I am intrigued by how method #1 can smash methods 3, 4 and 5, some of which in theory do their processing on the GPU.
Method #3 for example, took 2 seconds to resize a 1024x1024 image to 100x100. On the other hand #1 took 0.01 seconds!
Am I missing something?
Something must be wrong, or Apple would not have taken the time to write Accelerate and Core Image.
NOTE: I am measuring from the moment the image is already loaded into a variable to the moment a scaled version is saved to another variable. I am not counting the time it takes to read the file.
Accelerate can be the slowest method for a variety of reasons:
The code you show may spend a lot of time just extracting the data from the CGImage and making a new image. You didn't, for example, use any features that would allow the CGImage to use your vImage result directly rather than make a copy. Possibly a colorspace conversion was also required as part of some of those extract / create CGImage operations. Hard to tell from here.
Some of the other methods may not have done anything, deferring the work until later when absolutely forced to do it. If that was after your end time, then the work wasn't measured.
Some of the other methods have the advantage of being able to directly use the contents of the image without having to make a copy first.
Different resampling methods (e.g. Bilinear vs. Lanczos) have different cost.
The GPU can actually be faster at some stuff, and resampling is one of the tasks it is specially optimized to do. On the flip side, random data access (such as occurs in resampling) is not a nice thing to do to the vector unit.
Timing methods can impact the result. Accelerate is multithreaded. If you use wall clock time, you will get one answer. If you use getrusage or a sampler, you'll get another.
If you really think Accelerate is way off the mark here, file a bug. I certainly would check with Instruments Time Profile that you are spending the majority of your time in vImageScale in your benchmark loop before doing so, though.
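Two of the points above (deferred work and the timing method) can be controlled for in the benchmark itself. Below is a minimal wall-clock harness. It assumes the closure forces the result to be fully produced (for example by rendering or reading back the scaled image) before it returns; `averageSeconds` is an illustrative name:

```swift
import Foundation

// Runs `work` several times and returns the mean wall-clock time per run.
// Wall-clock time measures elapsed real time regardless of how many threads
// did the work; getrusage or a sampler would report something different.
func averageSeconds(iterations: Int, _ work: () -> Void) -> Double {
    precondition(iterations > 0)
    let start = ProcessInfo.processInfo.systemUptime
    for _ in 0..<iterations {
        work()
    }
    let elapsed = ProcessInfo.processInfo.systemUptime - start
    return elapsed / Double(iterations)
}

// Placeholder workload standing in for one of the resize methods.
let seconds = averageSeconds(iterations: 10) {
    _ = (0..<100_000).reduce(0, +)
}
print("average: \(seconds) s")
```

Averaging over several runs also smooths out one-time costs such as first-use setup, which a single measurement would attribute to whichever method happened to run first.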

CGImageCreateWithImageInRect() returning nil

I'm trying to crop an image into a square, but once I actually try to do the crop by using CGImageCreateWithImageInRect(), this line crashes. I set breakpoints and made sure that the arguments passed into this function are not nil.
I'm fairly new to programming and Swift, but have searched around and haven't found any solution to my problem.
The failure reason:
fatal error: unexpectedly found nil while unwrapping an Optional value
func cropImageToSquare(imageData: NSData) -> NSData {
let image = UIImage(data: imageData)
let contextImage : UIImage = UIImage(CGImage: image!.CGImage!)
let contextSize: CGSize = contextImage.size
let imageDimension: CGFloat = contextSize.height
let posY : CGFloat = (contextSize.height + (contextSize.width - contextSize.height)/2)
let rect: CGRect = CGRectMake(0, posY, imageDimension, imageDimension)
// error on line below: fatal error: unexpectedly found nil while unwrapping an Optional value
let imageRef: CGImageRef = CGImageCreateWithImageInRect(contextImage.CGImage, rect)!
let croppedImage : UIImage = UIImage(CGImage: imageRef, scale: 1.0, orientation: image!.imageOrientation)
let croppedImageData = UIImageJPEGRepresentation(croppedImage, 1.0)
return croppedImageData!
}
Your code uses a lot of force-unwrapping with !s. I would recommend avoiding this — the compiler is trying to help you write code that won't crash. Use optional chaining with ?, and if let / guard let, instead.
The ! on that particular line is hiding an issue where CGImageCreateWithImageInRect might return nil. The documentation explains that this happens when the rect is not correctly inside the image bounds. Your code works for images in portrait orientation, but not landscape.
Furthermore, there's a convenient function provided by AVFoundation which can automatically find the right rectangle for you to use, called AVMakeRectWithAspectRatioInsideRect. No need to do the calculations manually :-)
Here's what I would recommend:
import AVFoundation
extension UIImage
{
func croppedToSquare() -> UIImage
{
guard let cgImage = self.CGImage else { return self }
// Note: self.size depends on self.imageOrientation, so we use CGImageGetWidth/Height here.
let boundingRect = CGRect(
x: 0, y: 0,
width: CGImageGetWidth(cgImage),
height: CGImageGetHeight(cgImage))
// Crop to square (1:1 aspect ratio) and round the resulting rectangle to integer coordinates.
var cropRect = AVMakeRectWithAspectRatioInsideRect(CGSize(width: 1, height: 1), boundingRect)
cropRect.origin.x = ceil(cropRect.origin.x)
cropRect.origin.y = ceil(cropRect.origin.y)
cropRect.size.width = floor(cropRect.size.width)
cropRect.size.height = floor(cropRect.size.height)
guard let croppedImage = CGImageCreateWithImageInRect(cgImage, cropRect) else {
assertionFailure("cropRect \(cropRect) was not inside \(boundingRect)")
return self
}
return UIImage(CGImage: croppedImage, scale: self.scale, orientation: self.imageOrientation)
}
}
// then:
let croppedImage = myUIImage.croppedToSquare()
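For reference, the centred square that the code above asks AVFoundation to compute can also be worked out by hand. This is a sketch with a hypothetical `SquareCrop` struct and integer pixel dimensions; it does not reproduce the exact rounding of AVMakeRectWithAspectRatioInsideRect:

```swift
// Hypothetical value type describing a crop rectangle in pixels.
struct SquareCrop: Equatable {
    let x: Int
    let y: Int
    let side: Int
}

// Largest centred square that fits inside a width x height image.
func centeredSquare(width: Int, height: Int) -> SquareCrop {
    let side = min(width, height)
    return SquareCrop(x: (width - side) / 2, y: (height - side) / 2, side: side)
}

print(centeredSquare(width: 400, height: 300)) // landscape: x = 50, y = 0, side = 300
print(centeredSquare(width: 300, height: 400)) // portrait:  x = 0, y = 50, side = 300
```

Because the square never extends past the image bounds, a crop with this rectangle stays inside the image in both orientations, unlike the posY calculation in the question.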

Tesseract OCR w/ iOS & Swift returns error or gibberish

I used this tutorial to get Tesseract OCR working with Swift: http://www.piterwilson.com/blog/2014/10/18/minimal-tesseact-ocr-setup-in-swift/
It works fine if I upload the demo image and call
tesseract.image = UIImage(named: "image_sample.jpg");
But if I use my camera code and take a picture of that same image and call
tesseract.image = self.image.blackAndWhite();
the result is either gibberish like
s I 5E251 :Ec
‘-. —7.//:E*髧
a g :_{:7 IC‘
J 7 iii—1553‘
: fizzle —‘;-—:
; ~:~./: -:-‘-
‘- :~£:': _-'~‘:
: 37%; §:‘—_
: ::::E 7,;.
1f:,:~ ——,
Or it crashes with an EXC_BAD_ACCESS error. I haven't been able to work out why it gives either the error or the gibberish. This is the code for my camera capture (photoTaken()) and the processing step (nextStepTapped()):
@IBAction func photoTaken(sender: UIButton) {
var videoConnection = stillImageOutput.connectionWithMediaType(AVMediaTypeVideo)
if videoConnection != nil {
// Show next step button
self.view.bringSubviewToFront(self.nextStep)
self.nextStep.hidden = false
// Secure image
stillImageOutput.captureStillImageAsynchronouslyFromConnection(videoConnection) {
(imageDataSampleBuffer, error) -> Void in
var imageData = AVCaptureStillImageOutput.jpegStillImageNSDataRepresentation(imageDataSampleBuffer)
self.image = UIImage(data: imageData)
//var dataProvider = CGDataProviderCreateWithCFData(imageData)
//var cgImageRef = CGImageCreateWithJPEGDataProvider(dataProvider, nil, true, kCGRenderingIntentDefault)
//self.image = UIImage(CGImage: cgImageRef, scale: 1.0, orientation: UIImageOrientation.Right)
}
// Freeze camera preview
captureSession.stopRunning()
}
}
@IBAction func nextStepTapped(sender: UIButton) {
// Save to camera roll & proceeed
//UIImageWriteToSavedPhotosAlbum(self.image.blackAndWhite(), nil, nil, nil)
//UIImageWriteToSavedPhotosAlbum(self.image, nil, nil, nil)
// OCR
var tesseract:Tesseract = Tesseract();
tesseract.language = "eng";
tesseract.delegate = self;
tesseract.image = self.image.blackAndWhite();
tesseract.recognize();
NSLog("%@", tesseract.recognizedText);
}
The image saves to the Camera Roll and is completely legible if I uncomment the commented lines. Not sure why it won't work. It has no problem reading the text on the image if it's uploaded directly into Xcode as a supporting file, but if I take a picture of the exact same image on my screen then it can't read it.
Stumbled upon this tutorial: http://www.raywenderlich.com/93276/implementing-tesseract-ocr-ios
It happened to mention scaling the image. They chose the max dimension as 640. I was taking my pictures as 640x480, so I figured I didn't need to scale them, but I think this code essentially redraws the image. For some reason now my photos OCR fairly well. I still need to work on image processing for smaller text, but it works perfectly for large text. Run my image through this scaling function and I'm good to go.
func scaleImage(image: UIImage, maxDimension: CGFloat) -> UIImage {
var scaledSize = CGSize(width: maxDimension, height: maxDimension)
var scaleFactor: CGFloat
if image.size.width > image.size.height {
scaleFactor = image.size.height / image.size.width
scaledSize.width = maxDimension
scaledSize.height = scaledSize.width * scaleFactor
} else {
scaleFactor = image.size.width / image.size.height
scaledSize.height = maxDimension
scaledSize.width = scaledSize.height * scaleFactor
}
UIGraphicsBeginImageContext(scaledSize)
image.drawInRect(CGRectMake(0, 0, scaledSize.width, scaledSize.height))
let scaledImage = UIGraphicsGetImageFromCurrentImageContext()
UIGraphicsEndImageContext()
return scaledImage
}
