I am using Swift 3 and developing an application where the user takes a photo and uses Tesseract OCR to recognize the text in it.
The following code block works.
func processPhoto() {
    if let tesseract = G8Tesseract(language: "eng") {
        tesseract.delegate = self
        // this is the resulting picture gotten after running the capture delegate
        tesseract.image = stillPicture.image!
        tesseract.recognize()
    }
}
However, if I try to manipulate the picture at all (stillPicture.image!), I get the following error:
Error in pixCreateHeader: depth must be {1, 2, 4, 8, 16, 24, 32}
Error in pixCreateNoInit: pixd not made
Error in pixCreate: pixd not made
Error in pixGetData: pix not defined
Error in pixGetWpl: pix not defined
2017-03-13 11:13:05.336796 ProjectName[35238:9127211] Cannot convert image to Pix with bpp = 64
Error in pixSetYRes: pix not defined
Error in pixGetDimensions: pix not defined
Error in pixGetColormap: pix not defined
Error in pixClone: pixs not defined
Error in pixGetDepth: pix not defined
Error in pixGetWpl: pix not defined
Error in pixGetYRes: pix not defined
Please call SetImage before attempting recognition.Please call SetImage before attempting recognition.2017-03-13 11:13:05.343568 EOB-Reader[35238:9127211] No recognized text. Check that -[Tesseract setImage:] is passed an image bigger than 0x0.
One of the things I do to manipulate the picture is rotate it:
// Rotate taken picture
let orig_image = stillPicture.image!
let new_image_canvas = UIGraphicsImageRenderer(size: CGSize(width: stillPicture.image!.size.height,
                                                            height: stillPicture.image!.size.width))
let new_image = new_image_canvas.image { _ in
    let curr_context = UIGraphicsGetCurrentContext()!
    curr_context.translateBy(x: 0, y: stillPicture.image!.size.width)
    curr_context.rotate(by: -.pi/2)
    stillPicture.image!.draw(at: .zero)
}
tesseract.image = new_image
If I do that, BOOM! The error above occurs.
Another manipulation I do is to crop a portion of the image.
let finalImage : UIImage
let crop_section = CGRect(x: 590.0, y: 280.0, width: 950.0, height: 550.0)
let cg_image = stillPicture.image!.cgImage?.cropping(to: crop_section)
finalImage = UIImage(cgImage: cg_image!)
tesseract.image = finalImage
Again, BOOM! Error appears. Any idea why this is happening and why my image manipulations are causing problems? Thanks for your help!
Whatever transformations you make to the image leave it in a format which Tesseract cannot understand. Tesseract uses the Leptonica library to handle image formats, and Leptonica can understand only images in a certain format.
The first line:
Error in pixCreateHeader: depth must be {1, 2, 4, 8, 16, 24, 32}
is already a big hint about what the error is. Bit depth means how many bits per pixel you have. For example, a 24-bit image is usually RGB: 8 bits (one byte) each for red, green, and blue, for a total of 24 bits. 32 bits is for ARGB (RGB plus an alpha channel), and 1 bit is monochrome.
See http://search.cpan.org/dist/Image-Leptonica/lib/Image/Leptonica/Func/pix1.pm#pixCreateHeader - pixCreateHeader is a Leptonica function.
So try the following: save the image to a file, open it in an image editor, and check what type of image it is, especially the bit depth.
Apparently your image is using an unusual bit depth (your log even says bpp = 64, i.e. 16 bits per channel). Also look at Node.js 20x slower than browser (Safari) with Tesseract.Js, because this is the only other question I could find where Error in pixCreateHeader: depth must be {1, 2, 4, 8, 16, 24, 32} is also mentioned.
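If it does turn out to be a 64 bpp extended-range image, one thing you could try - just a sketch, not tested against your setup, using the rotated new_image from above as an example - is to redraw the result into a plain 8-bit-per-channel bitmap before handing it to Tesseract:
// Sketch: force an 8-bit-per-channel (32 bpp) bitmap so Leptonica accepts it.
let format = UIGraphicsImageRendererFormat()
format.scale = 1
format.prefersExtendedRange = false   // avoid the 64 bpp extended-range format
let renderer = UIGraphicsImageRenderer(size: new_image.size, format: format)
let compatibleImage = renderer.image { _ in
    new_image.draw(in: CGRect(origin: .zero, size: new_image.size))
}
tesseract.image = compatibleImage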
I am trying to load images into an iOS application that are downloaded from a server. I have encountered an issue where a thin black line is appearing on the far right of the image when it's loaded using UIImage(contentsOf:) or an equivalent UIImage initialiser. I initially thought this was an error in my URLSession code, but I have since been able to reproduce this locally with a file that I've created in Affinity Designer.
The image is a little odd - it uses the "Grey" color model, and the "Generic Grey Gamma 2.2 Profile". It also has a resolution of 319 x 350. If I switch to an RGB profile the problem appears to go away, and if I change the dimensions, the problem also goes away. This can occur with images I've created with this configuration, as well as the source image from the server.
The source image is solid white, however once I load it as a UIImage, the column of pixels on the far right has a thin black line that has appeared out of nowhere. I also tried CIImage and loading directly as a CGImage, and the same issue occurs, but I imagine it's all using the same underlying image IO code?
Bear in mind the UIImage attached here is via the Quick Look feature in Xcode, but I checked for the line of black pixels in code and they are there.
I can only seem to reproduce this on my physical device, whereas the simulator works as expected.
This is the block of code I'm using to load the source image above from stack overflow, and then check if it's entirely white (which it should be).
//Load the image from Stack Overflow.
guard let url = URL(string: "https://i.stack.imgur.com/ymCX5.jpg"),
      let data = try? Data(contentsOf: url),
      let image = UIImage(data: data),
      let cgImage = image.cgImage
else {
    print("Unable to load image")
    return
}

//Get the pixel data.
guard let pixelData = cgImage.dataProvider?.data,
      let bytes = CFDataGetBytePtr(pixelData)
else {
    print("Unable to get data from cgimage")
    return
}

//Check there is one byte per pixel. If there's not, then the for loop
//below won't work as expected.
let byteCount = CFDataGetLength(pixelData)
let width = cgImage.width
let height = cgImage.height
print("Byte count: \(byteCount)")
guard byteCount == (width * height) else {
    print("More than one byte per pixel")
    return
}

//Loop over each pixel and find any value that isn't white (255).
for byteOffset in 0..<byteCount {
    if bytes[byteOffset] != 255 {
        print("Found a non-white pixel \(bytes[byteOffset]) at X: \((byteOffset % width) + 1), Y: \((byteOffset / width) + 1)")
    }
}
And the output...
Found a non-white pixel 254 at X: 316, Y: 1
Found a non-white pixel 254 at X: 317, Y: 1
Found a non-white pixel 1 at X: 318, Y: 1
Found a non-white pixel 0 at X: 319, Y: 1
...etc
Has anyone seen this before? Does anyone know why this is happening?
I'm new to CoreImage / Metal, so my apologies in advance if my question is naive. I spent a week going over the CoreImage documentation and examples and I couldn't figure this one out.
Suppose I have a reduction filter such as CIAreaAverage which outputs a 1x1 image. Is it possible to convert that image into a color that I can pass as an argument of another CIFilter? I'm aware that I can do this by rendering the CIAreaAverage output into a CVPixelBuffer, but I'm trying to do this in one render pass.
Edit #1 (Clarification):
Let's say I want to correct the white balance by allowing the user to sample a gray pixel from an image:
let pixelImage = inputImage.applyingFilter("CICrop", arguments: [
    "inputRectangle": CGRect(origin: pixelCoordinates, size: CGSize(width: 1, height: 1))
])

// We know now that the extent of pixelImage is 1x1.
// Do something to convert the image into a pixel to be passed as an argument in the filter below.
let pixelColor = ???

let outputImage = inputImage.applyingFilter("CIWhitePointAdjust", arguments: [
    "inputColor": pixelColor
])
Is there a way to tell the CIContext to convert the 1x1 CIImage into CIColor?
If you want to use the result of CIAreaAverage in a custom CIFilter (i.e. you don't need it for a CIColor parameter), you can directly pass it as a CIImage to that filter and read the value via sampler in the kernel:
extern "C" float4 myKernel(sampler someOtherInput, sampler average) {
float4 avg = average.sample(float2(0.5, 0.5)); // average only contains one pixel, so always sample that
// ...
}
You can also call .clampedToExtent() on the average CIImage before you pass it to another filter/kernel. This will cause Core Image to treat the average image as if it were infinitely large, containing the same value everywhere. Then it doesn't matter at which coordinate you sample the value. This might be useful if you want to use the average value in a custom CIColorKernel.
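For example, a rough sketch of that approach (here avg is the CIAreaAverage output and myColorKernel is your custom CIColorKernel; both names are placeholders):
// Clamp the 1x1 average so every coordinate returns the same value,
// then pass it as a second input alongside the original image.
let clampedAverage = avg.clampedToExtent()
let output = myColorKernel.apply(
    extent: inputImage.extent,
    arguments: [inputImage, clampedAverage]
)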
Something you can do that doesn't involve Metal is to use Core Image itself. Let's say you want a 640x640 image of the output of CIAreaAverage, which is called ciPixel here:
let crop = CIFilter(name: "CICrop")
// Clamp the 1x1 average to infinite extent first so the crop actually fills 640x640.
crop?.setValue(ciPixel.clampedToExtent(), forKey: "inputImage")
crop?.setValue(CIVector(x: 0, y: 0, z: 640, w: 640), forKey: "inputRectangle")
ciOutput = crop?.outputImage
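Equivalently, a short sketch using the CIImage convenience methods (same idea, just without the filter-by-name style):
// Clamp the single pixel, then crop the infinite image down to 640x640.
let ciOutput = ciPixel
    .clampedToExtent()
    .cropped(to: CGRect(x: 0, y: 0, width: 640, height: 640))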
I made this example to show the problem. It takes one pixel from the texture at a hardcoded coordinate and uses it as the result for every pixel in the shader. I expect the whole image to end up the same color. When images are small it works perfectly, but when I work with big images I get a strange result. For example, here the image has size 7680x8580 and you can see 4 squares:
Here is my code
kernel vec4 colorKernel(sampler source)
{
    vec4 key = sample(source, samplerTransform(source, vec2(100., 200.)));
    return key;
}
Here is how I apply the kernel:
override var outputImage: CIImage? {
    return colorFillKernel.apply(
        extent: CGRect(origin: CGPoint.zero, size: inputImage!.extent.size),
        roiCallback: { (index, rect) in
            return rect
        },
        arguments: [inputImage])
}
Also, this code shows the image properly, with no changes and no squares:
vec2 dc = destCoord();
return sample(source, samplerTransform(source, dc));
The public documentation says "Core Image automatically splits large images into smaller tiles for rendering, so your callback may be called multiple times.", but I can't find how to handle this situation. I have kaleidoscopic effects, and from any given tile I need to be able to read pixels from other tiles as well...
I think the problem occurs due to a wrongly defined region of interest in combination with tiling.
In the roiCallback, Core Image is asking you which area of the input image (at index, in case you have multiple inputs) your kernel needs to look at in order to produce the given region (rect) of the output image. The reason this is a closure is tiling:
If the processed image is too large, Core Image is breaking it down into multiple tiles, renders those tiles separately, and stitches them together again afterward. And for each tile Core Image is asking you what part of the input image your kernel needs to read to produce the tile.
So for your input image, the roiCallback might be called something like four times (or even more) during rendering, for example with the following rectangles:
CGRect(x: 0, y: 0, width: 4096, height: 4096) // top left
CGRect(x: 4096, y: 0, width: 3584, height: 4096) // top right
CGRect(x: 0, y: 4096, width: 4096, height: 4484) // bottom left
CGRect(x: 4096, y: 4096, width: 3584, height: 4484) // bottom right
This is an optimization mechanism of Core Image. It wants to only read and process the pixels that are needed to produce a given region of the output. So it's best to adapt the ROI as best as possible to your use case.
Now the ROI depends on the kernel. There are basically four scenarios:
Your kernel has a 1:1 mapping between input pixel and output pixel. So in order to produce an output color value, it needs to read the pixel at the same position from the input image. In this case, you just return the input rect in your roiCallback. (Or even better, you use a CIColorKernel that is made for this use case.)
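A minimal sketch for this case (it mirrors the roiCallback from the question above):
// 1:1 mapping: the kernel only reads the input pixel at the same position,
// so the ROI for a given output rect is simply that same rect.
let roiCallback: CIKernelROICallback = { _, rect in
    return rect
}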
Your kernel performs some kind of convolution and not only requires the input pixel at the same coordinate as the output but also some region around it (for instance for a blur operation). Your roiCallback could look like this then:
let inset = self.radius // like radius of CIGaussianBlur
let roiCallback: CIKernelROICallback = { _, rect in
    return rect.insetBy(dx: -inset, dy: -inset)
}
Your kernel always needs to read a specific region of the input, regardless of which part of the output is rendered. Then you can just return that specific region in the callback:
let roiCallback: CIKernelROICallback = { _, _ in CGRect(x: 100, y: 200, width: 1, height: 1) }
The kernel always needs access to the whole input image. This is for example the case when you use some kind of lookup table to derive colors. In this case, you can just return the extent of the input and ignore the parameters:
let roiCallback: CIKernelROICallback = { _, _ in inputImage.extent }
For your example, scenario 3 should be the right choice. For your kaleidoscopic effects, I assume that you need a certain region or source pixels around the destination coordinate in order to produce an output pixel. So it would be best if you'd calculate the size of that region and use a roiCallback like in scenario 2.
P.S.: Using the Core Image Kernel Language (CIKernel(source: "<code>")) is super duper deprecated now. You should consider writing your kernels in the Metal Shading Language instead. Check out this year's WWDC talk to learn more. 🙂
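For reference, loading such a Metal-based kernel from Swift could look roughly like this (the library and function names are placeholders; it assumes your .ci.metal source is compiled into MyKernels.ci.metallib in the app bundle):
// Sketch: load a CIKernel written in the Metal Shading Language
// (error handling omitted for brevity).
let url = Bundle.main.url(forResource: "MyKernels.ci", withExtension: "metallib")!
let data = try! Data(contentsOf: url)
let kernel = try! CIKernel(functionName: "colorKernel", fromMetalLibraryData: data)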
Say I have a N-channel MPSImage or texture array that is based on MTLTexture.
How do I crop a region from it, copying all the N channels, but changing "pixel size"?
I'll just address the crop case, since the resize case involves resampling and is marginally more complicated. Let me know if you really need that.
Let's assume your source MPSImage is a 12 feature channel (3 slice) image that is 128x128 pixels, that your destination image is an 8 feature channel image (2 slices) that is 64x64 pixels, and that you want to copy the bottom-right 64x64 region of the last two slices of the source into the destination.
There is no API that I'm aware of that allows you to copy from/to multiple slices of an array texture at once, so you'll need to issue multiple blit commands to cover all the slices:
let sourceRegion = MTLRegionMake3D(64, 64, 0, 64, 64, 1)
let destOrigin = MTLOrigin(x: 0, y: 0, z: 0)
let firstSlice = 1
let lastSlice = 2 // inclusive
let commandBuffer = commandQueue.makeCommandBuffer()
let blitEncoder = commandBuffer.makeBlitCommandEncoder()
for slice in firstSlice...lastSlice {
    blitEncoder.copy(from: sourceImage.texture,
                     sourceSlice: slice,
                     sourceLevel: 0,
                     sourceOrigin: sourceRegion.origin,
                     sourceSize: sourceRegion.size,
                     to: destImage.texture,
                     destinationSlice: slice - firstSlice,
                     destinationLevel: 0,
                     destinationOrigin: destOrigin)
}
blitEncoder.endEncoding()
commandBuffer.commit()
I'm not sure why you want to crop, but keep in mind that the MPSCNN layers can work on a smaller portion of your MPSImage. Just set the offset and clipRect properties and the layer will only work on that region of the source image.
In fact, you could do your crops this way using an MPSCNNNeuronLinear. Not sure if that is any faster or slower than using a blit encoder but it's definitely simpler.
Edit: added a code example. This is typed from memory so it may have small errors, but this is the general idea:
// Declare this somewhere:
let linearNeuron = MPSCNNNeuronLinear(a: 1, b: 0)
Then when you run your neural network, add the following:
let yourImage: MPSImage = ...
let commandBuffer = ...
// This describes the size of the cropped image.
let imgDesc = MPSImageDescriptor(...)
// If you're going to use the cropped image in other layers
// then it's a good idea to make it a temporary image.
let tempImg = MPSTemporaryImage(commandBuffer: commandBuffer, imageDescriptor: imgDesc)
// Set the cropping offset:
linearNeuron.offset = MPSOffset(x: ..., y: ..., z: 0)
// The clip rect is the size of the output image.
linearNeuron.clipRect = MTLRegionMake2D(0, 0, imgDesc.width, imgDesc.height)
linearNeuron.encode(commandBuffer: commandBuffer, sourceImage: yourImage, destinationImage: tempImg)
// Here do your other layers, taking tempImg as input.
. . .
commandBuffer.commit()
I'm trying to create an application where I use a UIScrollView and can draw different JPEG images, each at its own position. It must look like a video, for example:
I have a first JPEG image (the background) with coordinates x = 0, y = 0 and dimensions width = 1024, height = 768, which I send to the UIScrollView for display. Then I get another JPEG image with its own coordinates and dimensions (for example x = 10, y = 10, width = 100, height = 100), and I need to show this "small" image over the "background".
How can I do that?
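For reference, here is a minimal sketch of the kind of layering I mean, assuming plain UIImageViews added as subviews of the scroll view inside a view controller (all names here are just illustrative):
// Background image at (0, 0), 1024x768, with a smaller image layered on top.
let scrollView = UIScrollView(frame: view.bounds)
scrollView.contentSize = CGSize(width: 1024, height: 768)

let backgroundView = UIImageView(image: UIImage(named: "background.jpg"))
backgroundView.frame = CGRect(x: 0, y: 0, width: 1024, height: 768)
scrollView.addSubview(backgroundView)

// Small image drawn over the background at (10, 10), 100x100.
let overlayView = UIImageView(image: UIImage(named: "overlay.jpg"))
overlayView.frame = CGRect(x: 10, y: 10, width: 100, height: 100)
scrollView.addSubview(overlayView)

view.addSubview(scrollView)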
Or another way to put the problem. Example:
I have the BMP data of the first image (the background). How can I send this BMP data to the UIScrollView for display?
P.S. One more question: which of the approaches above will be faster for processing and presenting the data?