I'm having an interesting issue with Leptonica that I'm wondering if other SO members have seen.
I'm doing a deskew operation, and having severe artifacting issues, so much so that nobody would rightly accept the results, which degrade the image quality more than they benefit it.
Here's the relevant code that produces the deskew operation:
// Make a black and white version for deskew calculations
l_int32 thresh;
PIX * deskewbw = pixMaskedThreshOnBackgroundNorm(pix,NULL,10,15,25,10,2,2,0.1,&thresh);
NSLog(@"Used threshold of %d to normalize image for deskew", thresh);
// Find the local skew
PTA * ptas, *ptad;
pixGetLocalSkewTransform(deskewbw, 0, 0, 0, 0.0, 0.0, 0.0, &ptas, &ptad);
// Cleanup the first B/W version
pixDestroy(&deskewbw);
// Deskew the original image
PIX * deskewgray = pixProjectivePtaGray(pix, ptad, ptas, 128);
// Reduce the deskewed original image to B/W
pixbw = pixMaskedThreshOnBackgroundNorm(deskewgray, NULL, 10, 15, 25, 10, 2, 2, 0.1, &thresh);
Whether I use this or the pixDeskewLocal function (which does something similar), I get some VERY UGLY results with an interlaced line effect:
Just for comparison, here is the original (slightly skewed) image:
This happens whether the original has a black or white foreground, and is more severe in areas that are shifted more. I'm tempted at this point just to have iOS do the rendering for me to avoid Leptonica for this particular operation, but that increases the number of conversions in my workflow, which I'd rather avoid if possible.
Has anyone else encountered/overcome this issue before? Any pointers on why this happens/how to fix it?
You can use the function pixEndianByteSwap(pixbw); to fix this problem.
I thought about this from an image processing perspective and realized that I had probably made a mistake in reading data into Leptonica, rather than Leptonica being the culprit here, and as it turns out, I was right.
The pixel spacing for this glitch was 4, and as it turns out, Leptonica reads data in words, processing from MSB/MSb to LSB/LSb, leading to a conflict between the way that CGContext writes data and the way Leptonica reads it. This isn't as big of a problem if you read the data into Leptonica as RGBA, because once you flatten it to greyscale or B/W, the error mostly vanishes (at the cost of some dynamic range), but since I was reading data in as 8-bit grayscale, the error didn't vanish, and instead manifested as you see above.
On little-endian systems, the data needs to be divided into words and the byte order reversed to form a sensible image from a CGContext; on big-endian systems, no change is necessary. I'd prefer finding a method of having CGContext do this for me, but for now, I'll fix this the hard way.
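For illustration, here is roughly what that word-wise swap does, sketched in Python with numpy (the helper name and buffer handling are my own; within Leptonica itself the equivalent fix is pixEndianByteSwap):

import numpy as np

# Reverse the byte order within each 32-bit word. Leptonica stores pixels
# MSB-first within each word, while a CGContext on a little-endian system
# writes bytes in increasing memory order, so each word must be swapped.
# Assumes the buffer length is a multiple of 4 (Leptonica pads rows to
# 32-bit word boundaries, so this holds for its raster data).
def swap_word_bytes(raw: bytes) -> bytes:
    words = np.frombuffer(raw, dtype=np.uint32)
    return words.byteswap().tobytes()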
I have two questions:
First, is there any more direct, sane way to go from a texture atlas image to a texture array in WebGL than what I'm doing below? I've not tried this, but doing it entirely in WebGL seems possible, though it's four times the work and I still have to make two round trips to the GPU to do it.
And am I right that because buffer data for texImage3D() must come from PIXEL_UNPACK_BUFFER, this data must come directly from the CPU side? That is, there is no way to copy from one block of GPU memory to a PIXEL_UNPACK_BUFFER without copying it to the CPU first. I'm pretty sure the answer to this is a hard "no".
In case my questions themselves are stupid (and they may be), my ultimate goal here is simply to convert a texture atlas PNG to a texture array. From what I've tried, the fastest way to do this by far is via PIXEL_UNPACK_BUFFER, rather than extracting each sub-image and sending them in one at a time, which for large atlases is extremely slow.
This is basically how I'm currently getting my pixel data.
const imageToBinary = async (image: HTMLImageElement) => {
    const canvas = document.createElement('canvas');
    canvas.width = image.width;
    canvas.height = image.height;
    const context = canvas.getContext('2d');
    if (!context) throw new Error('could not create 2D context');
    context.drawImage(image, 0, 0);
    const imageData = context.getImageData(0, 0, image.width, image.height);
    return imageData.data; // Uint8ClampedArray of RGBA bytes
};
So, I'm creating an HTMLImageElement object, which contains the uncompressed pixel data I want, but has no methods to get at it directly. Then I'm creating a 2D context version containing the same pixel data a second time. Then I'm repopulating the GPU with the same pixel data a third time. Seems bonkers to me, but I don't see a way around it.
I have one picture.
There are many broken places in the image.
Please refer to the picture.
Does anyone know how to repair the broken strokes using OpenCV 3.0?
I used the dilate operation in OpenCV and got the picture below:
It looks ugly compared to the original image.
I am late to the party but I hope this helps someone.
Since you have not provided the original image, I cannot say the following solution will work 100%. I'm not sure how you are thresholding the image, but adaptive thresholding might give you better results. OpenCV (Python) code:
import cv2

# `image` is the grayscale input
gauss_win_size = 5
gauss_sigma = 3
th_window_size = 15
th_offset = 2
img_blur = cv2.GaussianBlur(image, (gauss_win_size, gauss_win_size), gauss_sigma)
th = cv2.adaptiveThreshold(img_blur, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                           cv2.THRESH_BINARY_INV, th_window_size, th_offset)
Tinker around with the parameter values to see which values work best. It's usually a good idea to blur your image, and that may well take care of the broken characters. Note that blurring may produce slightly thicker characters in the binary image. If this still leaves you with a few broken characters, then you can use morphological closing:
selem_shape = cv2.MORPH_RECT
selem_size = (3, 3)
selem = cv2.getStructuringElement(selem_shape, selem_size)
# Close the thresholded image (not the original) to bridge small gaps
th = cv2.morphologyEx(th, cv2.MORPH_CLOSE, selem)
Again, tinker with the structuring element size and shape to find what works best with your images.
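For instance, a quick throwaway sweep like the following (output file names and candidate sizes are just illustrative; `th` is the binary image from the adaptive threshold step) makes it easy to compare structuring element sizes side by side:

import cv2

# Try a few rectangular structuring-element sizes and save each result
# so the best-looking closing can be picked by eye.
for size in ((3, 3), (5, 5), (7, 7)):
    selem = cv2.getStructuringElement(cv2.MORPH_RECT, size)
    closed = cv2.morphologyEx(th, cv2.MORPH_CLOSE, selem)
    cv2.imwrite('closed_%dx%d.png' % size, closed)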
I am using the canvas draw functions DrawRect and FillText to draw onto a TBitmap, but I don't want the results antialiased. Anyone know how to do that?
Working with OSX and Delphi XE3 (but have XE4 and XE5 if needed)
Is the problem:
the bitmap you create seems to have anti-aliasing present in the data?
or have you got a good bitmap and want to disable anti-aliasing in the viewer/display?
If it is the former, have you checked that the anti-aliasing is actually present in the bitmap, and not introduced by your viewer?
In the past I've found it useful to draw a black-on-white test pattern, and display the image at 1:1 scale. Irfanview is a nice tool for viewing at 'true' scale. Then use a loupe or lens to get a close-up of the actual pixels.
Black-on-white test patterns are particularly good since you should be able to see (hopefully) that the R, G and B sub-pixels are all equally illuminated when there is no anti-aliasing present. If you draw a black-on-white pattern and you get solitary bright sub-pixels then you've definitely got anti-aliasing (or some other form of corruption!).
My experience has been that image viewers often do interpolation for you, and it can be tricky to see what is going on unless you look at the actual bitmap data or have a close-up look at the unscaled image...
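If it helps, here is a quick sketch of that test-pattern idea in Python with PIL (the sizes, spacing, and the 'processed_pattern.png' file name are placeholders; the point is just that every pixel should come out pure black or pure white):

from PIL import Image, ImageDraw

# Draw a 1-pixel black-on-white grid; PIL's line() does not anti-alias here.
img = Image.new('RGB', (64, 64), 'white')
draw = ImageDraw.Draw(img)
for i in range(0, 64, 8):
    draw.line([(i, 0), (i, 63)], fill='black')
    draw.line([(0, i), (63, i)], fill='black')
img.save('test_pattern.png')

# After running the pattern through your drawing pipeline, reload it and
# count pixels that are neither pure black nor pure white; any such pixel
# means anti-aliasing (or other filtering) was introduced along the way.
result = Image.open('processed_pattern.png').convert('RGB')
suspect = [p for p in result.getdata() if p not in ((0, 0, 0), (255, 255, 255))]
print('anti-aliased pixels:', len(suspect))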
Hi, in the DrawBitmap method you need to set the HighSpeed parameter to True, as in the sample below:
NewBitmap.Canvas.DrawBitmap(SmallBmp, RectF(0, 0, SmallBmp.Width, SmallBmp.Height), RectF(0, 0, NewBitmap.Width, NewBitmap.Height), 1, True);
rgds
Ivan
I'm using the Emgu shape detection example application to detect rectangles on a given image. The dimensions of the resized image appear to impact the number of shapes detected even though the aspect ratio remains the same. Here's what I mean:
Using (400,400), actual img size == 342,400
Using (520,520), actual img size == 445,520
Why is this so? And how can the optimal value be determined?
Thanks
I replied to your post on EMGU but figured you haven't checked back, so here it is. The shape detection works on the principle of thresholding unlikely matches, which prevents lots of false classifications. This is true for many image processing algorithms. Basically, there are no perfect settings, and a designer must select the most appropriate settings to produce the most desirable results, i.e. match the most objects without reporting more than there actually are.
You will need to adjust each variable individually to see what kind of results you get. Start off with the edge detection.
Image<Gray, Byte> cannyEdges = gray.Canny(cannyThreshold, cannyThresholdLinking);
Have a look at your smaller image to see what the difference is between the rectangles detected and the one that isn't. You could be missing an edge or a corner, which is why it's not classified. If you are, adjust cannyThreshold and observe the results; if good, keep it :) if bad :( go back to the original value. Once satisfied, adjust cannyThresholdLinking and observe.
You will keep repeating this until you get a preferred image. The advantage here is that you have 3 items to compare; you will continue until the item that's not being recognised matches the other two.
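The example app is C#, but the one-variable-at-a-time loop is easy to prototype; here is the idea sketched with OpenCV's Python bindings (`gray` stands in for your grayscale input, and the candidate values are arbitrary):

import cv2

# Sweep cannyThreshold while holding the linking threshold fixed, saving
# each result so you can see which edges appear or disappear.
canny_threshold_linking = 120
for canny_threshold in (60, 90, 120, 150, 180):
    edges = cv2.Canny(gray, canny_threshold, canny_threshold_linking)
    cv2.imwrite('edges_%d.png' % canny_threshold, edges)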
If they are similar, which is likely as it is a black and white image, you'll need to go on to the Hough line detection.
LineSegment2D[] lines = cannyEdges.HoughLinesBinary(
1, //Distance resolution in pixel-related units
Math.PI / 45.0, //Angle resolution measured in radians.
20, //threshold
30, //min Line width
10 //gap between lines
)[0]; //Get the lines from the first channel
Use the same method of adjusting one value at a time and observing the output, and you will hopefully find the settings you need. Never jump in with both feet and change all the values, as you will never know whether you're improving the accuracy or not. Finally, if all else fails, look at the section that inspects the Hough results for a rectangle:
if (angle < 80 || angle > 100)
{
isRectangle = false;
break;
}
There are fewer variables to change here, as Hough should do most of the work for you, but the fix could still turn out to be in this section.
I'm sorry that there is no straightforward answer, but I hope you keep at it and solve the problem. Otherwise, you could always resize the image each time.
Cheers
Chris
My usual method of 100% contrast plus some brightness adjustment to tweak the cutoff point works reasonably well to clean up photos of small sub-circuits or equations for posting on E&R.SE; however, sometimes it's not quite that great, like with this image:
What other methods besides contrast (or instead of) can I use to give me a more consistent output?
I'm expecting a fairly general answer, but I'll probably implement it in a script (that I can just dump files into) using ImageMagick and/or PIL (Python) so if you have anything specific to them it would be welcome.
Ideally a better source image would be nice, but I occasionally use this on other folks' images to add some polish.
The first step is to equalize the illumination differences in the image while taking into account the white balance issues. The theory here is that the brightest part of the image within a limited area represents white. By blurring the image beforehand we eliminate the influence of noise in the image.
from PIL import Image
from PIL import ImageFilter

im = Image.open(r'c:\temp\temp.png')
# Blur to suppress noise, then take the local maximum as the white estimate
white = im.filter(ImageFilter.BLUR).filter(ImageFilter.MaxFilter(15))
The next step is to create a grey-scale image from the RGB input. By scaling to the white point we correct for white balance issues. By taking the max of R,G,B we de-emphasize any color that isn't a pure grey such as the blue lines of the grid. The first line of code presented here is a dummy, to create an image of the correct size and format.
grey = im.convert('L')
width, height = im.size
impix = im.load()
whitepix = white.load()
greypix = grey.load()
for y in range(height):
    for x in range(width):
        # Scale each channel to the local white point and take the max;
        # integer division keeps the result an int in Python 3
        greypix[x, y] = min(255, max(255 * impix[x, y][0] // whitepix[x, y][0],
                                     255 * impix[x, y][1] // whitepix[x, y][1],
                                     255 * impix[x, y][2] // whitepix[x, y][2]))
The result of these operations is an image that has mostly consistent values and can be converted to black and white via a simple threshold.
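For completeness, that final threshold step might look like this (the cutoff of 128 is just a starting guess to tune):

# Binarize the equalized grey image with a fixed threshold
binary = grey.point(lambda v: 255 if v >= 128 else 0, mode='1')
binary.save(r'c:\temp\binary.png')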
Edit: It's nice to see a little competition. nikie has proposed a very similar approach, using subtraction instead of scaling to remove the variations in the white level. My method increases the contrast in the regions with poor lighting, and nikie's method does not - which method you prefer will depend on whether there is information in the poorly lighted areas which you wish to retain.
My attempt to recreate this approach resulted in this:
for y in range(height):
    for x in range(width):
        # Subtractive variant: clamp to [0, 255] since the result can go negative
        greypix[x, y] = min(255, max(0,
                                     255 + impix[x, y][0] - whitepix[x, y][0],
                                     255 + impix[x, y][1] - whitepix[x, y][1],
                                     255 + impix[x, y][2] - whitepix[x, y][2]))
I'm working on a combination of techniques to deliver an even better result, but it's not quite ready yet.
One common way to remove the varying background illumination is to calculate a "white image" from the source by opening the image (with dark strokes on a bright background, this is a dilation followed by an erosion, which removes the dark details and keeps the background).
In this sample Octave code, I've used the blue channel of the image, because the lines in the background are least prominent in this channel (EDITED: using a circular structuring element produces fewer visual artifacts than a simple box):
src = imread('lines.png');
blue = src(:,:,3);
% the nonzero entries of the disk define a circular neighbourhood
mask = fspecial("disk",10);
% dilate then erode: removes dark details smaller than the disk
opened = imerode(imdilate(blue,mask),mask);
Result:
Then subtract this from the source image:
background_subtracted = opened-blue;
(contrast enhanced version)
Finally, I'd just binarize the image with a fixed threshold:
binary = background_subtracted < 35;
How about detecting edges? That should pick up the line drawings.
Here's the result of Sobel edge detection on your image:
If you then threshold the image (using either an empirically determined threshold or the Otsu method), you can clean up the image using morphological operations (e.g. dilation and erosion). That will help you get rid of broken/double lines.
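A rough sketch of that pipeline with OpenCV in Python (the kernel size and the input file name are placeholders):

import cv2
import numpy as np

img = cv2.imread('lines.png', cv2.IMREAD_GRAYSCALE)

# Sobel gradient magnitude as the edge image
gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
edges = cv2.convertScaleAbs(np.hypot(gx, gy))

# Otsu picks the threshold automatically
_, binary = cv2.threshold(edges, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Morphological closing bridges broken lines; tune the kernel to taste
kernel = np.ones((3, 3), np.uint8)
cleaned = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)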
As Lambert pointed out, you can pre-process the image using the blue channel to get rid of the grid lines if you don't want them in your result.
You will also get better results if you light the page evenly before you image it (or just use a scanner), because then you don't have to worry about global vs. local thresholding as much.
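To illustrate the global-vs-local distinction (the file name and parameters are placeholders):

import cv2

img = cv2.imread('photo.png', cv2.IMREAD_GRAYSCALE)

# Global: one threshold for the whole image; struggles with uneven lighting
_, global_bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Local: threshold computed per neighbourhood; tolerates uneven lighting
local_bw = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                 cv2.THRESH_BINARY, 31, 10)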