Logarithmic image resizing using ImageMagick?

I'd like to rescale an image in one dimension using a logarithmic scale, and I've been searching for ideas on how to do this. Some options:
-evaluate seems to take Pow and other functions
-fx seems to be an option, but I'm still working to understand it
Neither of the above seems like it can be passed to -size or the resizing operations, though.

I am not sure exactly what you are asking, but here is how to apply a logarithmic transform to an image in ImageMagick. Since geometric transformations are inverse mappings (you specify an output pixel and find where it comes from in the input image), we must use the inverse of the logarithm, which is the exponential. For example, for a horizontal transformation:
Input:
convert lena.png -virtual-pixel black -fx "u.p{exp(6*i/(w-1)),j}" lena_ln.png
Now, if you want it to fit exactly into 256 pixels horizontally without the black region on the right, we need to evaluate a logarithm:
ln(x+1) for x=0 to 255
ln(0+1) = ln(1) = 0
ln(255+1) = ln(256) = 5.545
Now we need the inverse operation for -fx, which is an exponential,
ln(x+1) = 5.545
x+1 = exp(5.545)
x = exp(5.545) - 1
convert lena.png -virtual-pixel black -fx "u.p{exp(5.545*i/(w-1))-1,j}" lena_ln2.png
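For anyone who wants the same inverse mapping outside ImageMagick, here is a minimal Python/NumPy sketch of the idea. It uses nearest-neighbour sampling, so it is cruder than -fx's interpolation, and the file names are just the ones from the commands above:

import numpy as np
from PIL import Image

src = np.asarray(Image.open("lena.png").convert("RGB"))
h, w = src.shape[:2]
# Inverse map: destination column i samples source column exp(k*i/(w-1)) - 1,
# where k = ln(w) plays the role of the 5.545 computed above for w = 256.
k = np.log(w)
i = np.arange(w)
xs = np.clip(np.rint(np.exp(k * i / (w - 1)) - 1).astype(int), 0, w - 1)
out = src[:, xs]                     # pick a source column per destination column
Image.fromarray(out).save("lena_ln2.png")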

ImageMagick: Divide AE distortion by total pixels in fx output info format

I am trying to use ImageMagick 7 to detect if a specific channel in an image is largely pure black and pure white (plus a little antialiasing, and there's a chance the image could be pure black). This is to distinguish from another kind of image that shares a naming convention but has photographic-like image data in the r/g/b channels.
(Basically both image types are specular maps from different engines. The one I'm trying to differentiate here is more modern and has the metallic map in the blue channel; the other is much older and just has the specular colour in the RGB channels and the gloss map in the alpha.)
Currently I'm comparing the channel to a clone of itself that has had a 50% threshold applied, using the AE metric to see if it's largely the same apart from a small amount of antialiasing, and a fuzz of 1% to account for occasional aberration from pure black/white. This command works, but of course at the moment it only returns the number of distorted pixels:
magick ( "file.png" -channel b -separate ) ^
( +clone -channel b -separate -threshold 50% ) ^
-fuzz 1% -metric AE -compare ^
-format "%[distortion]" info:
Because the input image sizes will vary, I want to divide the distortion by the total number of pixels in the image to get the relative amount of the image that's not pure black/white (under 10% has seemed good so far in my manual testing), but I can't get the format syntax right. Everything I've tried, for example "%[fx:%[distortion]/w*h]", has given this error: magick: undefined variable `[distortion]' @ error/fx.c/FxGetSymbol/1169.
What syntax should I use? (And if there's a better way to do what I'm doing, I always appreciate it!)
I believe the following is what you want in ImageMagick. Basically, you save the distortion into a -set option: argument and then use it in -fx later.
Also, +clone copies the already-separated b channel, so there is no need to repeat -channel b -separate in your second line.
magick ( "file.png" -channel b -separate ) ^
( +clone -threshold 50% ) ^
-fuzz 1% -metric AE -compare ^
-set option:distort "%[distortion]" ^
-format "%[fx:distort/(w*h)]" info:
Fred (@fmw42) has already provided an excellent method. There is another method for differentiating pure black and white images from greyscale images with a fuller tonal scale which may interest you. Credit to Anthony Thyssen for the technique described here.
If you use -solarize 50% in ImageMagick it inverts all the highlights, so it effectively folds your histogram in half and all the whites become pure black and all the near-whites become near blacks. The command looks like this:
magick INPUT -solarize 50% OUTPUT
So, if I apply that to a couple of input images, the first one pure black and white, the second a greyscale, and show the corresponding output image on the right, you'll see the effect:
If you now inspect the mean and standard deviation of the two solarised images:
magick {a,b}-sol.jpg -format "%f, mean: %[mean], stdev: %[standard-deviation]\n" info:
a-sol.jpg, mean: 2328.91, stdev: 3175.67
b-sol.jpg, mean: 16319.5, stdev: 9496.04
you can see that the mean and standard deviation of the first (pure black and white) image is low because all the bright whites have folded to near blacks, whereas the mean and standard deviation of the greyscale image are both higher because the tones are more spread out.
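If you want to reproduce the measurement outside ImageMagick, here is a rough NumPy sketch of the same fold-and-measure idea. It works on 8-bit values, so the numbers come out on a 0-255 scale rather than the 16-bit scale of the statistics above, and the file names are hypothetical:

import numpy as np
from PIL import Image

def solarize_stats(path):
    g = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    folded = np.where(g > 127.5, 255.0 - g, g)   # invert the highlights
    return folded.mean(), folded.std()

for name in ("a.jpg", "b.jpg"):                  # hypothetical inputs
    mean, std = solarize_stats(name)
    print(f"{name}: mean {mean:.1f}, stdev {std:.1f}")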

How to extract the pixels of a specific color for OCR?

I want to run some small images/sprites through OCR (Tesseract, probably) and extract numbers or words out of them, and I know these numbers/words will be of a specific color (let's say white on a noisy/colored background).
While reading about pre-processing images for OCR, I thought it would be really beneficial to just remove everything that's not white from the image.
I'm using both ImageMagick and vips, but I have no idea where to start, what operations to use, or how to search for them.
If we make a sample image like this:
magick -size 300x100 xc: +noise random -gravity center -fill white -pointsize 48 -annotate 0 "Hello" captcha.png
You can then fill with black anything that is not white:
magick captcha.png -fill black +opaque white result.png
If you want to accept colours close to white as being white, you can include some "fuzz":
magick captcha.png -fuzz 10% -fill black +opaque white result.png
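The same idea is easy to express in NumPy if you are pre-processing in Python anyway. A sketch (ImageMagick's fuzz metric is similar in spirit, but not byte-for-byte identical to this Euclidean version):

import numpy as np
from PIL import Image

rgb = np.asarray(Image.open("captcha.png").convert("RGB"), dtype=np.float64)
dist = np.linalg.norm(rgb - 255.0, axis=-1)         # distance to pure white
fuzz = 0.10 * np.sqrt(3) * 255.0                    # roughly a 10% fuzz radius
out = np.where((dist <= fuzz)[..., None], rgb, 0)   # keep near-white, rest black
Image.fromarray(out.astype(np.uint8)).save("result.png")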
There was a discussion on the libvips tracker a few months ago about techniques for background removal:
https://github.com/libvips/libvips/issues/1567
Here's the filter:
#!/usr/bin/python3
import sys
import pyvips
image = pyvips.Image.new_from_file(sys.argv[1], access="sequential")
# aim for 250 for paper with low freq. removal
# ink seems to be slightly blueish
paper = 250
ink = [150, 160, 170]
# remove low frequencies .. don't need huge accuracy
low_freq = image.gaussblur(20, precision="integer")
image = image - low_freq + paper
# pull the ink down
ink_target = 30
scale = [(paper - ink_target) / (paper - i) for i in ink]
offset = [ink_target - i * s for i, s in zip(ink, scale)]
image = image * scale + offset
# find distance to white of each pixel ... small distances go to white
white = [100, 0, 0]
image = image.colourspace("lab")
d = image.dE76(white)
image = (d < 12).ifthenelse(white, image)
# boost saturation (scale ab)
image = image * [1, 2, 2]
image.write_to_file(sys.argv[2])
It removes low frequencies (i.e. paper folds, etc.), stretches the contrast range, finds pixels close to white in CIELAB and moves them to white, and boosts saturation.
You'd probably need to tune it a bit for your use-case. Post some sample images if you need more advice.
I'm no expert in this area, but maybe try changing all pixels with RGB values below a certain threshold to black, or delete them?
As I mentioned before, I'm not very knowledgeable in any of this, but I don't see why this wouldn't work.
If the images are synthetic and uncompressed, you can test for strict equality of the RGB values. Otherwise, use a threshold on the distance between the RGB triples (Euclidean or Manhattan for instance).
If you want to allow variations in the lightness but not in the color, you can convert to HLS and compare HS.
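As a sketch of what the distance-threshold test can look like in Python (the function name and tolerance values are my own):

import numpy as np

def color_mask(img, target, tol=0.0):
    # True where the pixel is within tol (Euclidean RGB distance) of target;
    # tol=0 reproduces the strict-equality test for synthetic images.
    d = np.linalg.norm(img.astype(float) - np.asarray(target, float), axis=-1)
    return d <= tol

img = np.array([[[255, 255, 255], [250, 250, 250], [40, 80, 200]]], np.uint8)
print(color_mask(img, (255, 255, 255)))          # strict: [[ True False False]]
print(color_mask(img, (255, 255, 255), tol=10))  # fuzzy:  [[ True  True False]]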

How to threshold image from greyscale screen by webcam

I have an image like this from my wind station:
I have tried to get those lines recognized, but I'm lost because none of the filters recognize the lines.
Any ideas what I could use to get it black & white with at least the needed lines?
A typical detection result is something like this:
I need to detect the edges of the digits, which seem not to be recognized with almost any settings.
This doesn't provide you with a complete guide as to how to solve your image processing question with OpenCV, but it contains some hints and observations that may help you get there. My weapon of choice is ImageMagick, which is installed on most Linux distros and is available for OS X and Windows.
Firstly, I note you have date and time across the top and you haven't cropped correctly at the lower right hand side - these extraneous pixels will affect contrast stretches, so I crop them off.
Secondly, I separate your image in 3 channels - R, G and B and look at them all. The R and B channels are very noisy, so I would probably go with the Green channel. Alternatively, the Lightness channel is pretty reasonable if you go to HSL mode and discard the Hue and Saturation.
convert display.jpg -separate channel.jpg
Red
Green
Blue
Now make a histogram to look at the tonal distribution:
convert display.jpg -crop 500x300+0+80 -colorspace hsl -separate -delete 0,1 -format %c histogram:png:ahistogram.png
Now I can see all your data are down the dark, left-hand end of the histogram, so I do a contrast stretch and a median filter to remove the noise
convert display.jpg -crop 500x300+0+80 -colorspace hsl -separate -delete 0,1 -median 9x9 -normalize -level 0%,40% z.jpg
And a final threshold to get black and white...
convert display.jpg -crop 500x300+0+80 -colorspace hsl -separate -delete 0,1 -median 9x9 -normalize -level 0%,40% -threshold 60% z.jpg
Of course, you can diddle around with the numbers and levels, but there may be a couple of ideas in there that you can develop... in OpenCV or ImageMagick.
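Since OpenCV was mentioned, here is a rough Python/OpenCV translation of that pipeline. The crop offsets and numbers are the ones used above; note that -level 0%,40% followed by -threshold 60% collapses to a single threshold at about 24% of full scale:

import cv2

img = cv2.imread("display.jpg")
crop = img[80:380, 0:500]                            # -crop 500x300+0+80
l = cv2.cvtColor(crop, cv2.COLOR_BGR2HLS)[:, :, 1]   # Lightness channel
l = cv2.medianBlur(l, 9)                             # -median 9x9
l = cv2.normalize(l, None, 0, 255, cv2.NORM_MINMAX)  # -normalize
_, bw = cv2.threshold(l, 61, 255, cv2.THRESH_BINARY) # ~0.40 * 0.60 * 255
cv2.imwrite("z.png", bw)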

PIL image processing: how to achieve a clean image with no noisy background? Is the binarization step exaggerated?

Good afternoon,
I am writing an OCR program to detect text on images. So far I am getting good results, but only when the text is black and the background is white. What can I do to improve images that have white text on a light colored background (yellow, green, etc.)?
One original example image could be:
So far I am just converting it to greyscale using:
image = image.convert('L')
Then apply a series of filters like for example:
SHARPEN
SMOOTH
BLUR
etc
Then I do binarization like this:
image = image.point(lambda x: 0 if x<128 else 255, '1') #refers to http://stackoverflow.com/questions/18777873/convert-rgb-to-black-or-white and also to http://stackoverflow.com/questions/29923827/extract-cow-number-from-image
My output images are indeed very bad for OCR feeding, like this one:
What am I doing wrong? What should be the best approach for white text on light colored background?
Another doubt: is my binarization step too strong/exaggerated?
Should I mix some filters? Could you suggest some?
PS: I am a total newbie to image processing, so please keep it simple =x
Thanks so much for your attention and help/advices.
I tried this with ImageMagick, which has Python bindings too - except I did it at the command line. I guess you can adapt what I did quite readily - I don't speak Pythonese nor use PIL but hopefully it will give you some insight as to a possible avenue.
convert http://i.stack.imgur.com/2cFk3.jpg -fuzz 50% -fill black +opaque white -threshold 50% x.png
Basically it takes any colour that is not within 50% of white and fills it with black, then it thresholds the result to pure black and white.
Another option would be to threshold the image according to the saturation of the colours. So, you convert to HSB colorspace, separate the channels and discard the hue and brightness. You are then left with the saturation which you threshold as follows:
convert http://i.stack.imgur.com/2cFk3.jpg -colorspace hsb -separate -delete 0,2 -threshold 50% x.png
Throw in a -negate to get white letters on black.
I have copied some other code for PIL, and am modifying it kind of/sort of to something that may be close to what you need - bear in mind I know no Python:
import colorsys
from PIL import Image

im = Image.open(filename).convert('RGB')   # ensure pixels are (r, g, b) tuples
ld = im.load()
width, height = im.size
for y in range(height):
    for x in range(width):
        r, g, b = ld[x, y]
        h, s, v = colorsys.rgb_to_hsv(r/255., g/255., b/255.)
        if s > 0.5:                        # <--- here onwards is my attempted Python
            ld[x, y] = (0, 0, 0)           # saturated (coloured background) -> black
        else:
            ld[x, y] = (255, 255, 255)     # unsaturated (whitish text) -> white

Understanding Perspective Projection Distortion ImageMagick

For a project I am trying to create a perspective distortion of an image to match a DVD case front template. So I want to automate this using ImageMagick (CLI) but I have a hard time understanding the mathematical aspects of this transformation.
convert \
-verbose mw2.png \
-alpha set \
-virtual-pixel transparent \
-distort Perspective-Projection '0,0 0,0 0,0 0,0' \
box.png
This command has an empty set of coordinates. I have read the documentation thoroughly, but I can't seem to understand which parameter represents which point; it gives me variables and names that I have no clue about (more useful for a mathematical mastermind, maybe). So could someone explain this subject to me (visually preferred), or give me a link to useful information? I have no clue what I am doing. Just playing around with the parameters won't do for this job; I need to calculate these points.
Here you will find an easy image of what I am trying to achieve (with CLI tools):
Update:
convert \
-virtual-pixel transparent \
-size 159x92 \
-verbose \
cd_empty.png \
\(mw2.png -distort Perspective '7,40 4,30 4,124 4,123 85,122 100,123 85,2 100,30'\) \
-geometry +3+20 \
-composite cover-after.png
Gives me as output:
cd_empty.png PNG 92x159 92x159+0+0 8-bit sRGB 16.1KB 0.000u 0:00.000
convert: unable to open image `(mw2.png': No such file or directory @ error/blob.c/OpenBlob/2641.
convert: unable to open file `(mw2.png' @ error/png.c/ReadPNGImage/3741.
convert: invalid argument for option Perspective : 'require at least 4 CPs' @ error/distort.c/GenerateCoefficients/807.
convert: no images defined `cover-after.png' @ error/convert.c/ConvertImageCommand/3044.
Correction by Kurt Pfeifle:
The command has a syntax error, because it does not surround the \( and \) delimiters by (at least one) blank on each side as required by ImageMagick!
Since there are no links to the source images provided, I cannot test the outcome of this corrected command:
convert \
-virtual-pixel transparent \
-size 159x92 \
-verbose \
cd_empty.png \
\( \
mw2.png -distort Perspective '7,40 4,30 4,124 4,123 85,122 100,123 85,2 100,30' \
\) \
-geometry +3+20 \
-composite \
cover-after.png
Did you see this very detailed explanation of ImageMagick's distortion algorithms? It comes with quite a few illustrations as well.
From looking at your example image, my guess is that you'll get there using a Four Point Distortion Method.
Of course, the example you gave with the 0,0 0,0 0,0 0,0 parameter does not do what you want.
Many of the distortion methods available in ImageMagick work like this:
The method uses a set of pairs of control points.
The values are numbers (they may be floating point, not only integer).
Each pair of values represents a pixel coordinate.
Each set of four values represents a source image coordinate, followed immediately by the corresponding destination image coordinate.
The coordinates of each source control point are transferred to the respective destination control point exactly as given by the parameters.
All other pixels' coordinates are transferred according to the distortion method given.
Example:
Sx1,Sy1 Dx1,Dy1
Sx2,Sy2 Dx2,Dy2
Sx3,Sy3 Dx3,Dy3
...
Sxn,Syn Dxn,Dyn
x is used to represent an X coordinate.
y is used to represent a Y coordinate.
1, 2, 3, ... n is used to represent the 1st, 2nd, 3rd, ... nth pixel.
S is used here for the source pixel.
D is used here for the destination pixel.
First: method -distort perspective
The distortion method perspective will make sure that straight lines in the source image remain straight lines in the destination image. Other methods, like barrel or bilinearforward, do not: they will distort straight lines into curves.
The -distort perspective requires a set of at least 4 pre-calculated pairs of pixel coordinates (where the last one may be zero). More than 4 pairs of pixel coordinates provide for more accurate distortions. So if you used for example:
-distort perspective '1,2 3,4 5,6 7,8 9,10 11,12 13,14 15,16'
(written here with more optional blanks between the mapping pairs than required, for readability), it would mean:
From the source image take pixel at coordinate (1,2) and paint it at coordinate (3,4) in the destination image.
From the source image take pixel at coordinate (5,6) and paint it at coordinate (7,8) in the destination image.
From the source image take pixel at coordinate (9,10) and paint it at coordinate (11,12) in the destination image.
From the source image take pixel at coordinate (13,14) and paint it at coordinate (15,16) in the destination image.
You may have seen photo images where the vertical lines (like the corners of building walls) do not look vertical at all (due to some tilting of the camera when taking the snap). The method -distort perspective can rectify this.
It can even achieve things like this, 'straightening' or 'rectifying' one face of a building that appears in the 'correct' perspective of the original photo:
The control points used for this distortion are indicated by the corners of the red (source controls) and blue rectangles (destination controls) drawn over the original image:
This particular distortion used
-distort perspective '7,40 4,30 4,124 4,123 85,122 100,123 85,2 100,30'
Complete command for your copy'n'paste pleasure:
convert \
-verbose \
http://i.stack.imgur.com/SN7sm.jpg \
-matte \
-virtual-pixel transparent \
-distort perspective '7,40 4,30 4,124 4,123 85,122 100,123 85,2 100,30' \
output.png
Second: method -distort perspective-projection
The method -distort perspective-projection is derived from the more easily understood perspective method. It achieves exactly the same distortion result as -distort perspective does, but instead of (at least) 4 pairs of coordinates (16 numbers) it takes 8 floating point coefficients as its parameter.
It uses...
A set of exactly 8 pre-calculated coefficients;
Each of these coefficients is a floating point value (whereas with -distort perspective the control points are usually given as plain pixel integers);
These 8 values represent a matrix of the form

  sx  ry  tx
  rx  sy  ty
  px  py   1

(the final 1 is implicit, which is why only 8 values are passed), and it is used to calculate the destination pixels from the source pixels according to these formulas:

X-of-destination = (sx*xs + ry*ys + tx) / (px*xs + py*ys + 1)
Y-of-destination = (rx*xs + sy*ys + ty) / (px*xs + py*ys + 1)
To avoid the (more difficult) calculation of the 8 required coefficients for a re-usable -distort perspective-projection call, you can...
FIRST, (more easily) calculate the coordinates for a -distort perspective,
SECOND, run this -distort perspective with a -verbose parameter added,
LAST, read the 8 coefficients from the output printed to stderr.
The (above quoted) complete command example would spit out this info:
Perspective Projection:
-distort PerspectiveProjection \
'1.945622, 0.071451, -12.187838, 0.799032,
1.276214, -24.470275, 0.006258, 0.000715'
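To convince yourself that the coefficient order is (sx, ry, tx, rx, sy, ty, px, py), you can plug the first control point pair from the command above into the formulas; a little Python check:

def project(xs, ys, sx, ry, tx, rx, sy, ty, px, py):
    # the perspective-projection formulas from above
    d = px * xs + py * ys + 1
    return (sx * xs + ry * ys + tx) / d, (rx * xs + sy * ys + ty) / d

c = (1.945622, 0.071451, -12.187838, 0.799032,
     1.276214, -24.470275, 0.006258, 0.000715)
print(project(7, 40, *c))   # ~(4.0, 30.0), i.e. the first control point pair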
Thanks to the ImageMagick Distorting Images documentation, I ended up with this clean, understandable code:
$points = array(
    0,   0,    # Source Top Left
    0,   0,    # Destination Top Left
    0,   490,  # Source Bottom Left
    2.2, 512,  # Destination Bottom Left
    490, 838,  # Source Bottom Right
    490, 768,  # Destination Bottom Right
    838, 0,    # Source Top Right
    838, 50    # Destination Top Right
);
$imagick->distortImage(Imagick::DISTORTION_PERSPECTIVE, $points, false);
Please keep in mind that each coordinate consists of two parts: the first is the X axis and the second is the Y axis. So when we say 838,0 for the Source Top Right, we mean its X coordinate is 838 and its Y coordinate is zero (0).
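For completeness, the same call is available from Python through the Wand binding to ImageMagick. A sketch using the point order above (the input and output file names are just the ones from the earlier commands):

from wand.image import Image

points = (0, 0,     0, 0,      # source TL -> destination TL
          0, 490,   2.2, 512,  # source BL -> destination BL
          490, 838, 490, 768,  # source BR -> destination BR
          838, 0,   838, 50)   # source TR -> destination TR

with Image(filename="mw2.png") as img:
    img.virtual_pixel = "transparent"
    img.distort("perspective", points)
    img.save(filename="cover-after.png")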
