ImageMagick - Restrict processing to a region of the image

I need to process a large number of scans of dot-matrix printed documents to optimize them for reading with an OCR engine.
I used ImageMagick to close the white gaps between the dots of the matrix, which makes the OCR engine work much better. The problem is performance: the PDFs are scanned at 600 dpi and processing takes too long. I would like to limit the processing to the area used by the zonal OCR. I tried the "-region" operator, but although it works, the processing takes the same time.
This is the command I run from the Windows command line:
convert -density "601.6x600" -units pixelsperinch -monochrome files\1.pdf -region 2000x200+2500+2100 -negate -morphology Thinning "17x17+8+8: -,-,-,-,-,0,0,0,0,0,0,0,-,-,-,-,- -,-,-,-,0,0,0,0,0,0,0,0,0,-,-,-,- -,-,-,0,0,0,0,0,0,0,0,0,0,0,-,-,- -,-,0,0,0,0,-,-,-,-,-,0,0,0,0,-,- -,0,0,0,0,-,-,-,-,-,-,-,0,0,0,0,- 0,0,0,0,-,-,-,-,-,-,-,-,-,0,0,0,0 0,0,0,-,-,-,-,-,-,-,-,-,-,-,0,0,0 0,0,0,-,-,-,-,-,-,-,-,-,-,-,0,0,0 0,0,0,-,-,-,-,-,1,-,-,-,-,-,0,0,0 0,0,0,-,-,-,-,-,-,-,-,-,-,-,0,0,0 0,0,0,-,-,-,-,-,-,-,-,-,-,-,0,0,0 0,0,0,0,-,-,-,-,-,-,-,-,-,0,0,0,0 -,0,0,0,0,-,-,-,-,-,-,-,0,0,0,0,- -,-,0,0,0,0,-,-,-,-,-,0,0,0,0,-,- -,-,-,0,0,0,0,0,0,0,0,0,0,0,-,-,- -,-,-,-,0,0,0,0,0,0,0,0,0,-,-,-,- -,-,-,-,-,0,0,0,0,0,0,0,-,-,-,-,-" -morphology Thinning "13x13+6+6: -,-,0,0,0,0,0,0,0,0,0,-,- -,0,0,0,0,0,0,0,0,0,0,0,- 0,0,0,-,-,-,-,-,-,-,0,0,0 0,0,-,-,-,-,-,-,-,-,-,0,0 0,0,-,-,-,-,-,-,-,-,-,0,0 0,0,-,-,-,-,-,-,-,-,-,0,0 0,0,-,-,-,-,1,-,-,-,-,0,0 0,0,-,-,-,-,-,-,-,-,-,0,0 0,0,-,-,-,-,-,-,-,-,-,0,0 0,0,-,-,-,-,-,-,-,-,-,0,0 0,0,0,-,-,-,-,-,-,-,0,0,0 -,0,0,0,0,0,0,0,0,0,0,0,- -,-,0,0,0,0,0,0,0,0,0,-,-" -morphology Close Disk -negate -compress zip r.pdf
P.S. I wanted to post on the imagemagick forum, but I didn't find the link to subscribe ...

The issue is resolved here: https://github.com/ImageMagick/ImageMagick/discussions/2841.
The -region parameter appears to act only after the entire file has been processed. Fixed using clone, crop, layers and flatten:
magick in.tiff ( +clone -crop 100x100+500+500 -morphology dilate disk ) -layers flatten x2.tiff
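Applied to an arbitrary region, the same pattern might look like the sketch below. The function name process_region and the MAGICK variable (which lets you substitute convert on ImageMagick 6) are assumptions made for illustration. The "Dilate Disk" step is just the placeholder from the command above, and the Thinning kernels from the question would go in its place.

```shell
#!/bin/bash
# Sketch only: run a morphology step on one region and paste it back.
# MAGICK is a hypothetical override so the same function can call
# "magick" (ImageMagick 7) or "convert" (ImageMagick 6).
process_region() {
    local in="$1" out="$2" region="$3"   # region is WxH+X+Y
    # -crop without +repage keeps the region's page offset, so
    # -layers flatten puts the processed patch back where it came from.
    "${MAGICK:-magick}" "$in" \
        \( +clone -crop "$region" -morphology Dilate Disk \) \
        -layers flatten "$out"
}
```

It would be called as: process_region in.tiff out.tiff 2000x200+2500+2100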

Related

Imagemagick - Getting the usable dimension of an image

I have a process that creates a file with convert (ImageMagick) based on some parameters, then checks the file and returns the dimensions of the area that actually contains pixels.
The commands look like this:
convert -size 5000x5000 xc:white -font {font} -pointsize {size} -fill black -draw "{some_occassional_additional_parameter}text 200,2500 \"{text}\"" {some_other_occassional_additional_parameter}{temporary_file}
convert {temporary_file}.png -trim +repage -format "%[w]x%[h]\n" info:
The result is something like: 526x425
This process runs half a million times per day and seems to be a huge bottleneck. I'm looking for a solution that does this in memory, rather than creating a file each time and then checking it.
Speeding it up by even 50% would be a huge achievement.
Thank You
Not at a computer to test, but change your first command to:
convert -size ... xc:... occasionalstuff -format "%@" info:
Note that you can, and probably should, double-quote the "%@", especially if you use expressions containing characters of significance to the shell, though it is not strictly necessary in this specific case.
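ImageMagick's %@ escape prints the bounding box that -trim would produce, in the form WxH+X+Y, without writing any file. A minimal sketch of pulling the WxH part out of that string follows. The function names and the MAGICK variable are assumptions for illustration, not part of the original process.

```shell
#!/bin/bash
# Sketch only: measure the trimmed size of a rendering without a temp file.

# Strip the +X+Y offset from a "%@" result such as 526x425+200+2300.
box_size() {
    printf '%s\n' "${1%%+*}"
}

# Run the given ImageMagick arguments in memory and print the trimmed WxH.
# MAGICK is a hypothetical override ("magick" on v7, "convert" on v6).
trimmed_size() {
    local box
    box=$("${MAGICK:-magick}" "$@" -format '%@' info:) || return 1
    box_size "$box"
}
```

Usage would be along the lines of: trimmed_size -size 5000x5000 xc:white -font {font} -pointsize {size} -fill black -draw 'text 200,2500 "{text}"'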

Compose multiple regions of an image into a target

I'm trying to use ImageMagick to compose different pieces of a rendered PDF into a target. E.g., I want to have ImageMagick render the PDF at 300dpi, then create a 300x400 pixel output image, then take from the PDF the area 10x20+30+40 and place it in the target (300x400 pixel image) at 12,34. Then take another (and a third and fourth) chunk at different coordinates with different sizes and place them at different places.
I cannot seem to figure out how to do this in one go, and doing it in multiple runs always re-renders the PDF and takes awfully long. Is this even possible?
Here's an idea of how you can approach this. It uses the MPR or "Memory Program Register" that Fred suggested in the comments. It is basically a named chunk of memory that I write into at the start and which I recall later when I need it.
Here is a rather wonderful start image from the Prokudin-Gorskii collection:
The code resizes the image and saves a copy in the MPR. Then, takes a copy of the MPR, crops out a head, resizes it and composites the resized result onto the resized original at a different location and then repeats the process for another head.
magick Prokudin.png -resize 300x400\! -write MPR:orig \
\( MPR:orig -crop 50x50+180+84 -resize 140x140 \) -geometry +10+240 -compose src-over -composite \
\( MPR:orig -crop 40x40+154+184 \) -geometry +40+100 -compose src-over -composite \
result.png
If you have trouble understanding it, try running it with the second or third line omitted so it just does one head ;-)
Hopefully it covers all the aspects of your question and you can adapt it to your PDF.
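Adapted to the PDF case in the question, the same MPR idea could be sketched as follows. The function name compose_regions, the MAGICK variable, the white 300x400 canvas and the SRC@DST argument format are all assumptions chosen for illustration. Only the first page is used, and it is rendered once.

```shell
#!/bin/bash
# Sketch only: render a PDF page once, then paste several crops of it
# onto a blank canvas. Each region is given as SRC@DST, for example
# 10x20+30+40@+12+34 (crop 10x20 at +30+40, place it at +12+34).
compose_regions() {
    local pdf="$1" out="$2" spec
    shift 2
    local regions="$*"
    # Render at 300 dpi, keep the page in memory as mpr:page, drop it
    # from the image list, then start from a blank target canvas.
    set -- -density 300 "${pdf}[0]" -write mpr:page +delete \
        -size 300x400 xc:white
    # Word splitting is intended here: one SRC@DST spec per word.
    for spec in $regions; do
        set -- "$@" \( mpr:page -crop "${spec%@*}" +repage \) \
            -geometry "${spec#*@}" -composite
    done
    "${MAGICK:-magick}" "$@" "$out"
}
```

For example: compose_regions in.pdf out.png 10x20+30+40@+12+34 50x60+70+80@+100+200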

Resize an image with 54x54 squares (540x540) to 54x54pixel lossless

I've got a 540x540 image made up of 54x54 colour squares (all the same size).
When I resize it to 54x54px it looks horrible (blurred). Shouldn't a resize like this be done perfectly with ImageMagick?
Is it possible to get it perfect?
I've tested convert source.png -resize destination.png and -adaptive-resize but the result is the same..
I see what your confusion is now... the problem is not that the process is lossy, rather it is because the -resize is doing more sophisticated processing than you want in order to make an attractive job that you would want for, say, photographs. You want a very simple point sampling process which will produce simple blocks of pure, uncombined colour.
I'll make a start image:
magick -size 10x10 xc:red +noise random -scale 540x540 start.png
And scale it down, by taking a point sample in each block:
magick start.png -sample 10x10 small.png
And back up:
magick small.png -scale 540x540 reincarnated.png
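For the 540x540 image in the question, the size to sample down to is simply the image size divided by the block size. A small sketch, in which the function names and the MAGICK variable are assumptions for illustration:

```shell
#!/bin/bash
# Sketch only: shrink a grid of equal colour blocks to one pixel per block.

# Print the WxH to sample down to, given image width, height and block size.
sample_size() {
    printf '%sx%s\n' "$(( $1 / $3 ))" "$(( $2 / $3 ))"
}

# Point-sample the image so that each block becomes exactly one pixel.
# MAGICK is a hypothetical override ("magick" on v7, "convert" on v6).
shrink_blocks() {
    local in="$1" out="$2" width="$3" height="$4" block="$5"
    "${MAGICK:-magick}" "$in" \
        -sample "$(sample_size "$width" "$height" "$block")" "$out"
}
```

With 10-pixel blocks this would be: shrink_blocks source.png destination.png 540 540 10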

Batch trim noisy images

I have a massive set of noisy images of drawings that people have created. I'd like to have some function to trim them down to ONLY the drawing.
Here are some examples:
Because of the noise, -trim doesn't work.
I also tried to use the example linked here (www.imagemagick.org/Usage/crop/#trim_blur), but it was ineffective because of differing noise levels both within and between images.
Lastly, I tried to increase the contrast to increase the likelihood of the lines of the actual drawing being identified, but for similar reasons to the above (differing noise levels), it only sharpened the lines in part of each image.
If anyone has any ideas, I'd love to hear them!
Not sure if this will work for all your images, as there are quite a few problems with them: artefacts around the edges, uneven lighting or shadows, noise, and low contrast. Still, you should get some ideas for addressing some of the issues.
To get rid of the artefacts around the edge, you could reduce the extent of the image by 2.5% on all sides - essentially a centred crop, like this:
convert noisy1.jpg -gravity center -extent 95x95% trimmed.png
To see the shadows/uneven lighting, I will normalise your image to a range of solid black to solid white and you will see the shadow at bottom left:
convert noisy1.jpg -normalize result.png
To remove this, I would clone your image and calculate the low frequency average over a larger area and then subtract that so that slowly changing things are removed:
convert noisy1.jpg \( +clone -statistic mean 25x25 \) -compose difference -composite -negate result.png
That gives this, and then you can try normalising it yourself to see that the shadow is gone:
If I now apply a Canny Edge Detection to that, I get this:
convert noisy1.jpg \( +clone -statistic mean 25x25 \) -compose difference -composite -normalize -negate -canny 0x1+10%+30% result.png
Here is a very crude, but hopefully effective, little script to do the whole lot. It doesn't do any checking of parameters. Save as $HOME/cropper.
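#!/bin/bash
# Usage: cropper IMAGE   (writes cropped-IMAGE in the current directory)
src=$1
dst="cropped-$1"
tmp="tmp-$$.mpc"
# Centre-crop to 95%, cache that image, then get the trim box (%@) of its edge map
trimbox=$(convert "$src" -gravity center -extent 95x95% -write "$tmp" \( +clone -statistic mean 25x25 \) -compose difference -composite -normalize -negate -canny 0x1+10%+30% -format "%@" info:)
# Crop the cached image to that box
convert "$tmp" -crop "$trimbox" "$dst"
rm tmp-$$.*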
Make the script executable with:
chmod +x $HOME/cropper
And run with a single image like this:
cd /path/to/some/images
$HOME/cropper OneImage.jpg
If you have hundreds of images, I would make a backup first and then process them all in parallel with GNU Parallel:
parallel $HOME/cropper {} ::: *.jpg
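If GNU Parallel is not installed, xargs -P is a possible fallback. This is a sketch rather than part of the original answer: the function name crop_all and the CROPPER and JOBS variables are assumptions for illustration, and -P is an extension found in GNU and BSD xargs rather than a POSIX feature.

```shell
#!/bin/bash
# Sketch only: run the cropper on every JPEG in a directory, a few at a time.
# The ( ... ) function body runs in a subshell, so the cd stays local.
crop_all() (
    cd "$1" || exit 1
    set -- *.jpg
    [ -e "$1" ] || exit 0      # no JPEGs here, nothing to do
    # NUL-separated names survive spaces; -n 1 passes one image per run.
    printf '%s\0' "$@" |
        xargs -0 -n 1 -P "${JOBS:-4}" "${CROPPER:-$HOME/cropper}"
)
```

It would be run as: crop_all /path/to/some/images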

How to get the result of an ImageMagick convert command as bitmap data

I am working on a project that will make a jigsaw puzzle from an image and present it to the user as separate pieces in a browser. I have done all the prototyping in Python. At the moment I can produce separate images for each puzzle piece.
As a last step I want to put a nice bevel on the pieces to make them look realistic. I found an ImageMagick convert command that does this just fine:
convert piece.png -alpha extract -blur 0x2 -shade 120x30 piece.png -compose Overlay -composite piece.png -alpha on -compose Dst_In -composite result.png
I execute the command using os.system, but it takes far too long to complete.
Can you advise me on the fastest way to run the ImageMagick processing? I think that would involve calling the ImageMagick libraries directly, sending them the input bitmap data and receiving the result as bitmap data too. Then I can stream the result to the user. The solution does not have to be in Python.
Update
I have just been looking at your command again - I kind of assumed it was sensible as you implied you got it from Anthony Thyssen's excellent ImageMagick Usage pages - however I see you are reading the image piece.png three times which it must be possible to avoid by using -clone or -write MPR:save. Let me experiment some more. I haven't got your jigsaw piece to test with, so I am in the dark here, but you must be able to change your command to something like this:
convert piece.png -write mpr:piece \
\( +clone -alpha extract -blur 0x2 -shade 120x30 \) \
-compose Overlay -composite \
mpr:piece -alpha on -compose Dst_In -composite result.png
MPR is a Memory Program Register, or basically a named lump of RAM that ImageMagick can read and write to. There are details and examples here.
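Since the goal is to get bitmap data back rather than a file, one option is to combine MPR with ImageMagick's png:- pseudo-filename, which reads from standard input or writes to standard output. The sketch below keeps the operand order of the command in the question and only replaces the second and third reads of piece.png. The function name bevel_stream and the MAGICK variable are assumptions for illustration, and it has not been tried on real puzzle pieces.

```shell
#!/bin/bash
# Sketch only: read a PNG piece on stdin, write the bevelled PNG to stdout.
# MAGICK is a hypothetical override ("magick" on v7, "convert" on v6).
bevel_stream() {
    # The piece is read once and kept in memory as mpr:piece; the later
    # mpr:piece operands stand in for re-reading piece.png from disk.
    "${MAGICK:-magick}" png:- -write mpr:piece \
        -alpha extract -blur 0x2 -shade 120x30 \
        mpr:piece -compose Overlay -composite \
        mpr:piece -alpha on -compose Dst_In -composite \
        png:-
}
```

From the shell this would be bevel_stream < piece.png > result.png. A calling program could instead feed the piece's bytes to the process on stdin and read the result from its stdout, so nothing touches the disk.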
Original Answer
Three things spring to mind... which one, or which combination of things, will help depends on the specification of your CPU, memory and disks as well as the sizes of your pieces - none of which I know or can test.
Firstly, if you used the libraries, you would avoid the overhead of creating a new process to run the convert - so that should help, but if your pieces are large and the bottleneck is actually the processing, using the libraries will make little difference.
Secondly, if your images are large, the time to read them in off disk and write them back to disk may be what is killing your performance. To test this, I would create a small RAMdisk and store the images on there and see if that helps. It is a quick and relatively easy test.
Thirdly, I assume you are generating many pieces and you currently do them one after the other in a sequential fashion. If this is the case, I would definitely recommend going multi-threaded. Either do this in your code with your language's threading environment, or try out GNU Parallel which has always been brilliant for me. So, if you were going to do
convert piece1.png -alpha extract ... -composite result1.png
convert piece2.png -alpha extract ... -composite result2.png
convert piece3.png -alpha extract ... -composite result3.png
...
convert piece1000.png -alpha extract ... -composite result1000.png
either send all those commands to GNU Parallel on its stdin, and it will execute them in parallel on as many cores as your CPU has, like this:
(
echo convert piece1.png ... -composite result1.png
echo convert piece2.png ... -composite result2.png
echo convert piece3.png ... -composite result3.png
) | parallel
or build the command like this
parallel convert {} -alpha ..... result-{} ::: piece*.png

Resources