We need to append two very large TIF files vertically using ImageMagick, but we are very limited on memory and disk resources because we are attempting to do this on AWS Lambda.
We currently use the very simple approach here...
magick convert image1.tif image2.tif -append result.tif
This works, but because of the size of each image, the memory and disk consumption is too high and we run into resource issues.
ImageMagick has a "stream" command (https://www.imagemagick.org/script/stream.php) but I cannot find any examples of how we might use it for what we are attempting to do.
We have tried other approaches, such as the -limit option, but we still run into issues. I am trying to determine how this could be done using the "stream" command, if it is possible at all. I have seen "stream" suggested for this use case, but no examples.
Any help greatly appreciated!
I'm not at a machine to test, but I suspect you can achieve that using much less memory, and time, with vips.
I think you'd want this at the command line:
vips join input1.tif input2.tif result.tif vertical
Add a final parameter of --vips-leak to check total memory used.
The join operation is documented here:
http://libvips.github.io/libvips/API/current/libvips-conversion.html#vips-join
There are node, PHP, Python, Ruby etc. bindings as well.
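For anyone calling this from a Lambda handler, here is a minimal sketch that shells out to the same command from Python. It assumes the vips binary is on the PATH; the helper names build_vips_join and append_vertically are made up for illustration, and only the arguments shown in the command above are used.

```python
import subprocess


def build_vips_join(top, bottom, output, report_memory=False):
    """Return the argv list for a vertical `vips join` of two images."""
    argv = ["vips", "join", top, bottom, output, "vertical"]
    if report_memory:
        # Same flag as suggested above for checking total memory used.
        argv.append("--vips-leak")
    return argv


def append_vertically(top, bottom, output):
    """Run the join; raises CalledProcessError if vips exits non-zero."""
    subprocess.run(build_vips_join(top, bottom, output), check=True)


if __name__ == "__main__":
    # Only prints the command; call append_vertically() to actually run it.
    print(build_vips_join("input1.tif", "input2.tif", "result.tif", report_memory=True))
```

Keeping the argument list in its own function makes it easy to log or unit-test the command without having vips installed.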
I created two 10,000x10,000 pixel TIF files and did the same append operation with ImageMagick and vips:
ImageMagick: 11 seconds and 4.86 GB memory used
vips: 4 seconds and 157 MB memory used
Does it mean that it will take 100MB (Open via Disk)?
Or does it mean that it will take 100MB (Open via Memory)?
That's the threshold at which libvips will flip from open-via-memory to open-via-disc.
For small images (under 100 MB when decompressed, in this case), libvips will decompress to memory then process from there. This is obviously not a good idea for large images, so for these libvips will decompress to a temporary disc file, then map that area of disc into virtual memory and use that as the pixel source.
tldr: set VIPS_DISC_THRESHOLD to a small number to prefer the use of disc, set it to a large number to prefer RAM.
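To make the threshold concrete, here is a rough sketch of the decision described above. It is not libvips code: the function names are made up, the 100 MB default is assumed to mean 100 * 1024 * 1024 bytes, and it only models the case where the image has to be fully decompressed before processing.

```python
def decompressed_bytes(width, height, bands, bytes_per_sample=1):
    """Size of the image once decompressed to raw pixels."""
    return width * height * bands * bytes_per_sample


def opens_via(width, height, bands, threshold_bytes=100 * 1024 * 1024,
              bytes_per_sample=1):
    """Return "memory" below the threshold, "disc" otherwise (simplified)."""
    size = decompressed_bytes(width, height, bands, bytes_per_sample)
    return "memory" if size < threshold_bytes else "disc"


if __name__ == "__main__":
    # A 10,000 x 10,000 RGB image is about 300 MB of raw pixels.
    print(opens_via(10_000, 10_000, 3))
    # A 2,000 x 2,000 RGB image is about 12 MB of raw pixels.
    print(opens_via(2_000, 2_000, 3))
```

Lowering threshold_bytes pushes more images onto the disc path, which is what setting VIPS_DISC_THRESHOLD to a small number does.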
There's a chapter in the libvips docs which goes into a lot more detail:
https://www.libvips.org/API/current/How-it-opens-files.md.html
To very quickly summarize:
libvips has at least four ways of opening images and tries hard to pick the best one for you automatically.
Sometimes it'll need a bit of help to hit the best path for your use case and you have three main ways of influencing this.
You can hint the access pattern you expect for this image with the access= parameter, you can set the threshold at which it'll flip between preferring memory and preferring disc, and you can say where you'd like disc temporaries to be held.
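As a sketch of those three knobs when running the vips command-line tool from Python: VIPS_DISC_THRESHOLD is the variable named above; using TMPDIR for the location of disc temporaries and the filename[access=sequential] suffix for the access hint are my assumptions about how the CLI takes these settings, so check them against the linked chapter. The helper names are hypothetical.

```python
import os


def vips_environment(disc_threshold_bytes, temp_dir, base_env=None):
    """Copy the environment and add the libvips-related settings."""
    env = dict(os.environ if base_env is None else base_env)
    # Threshold in bytes at which libvips flips from memory to disc.
    env["VIPS_DISC_THRESHOLD"] = str(disc_threshold_bytes)
    # Assumption: libvips places its disc temporaries under TMPDIR.
    env["TMPDIR"] = temp_dir
    return env


def with_access_hint(filename, access="sequential"):
    """Append a load option to a filename (assumed CLI option syntax)."""
    return f"{filename}[access={access}]"


if __name__ == "__main__":
    env = vips_environment(10 * 1024 * 1024, "/tmp")
    print(env["VIPS_DISC_THRESHOLD"], with_access_hint("input1.tif"))
```

The returned dict can be passed as env= to subprocess.run when launching vips.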
If anyone has used the iOS wrapper for the LZMA SDK available at https://github.com/mdejong/lzmaSDK and has been able to tweak it in order to see the progress of unarchiving, please help.
I am going to use this SDK in iOS to extract a 16MB file, which uncompresses to a 150MB file, and this takes around 40 seconds to complete. It would be good to have some kind of callback for showing the progress of uncompression.
Help is greatly appreciated.
Thanks
So, I looked at this issue quite a bit recently, and honestly the best you are going to be able to do is look for all the files in a specific tmp dir where decompression is going on and then count them and compare to a known size N. The problem with attempting to do this in the library is that it spans multiple runtimes and the callback idea makes the code a mess. Also, a callback would not help that much because of the way 7z compression works. To decode, one needs to build up the decompression dictionary before specific files can be decompressed, and that process of building up the dictionary takes a long time before the first file can even be written. So, if you put a "percent done" counter in your app showing how much was done, it would show 0% done for a long time, then jump to 50% and then 90 or 100 %. Basically, it would not be that useful even if it was implemented.
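The file-counting idea is language-agnostic; here is a minimal sketch of it in Python rather than Objective-C. The function name and the choice to count only top-level files are my own assumptions, and expected_files is the known size N mentioned above.

```python
import os
import tempfile


def extraction_progress(directory, expected_files):
    """Fraction of the expected files that exist in `directory` so far."""
    if expected_files <= 0:
        raise ValueError("expected_files must be positive")
    # Counts regular files at the top level only, not in subdirectories.
    found = sum(1 for entry in os.scandir(directory) if entry.is_file())
    return min(found / expected_files, 1.0)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.bin", "b.bin"):
            open(os.path.join(tmp, name), "w").close()
        print(extraction_progress(tmp, 4))
```

Polling this on a timer gives a usable estimate only for archives with many files; for a single large file it has the same 0%-then-100% problem described above.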
You could try the C++ port of the latest LZMA SDK (15.06), which does not have the limitations described above (those apply to the C version). Memory allocation and I/O read/write sizes can be tuned at runtime, and it also supports password-encrypted archives, smoothed progress, LZMA and LZMA2 archive types, etc.
GitHub: https://github.com/OlehKulykov/LzmaSDKObjC
I wonder if it is OK (apart from the fact that it is done a million times daily...) to feed ImageMagick's convert command with user-uploaded files?
Obviously, ImageMagick has a large attack surface, as it is capable of loading tons of formats and thus runs vast amounts of code when processing input data.
Excessive CPU or memory consumption does no harm, as this can be kept under control by simple means (for example, by ulimit). Arbitrary file access or running injected code with network access, however, must be prevented by convert. Can we expect this?
Are there procedures for the ImageMagick authors to follow that would provide a feasible amount of security against such exploits?
Basically I'm trying to keep the memory use on my Nginx server under a certain amount, both because I'm insane (according to my friends) & I want to save money. However I'm worried ImageMagick may push it over the edge.
I'm using -limit area 20MiB and I've also tried -limit memory 15MiB -limit map 15MiB but when checking the process (as it runs) through top -c (with Shift-M) and ps aux it shows it using, sometimes, considerably more memory than I've set in the limits. To give numbers it may be using 35MB or 40MB, instead of the 20MB/30MB I would expect. I wouldn't be bothered for 2MB or 3MB but that's quite a large offset.
I've been told the extra memory may be the ImageMagick's overhead as it loads the interpreter etc, but I'm not super familiar with Unix programs so haven't a clue in that department.
If anyone can explain why this is happening, that would be great. If it's a normal thing, great. I'll just adjust things to take into account the fact that it may use my limit plus a certain amount, but if it isn't and the -limit parameter doesn't limit memory to a certain amount, what exactly is the point in having that parameter in ImageMagick?
Again thanks for your help in advance, it's much appreciated, as always.
According to the documentation, ImageMagick moves memory operations to mmap-ed files, so it will start to swap if you have enough disk space; see the manual:
SNIP from manual -limit:
The value for File is in number of files. The Disk limit is in
Gigabytes and the values for the other resources are in Megabytes. By
default the limits are 768 files, 1024MB memory, 4096MB map, and
unlimited disk, but these are adjusted at startup time on platforms
that can provide information about available resources. When the limit
is reached, ImageMagick will fail in some fashion, or take
compensating actions if possible. For example, -limit memory 32 -limit
map 64 limits memory. When the pixel cache reaches the memory limit it
uses memory mapping. When that limit is reached it goes to disk. If
disk has a hard limit, the program will fail.
The Limits only affect ImageMagick's pixel cache. The program code and anything the libraries / delegates may do to load or process the images are not influenced by these settings at all.
You don't specify what you're looking at in top, the proper column would obviously be RES or RSIZE. With such small limits as 20MiB, the program and library code will represent a significant fraction of resident set size.
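Since -limit only covers the pixel cache, a ceiling on the whole process has to come from the operating system. Here is a rough, Unix-only sketch using Python's resource module, the programmatic counterpart of ulimit. The function name and the limit values are illustrative assumptions, and a limit the platform refuses is silently skipped.

```python
import resource
import subprocess
import sys


def run_with_limits(argv, cpu_seconds, address_space_bytes):
    """Run argv with soft CPU-time and address-space limits applied."""

    def apply_limits():
        # Runs in the child process just before the command is executed.
        wanted = ((resource.RLIMIT_CPU, cpu_seconds),
                  (resource.RLIMIT_AS, address_space_bytes))
        for which, value in wanted:
            _soft, hard = resource.getrlimit(which)
            if hard != resource.RLIM_INFINITY:
                value = min(value, hard)  # never ask for more than allowed
            try:
                resource.setrlimit(which, (value, hard))
            except (ValueError, OSError):
                pass  # limit not supported here; leave it unchanged

    return subprocess.run(argv, preexec_fn=apply_limits,
                          capture_output=True, text=True)


if __name__ == "__main__":
    result = run_with_limits([sys.executable, "-c", "print('ok')"],
                             cpu_seconds=10,
                             address_space_bytes=4 * 1024 ** 3)
    print(result.returncode, result.stdout.strip())
```

Note that the address-space limit counts virtual memory, not RES, so it needs headroom well above the resident size you see in top.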
To verify that you're using the right units for your environment variables, use identify -list resource. If the size of the memory pixel cache (MAGICK_MEMORY_LIMIT) is insufficient for an image, an mmap-ed file will be used (MAGICK_MAP_LIMIT) and if that limit is too low, a conventional disk file (MAGICK_DISK_LIMIT) is used instead. If all the limits are too low, ImageMagick will fail immediately with an error such as cache resources exhausted, Memory allocation failed or corrupt image.
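The fallback order can be written down as a small decision function. This is a simplified model of the behaviour described above, not ImageMagick's actual logic: the function name is made up, all sizes are in the same unit, and the real pixel cache tracks totals across all open images rather than one image at a time.

```python
def pixel_cache_backing(needed, memory_limit, map_limit, disk_limit=None):
    """Where the pixel cache for one image would live (simplified model).

    disk_limit=None models an unlimited disk.
    """
    if needed <= memory_limit:
        return "memory"   # fits under MAGICK_MEMORY_LIMIT
    if needed <= map_limit:
        return "map"      # falls back to an mmap-ed file
    if disk_limit is None or needed <= disk_limit:
        return "disk"     # falls back to a conventional disk file
    return "fail"         # every limit is too low


if __name__ == "__main__":
    # Sizes in MiB: a 40 MiB image with 15 MiB memory and map limits.
    print(pixel_cache_backing(40, 15, 15))
```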
I need to display thumbnails of images in a given directory. I use TFileStream to read the image file before loading the image into an image component. The bitmap is then resized to the thumbnail size, and assigned to a TImage component on a TScrollBox.
It seems to work ok, but slows down quite a lot with larger images.
Is there a faster way of loading (image) files from disk and resizing them?
Thanks, Pieter
Not really. What you can do is resize them in a background thread, and use a "place holder" image until the resizing is done. I would then save these resized images to some sort of cache file for later processing (Windows does this, and calls the cache thumbs.db in the current directory).
You have several options on the thread architecture itself. A single thread that does all images, or a thread pool where a thread only knows how to process a single image. The AsyncCalls library is yet another way and can keep things fairly simple.
I'll complement the answer by skamradt with an attempt to design this for being as fast as possible. For this you should:
optimize I/O
use multiple threads to make use of multiple CPU cores, and to keep even a single CPU core working while you read (or write) files
The use of multiple threads implies that using VCL classes for the resizing isn't going to work, as the VCL isn't thread-safe, and all hacks around that don't scale well. efg's Computer Lab has links for image processing code.
It's important to not cause several concurrent I/O operations when using multiple threads. If you choose to write the thumbnail images back to files, then once you have started reading a file you should read it completely, and once you have started writing a file you should also write it completely. Interleaving both operations will kill your I/O, because you potentially cause a lot of seeking operations of the hard disc head.
For best results the reading (and writing) of files should also not happen in the main (GUI) thread of your application. That would suggest the following design:
Have one thread read files into TGraphic objects, and put these into a thread-safe list.
Have a thread pool wait on the list of files in original size, and have one thread process one TGraphic object, resize it into another TGraphic object, and add this to another thread-safe list.
Notify the GUI thread for each thumbnail image added to the list, so it can be displayed.
If thumbnails are to be written to file, do this in the reading thread as well (see above for an explanation).
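The design above can be sketched with two thread-safe queues. This is Python standing in for Delphi, purely to show the structure: build_thumbnails, fake_resize and the load callback are hypothetical names, the calling thread plays the role of the GUI thread, and error handling is left out (a worker that raises would stall the loop). In CPython the workers would not run pure-Python resizing in parallel, so treat it as a picture of the queues only.

```python
import queue
import threading


def fake_resize(image, size):
    """Stand-in for real resampling: keep evenly spaced samples."""
    step = max(1, len(image) // size)
    return image[::step][:size]


def build_thumbnails(load, names, resize, size, workers=4, on_ready=None):
    originals = queue.Queue(maxsize=workers * 2)  # full-size images
    thumbnails = queue.Queue()                    # finished thumbnails
    stop = object()                               # end-of-work marker

    def reader():
        # One thread does all the file reading, one file at a time.
        for name in names:
            originals.put((name, load(name)))
        for _ in range(workers):
            originals.put(stop)

    def worker():
        # Each pool thread resizes one image at a time.
        while True:
            item = originals.get()
            if item is stop:
                break
            name, image = item
            thumbnails.put((name, resize(image, size)))

    threads = [threading.Thread(target=reader)]
    threads += [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()

    results = {}
    for _ in names:
        # The "GUI" side: handle each thumbnail as soon as it is ready.
        name, thumb = thumbnails.get()
        results[name] = thumb
        if on_ready is not None:
            on_ready(name, thumb)
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    names = ["a.bmp", "b.bmp", "c.bmp"]
    thumbs = build_thumbnails(lambda name: list(range(100)), names,
                              fake_resize, 10, workers=2)
    print(sorted(thumbs))
```

The bounded originals queue keeps the reader from loading every full-size image into memory before the workers catch up.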
Edit:
On re-reading your question I notice that you maybe only need to resize one image, in which case a single background thread is of course enough. I'll leave my answer in place anyway, maybe it will be of use to someone else some time. It's what I learned from one of my latest projects, where the final program could have needed a little more speed but was only using about 75% of the quad core machine at peak times. Decoupling I/O from processing would have made the difference.
I often use TJPEGImage with Scale:=jsEighth (in Delphi 7). This is really fast because the JPEG de-compression can skip a lot of the data to fill a bitmap of only an eighth of width and height.
Another option is to use the shell's method to extract a thumbnail, which is pretty fast as well.
I'm in the vision business, and I simply upload the images to the GPU using OpenGL. (typically 20x 2048x2000x8bpp per second), a bmp per texture, and let the videocard scale (win32, Mike Lischke's opengl headers)
Upload of such an image costs 5-10ms depending on exact videocard (if not integrated and nvidia 7300 series or newer. Very recent integrated GPUs might be doable also). Scaling and displaying costs 300us. Which means customers can pan and zoom like crazy without touching the app. I draw an overlay (which used to be a tmetafile but is now an own format) on top of it.
My biggest picture is 4096x7000x8bpp which shows and scales in under 30ms. (GF 8600)
A limitation of this technology is max texture size. It can be resolved by fragmenting the picture into multiple textures, but I haven't bothered yet because I deliver the systems with the software.
(some typical sizes:
nv6x00 series: 2k*2k but uploading is just about break even compared to GDI
nv7x00 series: 4k*4k For me the baseline cards. GF7300's are like $20-40
nv8x00 series: 8k*8k
)
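For the max texture size limitation, fragmenting a picture into multiple textures comes down to computing tile rectangles. Here is a small sketch in Python (not the Delphi/OpenGL code I use); the function name is made up and each tile is returned as (left, top, width, height).

```python
def texture_tiles(width, height, max_size):
    """Split a width x height picture into tiles no larger than max_size."""
    tiles = []
    for top in range(0, height, max_size):
        for left in range(0, width, max_size):
            tiles.append((left, top,
                          min(max_size, width - left),
                          min(max_size, height - top)))
    return tiles


if __name__ == "__main__":
    # The 4096x7000 picture on a card limited to 4k*4k textures.
    print(texture_tiles(4096, 7000, 4096))
```

Each tile would then be uploaded as its own texture and drawn at its (left, top) offset.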
Note that this might not be for everybody. But if you are in the lucky situation of being able to specify hardware limits, it might work. The main problem is laptops like Thinkpads, whose GPUs are older than those of the average laptop, which are in turn often a generation behind desktops.
I chose OpenGL over DirectX because it is more static in time, and easier to find non-game related examples.
Try looking at the Graphics32 library: it's very good at drawing things and works great with bitmaps. It is thread-safe, comes with good examples, and it's totally free.
Exploit Windows' ability to create thumbnails. Remember those hidden Thumbs.db files in folders that contain images?
I have implemented something like this feature but in VB. My software is able to build thumbnails of 100 files (mixed size) in around 10 seconds.
I am not able to convert it to Delphi though.