ImageMagick/PHP is falling over with large images

I have a PHP script which is used to resize images in a user's FTP folder for use on his website.
While slow to resize, the script has completed correctly with all images in the past. Recently, however, the user uploaded an album of 21-megapixel JPEG images, and the script is failing to convert them without emitting any PHP errors. Consulting various logs, I found multiple Apache processes being killed off with out-of-memory errors.
The functional part of the PHP script is essentially a for loop that iterates through my images on the disk and calls a method that checks if a thumbnail exists and then performs the following:
$image = new Imagick();
$image->readImage($target);
$image->thumbnailImage(1000, 0);
$image->writeImage(realpath($basedir)."/rescale/".$filename);
$image->clear();
$image->destroy();
The server has 512MB of RAM, usually with at least 360MB free.
PHP's memory limit is currently set at 96MB, but I have set it higher before without any effect on the issue.
By my estimates, a 21-megapixel image should occupy in the region of 80MB+ when uncompressed, so I am puzzled as to why the RAM is disappearing so rapidly unless the ImageMagick objects are not being removed from memory.
Is there some way I can optimise my script to use less memory or garbage collect more efficiently?
Do I simply not have the RAM to cope with such large images?
Cheers

See this answer for a more detailed explanation.
imagick uses a shared library and its memory usage is outside PHP's control, so tuning PHP's memory limit and garbage collection won't help.
Try adding this prior to creating the new Imagick() object:
// pixel cache max size
Imagick::setResourceLimit(Imagick::RESOURCETYPE_MEMORY, 32);
// maximum amount of memory map to allocate for the pixel cache
Imagick::setResourceLimit(Imagick::RESOURCETYPE_MAP, 32);
It will cause imagick to swap to disk (defaults to /tmp) when it needs more than 32 MB for juggling images. It will be slower, but it will not run out of RAM (unless /tmp is on a ramdisk, in which case you need to change where imagick writes its temp files).
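If /tmp is a ramdisk, a minimal sketch of redirecting imagick's temp files before the Imagick object is created (the MAGICK_TEMPORARY_PATH / MAGICK_TMPDIR variable names and the /var/tmp/imagick directory are assumptions; check the environment-variable list for your ImageMagick version):
// assumed temp-path environment variables for this ImageMagick build;
// the target directory must already exist and be writable by the web server
putenv('MAGICK_TEMPORARY_PATH=/var/tmp/imagick');
putenv('MAGICK_TMPDIR=/var/tmp/imagick');

$image = new Imagick();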

MattBianco is nearly correct; the only change is that the memory limits are in bytes, so for 32MB it would be 33554432:
// pixel cache max size
Imagick::setResourceLimit(Imagick::RESOURCETYPE_MEMORY, 33554432);
// maximum amount of memory map to allocate for the pixel cache
Imagick::setResourceLimit(Imagick::RESOURCETYPE_MAP, 33554432);
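Putting this together with the question's loop body, a minimal sketch (the paths and the 1000px width come from the question; the limits must be set before the Imagick object is created):
// cap the in-RAM pixel cache and the memory-mapped cache; 32MB expressed in bytes
Imagick::setResourceLimit(Imagick::RESOURCETYPE_MEMORY, 33554432);
Imagick::setResourceLimit(Imagick::RESOURCETYPE_MAP, 33554432);

$image = new Imagick();
$image->readImage($target);
$image->thumbnailImage(1000, 0); // 1000px wide, height chosen to keep the aspect ratio
$image->writeImage(realpath($basedir)."/rescale/".$filename);
$image->clear();
$image->destroy();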

Call $image->setSize() before $image->readImage() to have libjpeg resize the image while loading, which reduces memory usage.
(Edit) For example usage, see Efficient JPEG Image Resizing in PHP.
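A minimal sketch of that approach, assuming JPEG input and the 1000px target from the question (the setSize() hint only needs to be roughly the target size):
$image = new Imagick();
// hint the expected output size before reading, so libjpeg can downscale
// during decode instead of materialising the full 21-megapixel bitmap
$image->setSize(1000, 1000);
$image->readImage($target);
$image->thumbnailImage(1000, 0);
$image->writeImage(realpath($basedir)."/rescale/".$filename);
$image->clear();
$image->destroy();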

Related

Image processing in background job consumes a lot of memory

I have a PORO service that receives one image at a time and stores the original; after that I schedule a Sidekiq job that converts the image to WebP format at three different dimensions. I noticed that Sidekiq consumes ~200MB of memory at startup, and when it starts processing a 4MB .jpeg image it quickly grows to ~350MB. If the user sends 8 consecutive requests with a total image size of ~18MB, the job may take up to 800MB, and this memory is not freed after completion, so any further requests only increase the job's memory. I'm running Docker on a Linux machine, using plain ActiveStorage and the image_processing gem with the libvips processor. Is anyone having the same problem, or does anyone know how to decrease the memory usage?
Here is the code of job:
class Api::V1::Ads::Images::ResizeAndUploadJob < Api::V1::ApplicationJob
  sidekiq_options queue: 'high'

  def perform(blob_id)
    @blob = ActiveStorage::Blob.find_by(id: blob_id)
    return if @blob.nil?

    @blob.filename = "#{image_filename}_x1200.webp"
    @blob.variant(format: :webp, resize_to_limit: [nil, 1200]).process

    @blob.filename = "#{image_filename}_x560.webp"
    @blob.variant(format: :webp, resize_to_limit: [nil, 560]).process

    @blob.filename = "#{image_filename}_x130.webp"
    @blob.variant(format: :webp, resize_to_limit: [nil, 130]).process
  end

  private

  def image_filename
    @image_filename ||= @blob.filename.base.split('_ORIGINAL').first
  end
end
The libwebp WebPDecode() function is one shot rather than incremental, meaning you have to load the whole of the compressed input file into memory, allocate enough ram for the whole of the decompressed pixel array, then decompress the entire thing in a single call into libwebp.
This means that large webp images will need a lot of memory to process. Although this memory is freed again after the resize is done, heap fragmentation means that it can take a while for overall memory use to stabilize, and the level it settles at might be higher than you'd expect.
Workarounds:
A malloc that tries to avoid fragmentation, like jemalloc, can help a lot.
Don't use webp for large images if you can help it (not always possible, of course).
The libvips operation cache can mean memory is kept for longer than you'd expect. You can try turning the cache size down with cache_set_max() (see the sketch after this list).
libwebp now has an API for incremental decoding and encoding, but no one has got around to adding support to libvips yet. There's an open issue on this:
https://github.com/libvips/libvips/issues/3077
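A minimal sketch of the cache workaround in Ruby, assuming the ruby-vips gem (which image_processing drives) is loaded; the limits are arbitrary example values to tune for your workload:
require "vips"

# shrink (or disable) the libvips operation cache so memory held by finished
# operations is released sooner; 0 turns caching off entirely
Vips.cache_set_max(0)

# alternatively, cap the cache by memory and by open files instead
Vips.cache_set_max_mem(50 * 1024 * 1024)
Vips.cache_set_max_files(20)
In a Rails app this could live in an initializer (or at the top of the Sidekiq worker process) so it takes effect before any variants are processed.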

Time required to access the memory locations in the same cache line

Consider the big box in the following figure as a cache and the block as a single cache line inside the cache.
The CPU fetched the data (first 4 elements of the array A) from RAM into the cache block.
Now, my question is: does it take exactly the same time to perform read/write operations on all 4 memory locations (A[0], A[1], A[2] and A[3]) in the cache block, or is it approximately the same?
PS: I am expecting an answer for the ideal case, where the runtime of any read/write operation on any memory location is not affected by operating-system jitter on user processes or applications.
With the line already hot in cache, time is constant for access to any aligned word in the cache. The hardware that handles the offset-within-line part of an address doesn't have to iterate through to the right position or anything, it just MUXes those bytes to the output.
If the line was not already hot in cache, then it depends on the design of the cache. If the CPU doesn't transfer around whole lines at once over a wide bus, then one / some words of the line will arrive before others. A cache that supports early-restart can let the load complete as soon as the needed word arrives.
A critical-word-first bus and memory allow that word to be the first one transferred for a demand-miss. Otherwise they arrive in some fixed order, and a cache miss on the last word of the line could take an extra few cycles.
Related:
Does cacheline size affect memory access latency?
if cache miss happens, the data will be moved to register directly or first moved to cache then to register?
which is optimal a bigger block cache size or a smaller one?

Can I make the glb file smaller?

I have a large OBJ file of 306 MB, so I converted it into a glb file to reduce its size. The file size has decreased a lot, to 82 MB, but it is still big. I want to make this file smaller. Is there a way? If there is, please let me know.
If you can't reduce the glb file further, please let me know more effective ways to reduce the OBJ file. One thing I've already tried is converting the OBJ file to JSON, compressing it, and then inflating and loading it with pako.js; I didn't go with this method because decompression was too slow.
There might be, if it is the vertex data that is causing the file to be that big. In that case you can use the DRACO compression library to get the size down even further.
First, to test the compressor, you can run
npx gltf-pipeline -i original.glb -d --draco.compressionLevel 10 -o compressed.glb
(you need to have a current version of node.js installed for this to work)
If vertex-data was the reason for the file being that big, the compressed file should be considerably smaller than the original.
Now you have to go through some extra steps to load the file, as the regular GLTFLoader doesn't support DRACO-compressed meshes.
Essentially, you need to import THREE.DRACOLoader and the draco decoder, and then tell your GLTFLoader how to handle DRACO compression:
DRACOLoader.setDecoderPath('path/to/draco-decoder');
gltfLoader.setDRACOLoader(new DRACOLoader());
After that, you can use the GLTFLoader as before.
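For reference, a minimal sketch of that wiring against a recent three.js build, where setDecoderPath is an instance method (the decoder path, the file name and the scene variable are placeholders):
import { GLTFLoader } from 'three/examples/jsm/loaders/GLTFLoader.js';
import { DRACOLoader } from 'three/examples/jsm/loaders/DRACOLoader.js';

const dracoLoader = new DRACOLoader();
dracoLoader.setDecoderPath('path/to/draco-decoder/'); // folder with the draco_decoder .js/.wasm files

const gltfLoader = new GLTFLoader();
gltfLoader.setDRACOLoader(dracoLoader);

gltfLoader.load('compressed.glb', (gltf) => {
  scene.add(gltf.scene); // 'scene' assumed to be your THREE.Scene
});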
The only downside is that the decoder needs some resources of its own: decoding isn't free, and the decoder is another 320kB of data for the browser to load. I think it's still worth it if it saves you megabytes of mesh data.
I'm surprised that no one has mentioned the obvious, simple way of lossily reducing the size of a .glb file that's just a container for separate mesh and texture data:
Reduce your vertex count by collapsing adjacent vertices that are close together or coplanar, and reduce your image data by trimming out, scaling down, or using a lower bit depth for unnecessary details.
Every 2X decrease in surface polygon/pixel density should yield roughly a 4X decrease in file size.
And then, once you've removed unneeded detail, start looking at things like DRACO, basis, fewer JPEG chroma samples, and optipng.

Why does Prometheus consume so much memory?

I'm using Prometheus 2.9.2 for monitoring a large environment of nodes.
As part of testing the maximum scale of Prometheus in our environment, I simulated a large number of metrics in our test environment.
My management server has 16GB ram and 100GB disk space.
During the scale testing, I've noticed that the Prometheus process consumes more and more memory until the process crashes.
I've noticed that the WAL directory is getting filled fast with a lot of data files while the memory usage of Prometheus rises.
The management server scrapes its nodes every 15 seconds and the storage parameters are all set to default.
I would like to know why this happens, and how/if it is possible to prevent the process from crashing.
Thank you!
The out-of-memory crash is usually the result of an excessively heavy query. This may be set in one of your rules (the rule may even be running on a Grafana page rather than in Prometheus itself).
If you have a very large number of metrics, it is possible the rule is querying all of them. A quick fix is to specify exactly which metrics to query, using specific labels instead of a regex matcher.
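As an illustration (the metric and label names here are made up), the difference in PromQL looks roughly like this:
# heavy: a regex matcher forces Prometheus to touch every matching series
rate(http_requests_total{instance=~".+"}[5m])

# lighter: exact label matchers select only the series you actually need
rate(http_requests_total{job="api", instance="api-1:9090"}[5m])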
This article explains why Prometheus may use large amounts of memory during data ingestion. If you need to reduce memory usage for Prometheus, the following actions can help:
Increasing scrape_interval in the Prometheus config (see the sketch after this list).
Reducing the number of scrape targets and/or scraped metrics per target.
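For the first item, a minimal prometheus.yml sketch (60s is only an example value; the job name and targets are placeholders):
global:
  scrape_interval: 60s     # scrape less often, so fewer samples accumulate in the head block
  evaluation_interval: 60s

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['node1:9100', 'node2:9100']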
P.S. Also take a look at the project I work on, VictoriaMetrics. It can use less memory than Prometheus. See this benchmark for details.
Because the label combinations depend on your business data, the number of combinations (and therefore of blocks) is effectively unbounded, so there is no way to fully solve the memory problem with the current design of Prometheus. But I suggest you compact small blocks into big ones; that will reduce the number of blocks.
Huge memory consumption has two reasons:
The Prometheus TSDB keeps an in-memory block named the "head"; because the head stores all series from the latest hours, it eats a lot of memory.
Each block on disk also eats memory, because each block on disk has an index reader in memory; dismayingly, all labels, postings and symbols of a block are cached in the index reader struct, so the more blocks on disk, the more memory is occupied.
In index/index.go, you will see:
type Reader struct {
    b ByteSlice

    // Close that releases the underlying resources of the byte slice.
    c io.Closer

    // Cached hashmaps of section offsets.
    labels map[string]uint64
    // LabelName to LabelValue to offset map.
    postings map[string]map[string]uint64
    // Cache of read symbols. Strings that are returned when reading from the
    // block are always backed by true strings held in here rather than
    // strings that are backed by byte slices from the mmap'd index file. This
    // prevents memory faults when applications work with read symbols after
    // the block has been unmapped. The older format has sparse indexes so a map
    // must be used, but the new format is not so we can use a slice.
    symbolsV1        map[uint32]string
    symbolsV2        []string
    symbolsTableSize uint64

    dec *Decoder

    version int
}
We used Prometheus version 2.19 and had significantly better memory performance. This blog post highlights how that release tackles memory problems. I strongly recommend upgrading to it to improve your instance's resource consumption.

Maximum memory allocation on OpenCL CPU

I have read that there's a limit on the maximum memory allocation of around 60% of device memory, and that this can be changed by modifying the GPU_MAX_HEAP_SIZE and GPU_MAX_ALLOC_SIZE environment variables for the GPU.
I am wondering whether the AMD SDK has something similar for the CPU, in case I want to raise the memory allocation limit.
For my current configuration, it returns the following:
CL_DEVICE_MAX_MEM_ALLOC_SIZE = 2973.37MB
CL_DEVICE_GLOBAL_MEM_SIZE = 11893.5MB
Thanks.
I was able to change this on my system. I don't know if this method was possible when you originally asked the question.
Set the environment variable CPU_MAX_ALLOC_PERCENT to the percentage of total memory you want to be able to allocate for a single global buffer. I have 8GB of system memory, and after setting CPU_MAX_ALLOC_PERCENT to 80, clinfo reports the following:
Max memory allocation: 6871207116
Success! 6.399GB
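A minimal sketch of that, assuming a bash-like shell and that the variable is exported in the environment of the process that creates the OpenCL context:
# allow a single global buffer of up to 80% of system memory on the CPU device
export CPU_MAX_ALLOC_PERCENT=80

# verify the new limit
clinfo | grep -i "max memory allocation"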
You can also use GPU_MAX_ALLOC_PERCENT in the same way for your GPU devices.
