I have a PORO service that receives one image at a time and stores the original; it then schedules a Sidekiq job that converts the image to WebP format at three different dimensions. I noticed that Sidekiq consumes ~200MB of memory at startup, and when it starts processing a 4MB image (.jpeg) it quickly grows to ~350MB. If the user sends 8 consecutive requests with a total image size of ~18MB, the job may take up to 800MB, and this memory is not freed after completion. Therefore, any further requests only increase the job's memory. I'm running Docker on a Linux machine, using plain ActiveStorage and the image_processing gem with the libvips image processor. Is anyone having the same problem, or does anyone know how to decrease the memory?
Here is the code of job:
class Api::V1::Ads::Images::ResizeAndUploadJob < Api::V1::ApplicationJob
  sidekiq_options queue: 'high'

  def perform(blob_id)
    @blob = ActiveStorage::Blob.find_by(id: blob_id)
    return if @blob.nil?

    @blob.filename = "#{image_filename}_x1200.webp"
    @blob.variant(format: :webp, resize_to_limit: [nil, 1200]).process

    @blob.filename = "#{image_filename}_x560.webp"
    @blob.variant(format: :webp, resize_to_limit: [nil, 560]).process

    @blob.filename = "#{image_filename}_x130.webp"
    @blob.variant(format: :webp, resize_to_limit: [nil, 130]).process
  end

  private

  def image_filename
    @image_filename ||= @blob.filename.base.split('_ORIGINAL').first
  end
end
The libwebp WebPDecode() function is one shot rather than incremental, meaning you have to load the whole of the compressed input file into memory, allocate enough ram for the whole of the decompressed pixel array, then decompress the entire thing in a single call into libwebp.
This means that large webp images will need a lot of memory to process. Although this memory is freed again after the resize is done, heap fragmentation means that it can take a while for overall memory use to stabilize, and the level it settles at might be higher than you'd expect.
Workarounds:
A malloc that tries to avoid fragmentation, like jemalloc, can help a lot.
Don't use webp for large images if you can help it (not always possible, of course).
The libvips operation cache can mean memory is kept for longer than you'd expect. You can try turning the cache size down with cache_set_max().
libwebp now has API to do incremental decoding and encoding, but no one's got around to adding support to libvips yet. There's an open issue on this:
https://github.com/libvips/libvips/issues/3077
I'm trying to use Dask to process a dataset larger than memory, stored in chunks saved as NumPy files. I'm loading the data lazily:
array = da.concatenate([
da.from_delayed(
dask.delayed(np.load)(path),
shape=(size, window_len, vocab_size),
dtype=np.float32
)
for path, size in zip(shard_paths, shard_sizes)
])
Then I run some processing on the array using da.map_blocks:
da.map_blocks(fn, array, drop_axis=[-1]).compute()
When I run this, my process gets killed, presumably due to high memory usage (not only is the data larger than memory, but there is also a memory limit on each process).
I could easily limit the memory by processing the chunks sequentially, but that way I would not benefit from the parallelism provided by Dask.
How can I limit the memory used by Dask (e.g. by only loading a certain number of chunks at a time) while still parallelizing over as many chunks as possible?
It is possible to limit the memory used by the process on Unix using the resource module:
import resource
resource.setrlimit(resource.RLIMIT_AS, (max_memory, max_memory))
Dask seems to be able to reduce its memory usage once it reaches this limit.
However, the process can still crash on the delayed np.load, so this doesn't necessarily solve the problem.
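Another way to bound memory is to cap how many chunks are in flight at once, rather than setting a hard rlimit. This is a standard-library sketch of the idea (the `load` and `fn` callables are placeholders for your `np.load` and processing function; with Dask you would get a similar effect by limiting worker count or per-worker `memory_limit`):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def process_capped(paths, load, fn, max_in_flight=4, workers=4):
    """Run fn(load(p)) for each path, keeping at most max_in_flight
    loaded chunks alive at any moment."""
    slots = threading.Semaphore(max_in_flight)

    def task(path):
        try:
            return fn(load(path))   # the loaded chunk is freed when this returns
        finally:
            slots.release()         # open a slot for the next chunk

    with ThreadPoolExecutor(max_workers=workers) as ex:
        futures = []
        for p in paths:
            slots.acquire()         # block until a slot is free
            futures.append(ex.submit(task, p))
        return [f.result() for f in futures]

# Toy usage: "loading" builds a small list, "processing" sums it.
results = process_capped(range(8), lambda i: [i] * 3, sum, max_in_flight=2)
```

The semaphore keeps the producer from outrunning the consumers, so peak memory is roughly `max_in_flight` chunks regardless of how many paths are queued.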
I'm using Prometheus 2.9.2 for monitoring a large environment of nodes.
As part of testing the maximum scale of Prometheus in our environment, I simulated a large amount of metrics on our test environment.
My management server has 16GB ram and 100GB disk space.
During the scale testing, I've noticed that the Prometheus process consumes more and more memory until the process crashes.
I've noticed that the WAL directory is getting filled fast with a lot of data files while the memory usage of Prometheus rises.
The management server scrapes its nodes every 15 seconds and the storage parameters are all set to default.
I would like to know why this happens, and how/if it is possible to prevent the process from crashing.
Thank you!
The out-of-memory crash is usually the result of an excessively heavy query. This may be set in one of your rules (the rule may even be running on a Grafana page instead of Prometheus itself).
If you have a very large number of metrics, it is possible the rule is querying all of them. A quick fix is to specify exactly which metrics to query, using specific labels instead of a regex matcher.
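For instance, a rule that matches by regex over many series can often be rewritten with exact matchers (the metric and label names below are illustrative):

```promql
# Heavy: a regex label matcher has to scan every series of the metric
sum(rate(http_requests_total{job=~".*api.*"}[5m]))

# Lighter: exact matchers touch only the series you actually need
sum(rate(http_requests_total{job="api", status="500"}[5m]))
```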
This article explains why Prometheus may use big amounts of memory during data ingestion. If you need reducing memory usage for Prometheus, then the following actions can help:
Increasing scrape_interval in Prometheus configs.
Reducing the number of scrape targets and/or scraped metrics per target.
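For example, raising the interval in `prometheus.yml` (the value below is illustrative) proportionally reduces the samples ingested per series:

```yaml
global:
  scrape_interval: 60s   # e.g. up from 15s; fewer samples per series shrinks the head block
```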
P.S. Take a look also at the project I work on - VictoriaMetrics. It can use lower amounts of memory compared to Prometheus. See this benchmark for details.
Because the combination of labels depends on your business, the combinations and the blocks may be unlimited; there's no way to fully solve the memory problem with Prometheus's current design. But I suggest you compact small blocks into big ones; that will reduce the number of blocks.
There is huge memory consumption for TWO reasons:
The Prometheus TSDB has a memory block named the "head"; because the head stores all the series from the latest hours, it eats a lot of memory.
Each block on disk also eats memory, because each block on disk has an index reader in memory; dismayingly, all labels, postings and symbols of a block are cached in the index reader struct, so the more blocks on disk, the more memory is occupied.
in index/index.go, you will see:
type Reader struct {
	b ByteSlice

	// Close that releases the underlying resources of the byte slice.
	c io.Closer

	// Cached hashmaps of section offsets.
	labels map[string]uint64
	// LabelName to LabelValue to offset map.
	postings map[string]map[string]uint64
	// Cache of read symbols. Strings that are returned when reading from the
	// block are always backed by true strings held in here rather than
	// strings that are backed by byte slices from the mmap'd index file. This
	// prevents memory faults when applications work with read symbols after
	// the block has been unmapped. The older format has sparse indexes so a map
	// must be used, but the new format is not so we can use a slice.
	symbolsV1        map[uint32]string
	symbolsV2        []string
	symbolsTableSize uint64

	dec *Decoder

	version int
}
We upgraded to Prometheus 2.19 and had significantly better memory performance. This blog post highlights how that release tackles memory problems; I strongly recommend using it to improve your instance's resource consumption.
I am running a pipeline on multiple images. The pipeline consists of reading the images from the file system, doing some processing on each of them, then saving the images back to the file system. However, the Dask workers fail with a MemoryError.
Is there a way to ensure the Dask workers don't load too many images into memory? i.e. wait until there is enough space on a worker before starting the processing pipeline on a new image.
I have one scheduler and 40 workers with 4 cores, 15GB RAM, running CentOS 7. I am trying to process 125 images in a batch; each image is fairly large but small enough to fit on a worker; around 3GB is required for the whole process.
I tried to process a smaller amount of images and it works great.
EDITED
from dask.distributed import Client, LocalCluster

# LocalCluster is used to show the config of the workers on the actual cluster
client = Client(LocalCluster(n_workers=2, resources={'process': 1}))

paths = ['list', 'of', 'paths']

# Read the file data from each path
data = client.map(read, paths, resources={'process': 1})

# Apply foo to the data n times
for _ in range(n):
    data = client.map(foo, data, resources={'process': 1})

# Save the processed data
data = client.map(save, data, resources={'process': 1})

# Retrieve results
client.gather(data)
I expected the images to be processed as space became available on the workers, but it seems like the images are all loaded simultaneously onto the different workers.
EDIT:
My issue is that all tasks get assigned to workers and they don't have enough memory. I found how to limit the number of tasks a worker handles at a single moment: https://distributed.readthedocs.io/en/latest/resources.html#resources-are-applied-separately-to-each-worker-process
However, with that limit, when I execute my tasks they all finish the read step, then the process step, and finally the save step. This is an issue since the images are spilled to disk.
Would there be a way to make every task finish before starting a new one?
e.g. on Worker-1: read(img1)->process(img1)->save(img1)->read(img2)->...
Dask does not generally know how much memory a task will need; it can only know the size of the outputs, and that only once they are finished. This is because Dask simply executes a Python function and then waits for it to complete, but all sorts of things can happen within a Python function. You should generally expect as many tasks to begin as you have available worker cores, as you are finding.
If you want a smaller total memory load, then your solution should be simple: have a small enough number of workers, so that if all of them are using the maximum memory that you can expect, you still have some spare in the system to cope.
Regarding the EDIT: you may want to try running optimize on the graph before submission (although this should happen anyway, I think), as it sounds like your linear chains of tasks should be "fused": http://docs.dask.org/en/latest/optimize.html
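Another pattern for the EDIT case is to submit one composed task per image, so that read, process, and save always run back-to-back on the same worker. A standard-library sketch of the idea (the read/foo/save bodies here are stand-ins for the real pipeline stages):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for the real pipeline stages.
def read(path):
    return {"path": path, "pixels": [0] * 4}

def foo(img):
    img["pixels"] = [p + 1 for p in img["pixels"]]
    return img

def save(img):
    return "saved:" + img["path"]

def handle_image(path, n=2):
    # One task owns the whole chain, so the intermediate image is never
    # parked in worker memory waiting for the next stage to be scheduled.
    img = read(path)
    for _ in range(n):
        img = foo(img)
    return save(img)

with ThreadPoolExecutor(max_workers=2) as ex:
    results = list(ex.map(handle_image, ["img1", "img2", "img3"]))
```

With dask.distributed the same composition works as `client.map(handle_image, paths)`: each worker then runs read→process→save to completion before taking a new image.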
Using a single MATLAB worker I can easily achieve the maximal frames per second (fps) with my camera (using the MATLAB imaq toolbox). This simple code does it:
matlabpool(1)
start(vid)
pause(1); % give matlab time to initialize the camera
for j = 1:frames
    data = getsnapshot(vid);
end
However, once I try to do some image processing on the fly, the effective rate drops by 50%. Since I have 5 more workers in the matlabpool (and also a GPU), can I optimize this so that each frame grabbed is processed by a different worker? For example:
for j=1:frames
data = getsnapshot(vid);
<do some analysis with worker mod((j),5)+2 i.e. worker 2 to 6 >
end
the issue is that 'data' is serially obtained from the camera, and the analysis takes about 2 rounds of the loop, so if a different worker (or core) took care of it each time, the maximum fps could be obtained again...
The way I see it, the workflow here is serial by nature.
The best you can do is to vectorize/parallelize your image processing function (so you still grab images one by one, but you distribute the processing across multiple cores).
I think I got the solution:
for i = 1:frames
    for sf = 1:6 % I got 6 cores
        m(:,:,sf) = getsnapshot(vid);
    end
    spmd
        result = f(m(:,:,labindex));
    end
end
I managed to get better results with GPU parallelization though...
I have a PHP script which is used to resize images in a user's FTP folder for use on his website.
While slow to resize, the script has completed correctly with all images in the past. Recently however, the user uploaded an album of 21-Megapixel JPEG images and as I have found, the script is failing to convert the images but not giving out any PHP errors. When I consulted various logs, I've found multiple Apache processes being killed off with Out Of Memory errors.
The functional part of the PHP script is essentially a for loop that iterates through my images on the disk and calls a method that checks if a thumbnail exists and then performs the following:
$image = new Imagick();
$image->readImage($target);
$image->thumbnailImage(1000, 0);
$image->writeImage(realpath($basedir)."/".rescale."/".$filename);
$image->clear();
$image->destroy();
The server has 512MB of RAM, with usually at least 360MB+ free.
PHP's memory limit is currently set at 96MB, but I have set it higher before without any effect on the issue.
By my estimates, a 21-Megapixel image should occupy in the region of 80MB+ when uncompressed, and so I am puzzled as to why the RAM is disappearing so rapidly unless the Image Magick objects are not being removed from memory.
Is there some way I can optimise my script to use less memory or garbage collect more efficiently?
Do I simply not have the RAM to cope with such large images?
Cheers
See this answer for a more detailed explanation.
imagick uses a shared library and its memory usage is out of reach for PHP, so tuning PHP memory and garbage collection won't help.
Try adding this prior to creating the new Imagick() object:
// pixel cache max size
IMagick::setResourceLimit(imagick::RESOURCETYPE_MEMORY, 32);
// maximum amount of memory map to allocate for the pixel cache
IMagick::setResourceLimit(imagick::RESOURCETYPE_MAP, 32);
It will cause imagick to swap to disk (defaults to /tmp) when it needs more than 32 MB for juggling images. It will be slower, but it will not run out of RAM (unless /tmp is on ramdisk, in that case you need to change where imagick writes its temp files).
MattBianco is nearly correct; the only change is that the memory limits are in bytes, so it would be 33554432 for 32MB:
// pixel cache max size
IMagick::setResourceLimit(imagick::RESOURCETYPE_MEMORY, 33554432);
// maximum amount of memory map to allocate for the pixel cache
IMagick::setResourceLimit(imagick::RESOURCETYPE_MAP, 33554432);
Call $image->setSize() before $image->readImage() to have libjpeg resize the image whilst loading to reduce memory usage.
(edit), example usage: Efficient JPEG Image Resizing in PHP